# Final Four: Having “nun” of the old-school analytics

### To get the most helpful data from an analysis, ask the best question you can

There may be two kinds of higher power at work in this captivating March Madness tournament. Sister Jean, Loyola-Chicago’s team chaplain, has become the darling of the 2018 NCAA tournament for her firm belief in offering better analysis to support her team’s coach, even as she occasionally sets the data aside and turns to a different higher power.

As each of these teams digs deep into statistical analysis to define its path to victory, let’s draw inspiration from Sister Jean and look at how asking better questions yields more actionable data that can strengthen team performance.

You could ask, “What is the average number of points Michigan’s Duncan Robinson scores in a game?” and the data would show that he averages 9.5 points a game. Or, “How many points does Ibi Watson score per game?” and the data would show that he averages 2.3 points per game.

Those answers tell you something, but they aren’t as informative as the answer to “How many points does Robinson (or Watson) score per minute he is on the court?” This year, Robinson averages 25.9 minutes and Watson averages 5.6 minutes per 40-minute game. Dividing each player’s points per game by his minutes per game reveals that Watson scores about 0.41 points per minute, while Robinson scores about 0.37.
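The per-minute comparison is simple arithmetic: points per game divided by minutes per game. A minimal sketch using only the season averages quoted above; the dictionary layout and the `points_per_minute` helper are illustrative choices, not part of any particular analytics tool.

```python
# Season averages quoted in the text above.
season_stats = {
    "Duncan Robinson": {"points_per_game": 9.5, "minutes_per_game": 25.9},
    "Ibi Watson": {"points_per_game": 2.3, "minutes_per_game": 5.6},
}


def points_per_minute(points_per_game: float, minutes_per_game: float) -> float:
    """Convert a per-game scoring average into a per-minute scoring rate."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return points_per_game / minutes_per_game


rates = {
    name: points_per_minute(s["points_per_game"], s["minutes_per_game"])
    for name, s in season_stats.items()
}

# Rank players by the bottom-up metric (points per minute), highest first.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate:.2f} points per minute")
```

Run as written, this prints Watson first at 0.41 points per minute and Robinson second at 0.37, rounding to two decimal places.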

### How to ask the right questions to get the right data

Now, this doesn’t mean Watson would score roughly 16 points a game if he played the whole game. We’d have to look at the data on what happens when he plays eight minutes or more in a game. We might also ask, “How many points does the opposing team score per minute that Robinson (or Watson) is on the court?” Here’s another good question: “How often does Michigan win when Robinson plays at least 20 minutes?”
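Both follow-up questions are filters over game-by-game data: keep only the games that meet a minutes threshold, then aggregate. The sketch below assumes a hypothetical per-game record (`GameLog`) and uses made-up rows; it shows the shape of the query, not real Michigan results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GameLog:
    """One player's line from one game (hypothetical schema)."""
    player: str
    minutes: float
    points: int
    team_won: bool


def _qualifying(logs: List[GameLog], player: str, min_minutes: float) -> List[GameLog]:
    """Games in which `player` was on the court for at least `min_minutes`."""
    return [g for g in logs if g.player == player and g.minutes >= min_minutes]


def win_rate_when_playing_at_least(
    logs: List[GameLog], player: str, min_minutes: float
) -> Optional[float]:
    """Share of qualifying games the team won, or None if there are none."""
    games = _qualifying(logs, player, min_minutes)
    if not games:
        return None  # no qualifying games: report "unknown", not 0%
    return sum(g.team_won for g in games) / len(games)


def points_per_minute_when_playing_at_least(
    logs: List[GameLog], player: str, min_minutes: float
) -> Optional[float]:
    """Scoring rate across only the games that meet the minutes threshold."""
    games = _qualifying(logs, player, min_minutes)
    total_minutes = sum(g.minutes for g in games)
    if total_minutes == 0:
        return None
    return sum(g.points for g in games) / total_minutes


# Made-up rows for illustration only; these are NOT real Michigan box scores.
logs = [
    GameLog("Duncan Robinson", 28, 11, True),
    GameLog("Duncan Robinson", 17, 6, False),
    GameLog("Duncan Robinson", 31, 14, True),
    GameLog("Duncan Robinson", 22, 5, False),
    GameLog("Ibi Watson", 9, 4, True),
    GameLog("Ibi Watson", 3, 0, False),
]

print(win_rate_when_playing_at_least(logs, "Duncan Robinson", 20))
print(points_per_minute_when_playing_at_least(logs, "Ibi Watson", 8))
```

Returning `None` when no games qualify keeps a thin sample from masquerading as a 0% win rate; with real data you would also want to know how many games sit behind each answer.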

Alternatively, we could train an algorithm to find which players are on the court when Michigan has the best ratio of points scored to points allowed. Even when we use machine learning, it all starts with experienced humans asking really good questions.
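Before training anything, the underlying question can be answered with a plain tally: sum points scored and points allowed for each five-player lineup, then rank lineups by the ratio. This is a brute-force aggregation, not the machine-learning approach the paragraph mentions, and the stint data, the placeholder names "Player A" through "Player E", and the `rank_lineups` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stints: (players on the court, Michigan points, opponent points).
# "Player A" through "Player E" are placeholders, not real roster names,
# and every number is made up for illustration.
stints = [
    (frozenset({"Robinson", "Player A", "Player B", "Player C", "Player D"}), 12, 8),
    (frozenset({"Watson", "Player A", "Player B", "Player C", "Player D"}), 7, 4),
    (frozenset({"Robinson", "Player A", "Player B", "Player C", "Player D"}), 6, 9),
    (frozenset({"Watson", "Player A", "Player C", "Player D", "Player E"}), 5, 5),
]


def rank_lineups(stints):
    """Rank lineups by points scored / points allowed while on the court."""
    totals = defaultdict(lambda: [0, 0])  # lineup -> [points for, points against]
    for lineup, points_for, points_against in stints:
        totals[lineup][0] += points_for
        totals[lineup][1] += points_against
    ratios = [
        (lineup, pts_for / pts_against)
        for lineup, (pts_for, pts_against) in totals.items()
        if pts_against > 0  # skip lineups that allowed no points (ratio undefined)
    ]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


for lineup, ratio in rank_lineups(stints):
    print(f"{ratio:.2f}  {sorted(lineup)}")
```

Using `frozenset` makes each lineup a hashable key regardless of the order players are listed in. Ratios from a handful of stints are noisy, so a real analysis would likely require a minimum number of minutes or possessions per lineup before ranking it.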

### The higher ed connection

Basketball is a good analogy for higher education in some specific ways. The game is dynamic, and the team is made up of individuals who, at any point, could be at their highest or lowest mental and physical performance, just as every student at Michigan is in the “game” of getting to graduation. Following the basketball analogy, points could stand for grades, skills, knowledge attained, or job offers. For the sake of this exercise, we will equate points with grades.

You could ask, “What can we know about the students with the most As?” and the data would give you a persona or profile. That tells you something but isn’t very actionable. You could go deeper and ask, “What can we know about students with the most As who are first-time college students vs. students with the most As who have prior college experience?” If the two groups differ strongly, this yields somewhat more actionable data.
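Splitting the top A students by prior college experience is a rank-then-group query. A minimal sketch, assuming a hypothetical `Student` record with an A count and a prior-college flag; the student IDs, counts, and the 40% cutoff are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Student:
    student_id: str
    a_count: int          # number of A grades earned so far
    prior_college: bool   # True if the student had college experience before enrolling


def top_a_students(students, fraction=0.1):
    """Return the top `fraction` of students by number of As (at least one student)."""
    n = max(1, round(len(students) * fraction))
    return sorted(students, key=lambda s: s.a_count, reverse=True)[:n]


def compare_groups(students):
    """Split students by prior college experience and summarize each group."""
    summary = {}
    for label, flag in (("first-time", False), ("prior college", True)):
        group = [s for s in students if s.prior_college == flag]
        summary[label] = {
            "count": len(group),
            "mean_as": mean(s.a_count for s in group) if group else None,
        }
    return summary


# Hypothetical records, not real Michigan data.
students = [
    Student("s1", 12, True), Student("s2", 11, False), Student("s3", 10, True),
    Student("s4", 9, True), Student("s5", 7, False), Student("s6", 6, False),
    Student("s7", 5, True), Student("s8", 4, False), Student("s9", 3, False),
    Student("s10", 2, False),
]

print("top A students:", compare_groups(top_a_students(students, fraction=0.4)))
print("all students:  ", compare_groups(students))
```

Printing the whole population next to the top group is what exposes a “strong difference”: in this made-up data, students with prior college experience are 4 of 10 overall but 3 of the top 4.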

Or, flipping the question around, ask, “What resources (academic or otherwise) at Michigan do these two groups of A students interact with?” It may turn out that students with prior college experience engage with the career center frequently from day one, while those with no prior college experience engage with social networks more often from the beginning. This could suggest a different freshman or sophomore pathway to success.
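The resource question becomes an engagement rate per group: of the students in each group, what share used each resource at least once? A minimal sketch, assuming hypothetical engagement records such as appointment or visit logs; the student IDs and the “tutoring” resource are invented, while “career center” and “social network” come from the example above.

```python
from collections import Counter

# Hypothetical data: group membership for top-A students, plus resource usage records.
group_of = {"s1": "prior college", "s2": "first-time", "s3": "prior college", "s4": "first-time"}
engagements = [
    ("s1", "career center"),
    ("s1", "career center"),   # repeat visits count once per student
    ("s2", "social network"),
    ("s3", "career center"),
    ("s4", "social network"),
    ("s4", "tutoring"),
    ("s9", "career center"),   # not a top-A student, so it is ignored
]


def engagement_rates(group_of, engagements):
    """For each (group, resource), the share of the group that used the resource."""
    group_sizes = Counter(group_of.values())
    users = {(sid, res) for sid, res in engagements if sid in group_of}  # dedupe visits
    counts = Counter((group_of[sid], res) for sid, res in users)
    return {(group, res): n / group_sizes[group] for (group, res), n in counts.items()}


for (group, resource), rate in sorted(engagement_rates(group_of, engagements).items()):
    print(f"{group:14} {resource:15} {rate:.0%}")
```

Counting each student once per resource keeps a few heavy users from inflating a group’s rate; whether to count visits or visitors is itself one of those “better question” choices.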

Another intriguing point in the Wolverines’ example is how the relative value of Robinson’s and Watson’s performance depends on the measure. Top down, in points per game and minutes played, Robinson’s numbers are higher. Bottom up, in points per minute, Watson beats Robinson.

In much of higher education, including its reliance on academic grades and standardized tests, the questions asked and the data tracked miss important performance indicators that could reveal the whole student body’s contributions in ways that lead to more wins. Machine learning makes it practical to ask better questions that require a tedious amount of data crunching, while reducing the human error and fatigue involved in getting to the better data.