Is It Possible to Predict March Madness Cinderella Stories?

Bruin Sports Analytics
Jun 15, 2022
Updated: Jun 22, 2022

By: Alisha Dhar

Sports at every level draw extra excitement during the playoffs: players and fans are on edge to see whether their team has what it takes to bring home a championship. The men’s college basketball playoffs, however, draw a different level of excitement and competition, among both teams and fans. The month-long frenzy, known to basketball enthusiasts across the country as March Madness, is especially popular for a few reasons. For starters, the format of the competition raises the stakes: 64 of the best teams in the NCAA compete in a single-elimination bracket during a lull in the professional sports world.

What makes it special, though, is that any team has a chance to win, and any individual player has a chance to become a hero and put their name and school on the map. We’ve seen it year after year, from Saint Peter’s upsetting three teams this past year to reach the Elite Eight, to UMBC beating Virginia in the first win by a 16-seed over a 1-seed, to #8 Villanova winning the national championship over #1 defending champion Georgetown back in 1985. Is there something in particular that makes these upsets happen? Can we predict which underdogs are more likely to be part of a “Cinderella Story,” or even a single upset?

We can start by looking at a boxplot of a team’s seed versus how far in the postseason they reach using data from 2013 to 2019. While the plot does show a trend of better seeds making it farther in the postseason, we can see a few outliers and potential upsets. For example, the mean seed for the Final Four (3.429) is less than the mean seed for the Elite Eight (4.643), and there is a #7 seed that won the championship (UConn in 2014).

Note that for this article lower-seeded teams refers to teams with seed closer to 16 and higher-seeded teams refers to teams with seed closer to 1.

Comparing the Statistics of Lower-Seeded March Madness Teams

For these plots, I compared how far teams seeded between 8 and 16 reached in the tournament with different basketball statistics. One detail is that no team in this 8-16 range made it to the championship in the years included in the dataset.
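The comparison behind each plot can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the article: the field names (`seed`, `round`, `adjoe`), the round labels, and the sample rows are all placeholders I am assuming for the example.

```python
from collections import defaultdict
from statistics import median

# Rounds in bracket order. These labels are assumed for illustration.
ROUND_ORDER = ["R64", "R32", "S16", "E8", "F4"]

def median_by_round(teams, stat, min_seed=8, max_seed=16):
    """Return {round: median of `stat`} for teams seeded min_seed..max_seed."""
    grouped = defaultdict(list)
    for team in teams:
        if min_seed <= team["seed"] <= max_seed:
            grouped[team["round"]].append(team[stat])
    # Keep bracket order and skip rounds that no filtered team reached.
    return {r: median(grouped[r]) for r in ROUND_ORDER if r in grouped}

# Placeholder rows for illustration only; NOT values from the article's dataset.
teams = [
    {"seed": 12, "round": "R64", "adjoe": 105.0},
    {"seed": 9, "round": "R64", "adjoe": 109.0},
    {"seed": 11, "round": "R32", "adjoe": 110.0},
    {"seed": 10, "round": "R32", "adjoe": 112.0},
    {"seed": 3, "round": "S16", "adjoe": 118.0},  # excluded: seed outside 8-16
    {"seed": 11, "round": "S16", "adjoe": 113.0},
]

print(median_by_round(teams, "adjoe"))
```

Swapping `"adjoe"` for another column name would give the per-round medians for any of the other statistics discussed below.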

This first plot shows the Adjusted Offensive Efficiency (ADJOE) of lower-seeded teams against the furthest round they reached in March Madness. A team’s adjusted offensive efficiency is the estimated number of points it would score per 100 possessions against an average Division I defense. ADJOE generally increases as the rounds get deeper: the median ADJOE for each round (from the Round of 64 to the Final Four) is 107.3, 110.4, 111.7, 113.3, and 110.6. Interestingly, the median ADJOE for lower-seeded teams that made the Final Four was lower than the medians for Elite Eight and Sweet Sixteen teams. The upward trend apart from this, however, suggests that ADJOE is a potential indicator of a lower-seeded team’s success in the March Madness tournament.

We can next look at the Adjusted Defensive Efficiency (ADJDE) of lower-seeded teams, which is a team’s estimated points allowed per 100 possessions against an average Division I offense; lower values indicate a better defense. From the Round of 64 to the Final Four, the median ADJDEs are 100.5, 96.3, 94.5, 97.15, and 93.6. This shows a decreasing overall trend, with the exception of the Elite Eight round. While this box plot shows a better ADJDE for Final Four teams than for Elite Eight teams, the previous box plot shows a better ADJOE for Elite Eight teams than for Final Four teams. So from 2013 to 2019, underdogs that were eliminated in the Elite Eight had better offenses, but worse defenses, than underdogs that made it to the Final Four.
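To make the "per 100 possessions" scale concrete, here is a small sketch of the raw (unadjusted) version of both efficiencies. The adjusted figures in the dataset also account for opponent strength, which this example does not attempt; the game numbers are invented for illustration.

```python
def per_100_possessions(points, possessions):
    """Points per 100 possessions (raw, with no opponent adjustment)."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions

# Illustrative single-game numbers, not taken from the dataset.
off_eff = per_100_possessions(points=72, possessions=66)  # points scored
def_eff = per_100_possessions(points=63, possessions=66)  # points allowed
net = off_eff - def_eff  # positive means the team outscored its opponent

print(round(off_eff, 1), round(def_eff, 1), round(net, 1))
```

Higher is better for the offensive number and lower is better for the defensive one, which is why the two box plots trend in opposite directions.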

This next plot looks at the Free Throw Rate (FTR), or the ratio of a team’s free throw attempts to its field goal attempts. The boxplots do not show a clear trend between Free Throw Rate and Round Reached: the medians for each round (Round of 64 to Final Four) are 37.3, 35.6, 35.7, 38.8, and 35.5. This could indicate that a team’s general ability to draw fouls and get to the free throw line has a smaller impact on its success in the tournament. This would also be consistent with the recent yearly decrease in average fouls per game: teams seem to be becoming more conscious both of fouls that result in free throws and of letting opponents get into the bonus or double bonus. Note that while there is not a clear trend for free throw rate, there may still be a pattern with free throw percentage.

We can next look at the defensive side of this stat: the Free Throw Rate Allowed (FTRD), or how often a team’s opponents shoot free throws. While the medians for the first four rounds are pretty similar (34.1, 34.1, 32.3, and 35.1), the median for the underdogs that reached the Final Four was noticeably lower at 28, indicating that these teams committed fewer fouls resulting in free throws. The box plot also shows that no lower-seeded team with a Free Throw Rate Allowed over 43.0 made it past the Round of 32. Taken together, this suggests that while an extremely low FTRD is not necessary for these teams to succeed, it still helps not to give your opponent opportunities for easier shots at the free throw line.
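As a quick illustration of the ratio behind both stats, here is a sketch that expresses free throw rate as free throw attempts per 100 field goal attempts, which matches the scale of the medians quoted above. The box-score numbers are made up for the example.

```python
def free_throw_rate(fta, fga):
    """Free throw attempts per 100 field goal attempts."""
    if fga <= 0:
        raise ValueError("fga must be positive")
    return 100.0 * fta / fga

# Made-up box-score totals for illustration.
ftr = free_throw_rate(fta=20, fga=55)          # the team's own attempts
ftr_allowed = free_throw_rate(fta=15, fga=58)  # the opponent's attempts

print(round(ftr, 1), round(ftr_allowed, 1))
```

The same function covers both sides: passing the opponent’s attempts gives the rate allowed.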

This boxplot depicts the Offensive Rebound Rate (ORB), or the percentage of available offensive rebounds that a team grabs. This plot also shows an overall increase in ORB with round reached in the tournament. The medians for each round (again from the Round of 64 to the Final Four) are 30.6, 30.8, 32.5, 33.1, and 33.5. A higher offensive rebound rate extends a team’s possessions and consequently creates opportunities for second-chance points, increasing a team’s chances of success.

We can again look at the defensive side of this stat: the Offensive Rebound Rate Allowed. The medians for the first four rounds are pretty similar, at 29.1, 29.1, 28.6, and 29.3, while the median for underdogs that made the Final Four is 26.5. Because there is no steady pattern across rounds, Offensive Rebound Rate Allowed does not appear (from the data included) to be a strong indicator of how far an underdog advances. One possible explanation is that this statistic has more to do with the opponents’ offensive rebounding skills than with the underdogs’ ability to prevent offensive rebounds. It depends largely on the offensive team’s decision between going for the rebound and getting back on defense.
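The usual way to compute an offensive rebound rate divides a team’s offensive rebounds by all the rebounds available at that end of the floor. The sketch below assumes that standard definition and uses invented totals; it also shows why the "allowed" version depends on what the opponent does.

```python
def offensive_rebound_pct(off_reb, opp_def_reb):
    """Share of available offensive rebounds grabbed by the offense, as a percentage."""
    available = off_reb + opp_def_reb
    if available <= 0:
        raise ValueError("no rebounds available")
    return 100.0 * off_reb / available

# Invented totals for illustration.
orb = offensive_rebound_pct(off_reb=11, opp_def_reb=24)
# "Allowed" is the same formula from the opponent's point of view:
# the opponent's offensive rebounds against this team's defensive rebounds.
orb_allowed = offensive_rebound_pct(off_reb=8, opp_def_reb=24)

print(round(orb, 1), round(orb_allowed, 1))
```

If an opponent sends fewer players to the offensive glass, `off_reb` in the second call drops regardless of how well the defense boxes out.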

The two plots above show the Offensive and Defensive Effective Field Goal Percentages for underdogs. Offensive refers to the percentage of shots made out of shots attempted and defensive refers to the percentage of the opponents’ shots made out of shots attempted. Effective means that the statistic accounts for the difference in value between 2-point shots and 3-point shots. In both of these, there is no clear pattern between rounds, meaning that teams that progress further in the tournament do not have any clear advantage in their own shooting percentage or in their ability to hold down their opponents’ shooting percentage. This is surprising considering that the ADJOE and ADJDE box plots both had patterns and these two pairs of statistics are closely related.

One possible explanation for this is free throws. Although we examined free throw rate earlier and found no patterns, free throw shooting percentage was not included in the dataset, so it is possible that underdogs that made it further in March Madness were better able to capitalize on these opportunities and had higher free throw percentages or faced teams with lower free throw percentages. Another explanation is offensive rebounding: we showed that teams with better offensive rebounding tended to make it farther in the tournament, creating opportunities for points without adding possessions. So, teams as a whole could have similar shooting percentages, but those that had more shot opportunities were able to score more.
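To see how two teams can shoot equally well and still score differently, it helps to write out the standard effective field goal formula, (FGM + 0.5 × 3PM) / FGA. Points from field goals then equal 2 × eFG% × FGA, so extra attempts from offensive rebounds add points even when eFG% stays the same. The numbers below are invented for illustration.

```python
def efg_pct(fgm, fg3m, fga):
    """Effective field goal percentage: a made three counts as 1.5 made shots."""
    if fga <= 0:
        raise ValueError("fga must be positive")
    return 100.0 * (fgm + 0.5 * fg3m) / fga

def field_goal_points(efg, fga):
    """Points from field goals implied by an eFG% (in percent) and an attempt count."""
    return 2.0 * efg / 100.0 * fga

# Same raw field goal percentage (24 of 60), different mix of threes.
few_threes = efg_pct(fgm=24, fg3m=4, fga=60)
many_threes = efg_pct(fgm=24, fg3m=10, fga=60)

# Same eFG% (50.0), but six extra attempts, e.g. from offensive rebounds.
base_points = field_goal_points(50.0, fga=55)
extra_points = field_goal_points(50.0, fga=61)

print(round(few_threes, 1), round(many_threes, 1), base_points, extra_points)
```

Free throws are left out of both functions on purpose, which is exactly the gap between eFG% and the efficiency stats that the paragraph above points to.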

We can next look at turnover rate, or a team’s estimated number of turnovers per 100 possessions, and steal rate, or a team’s estimated number of steals per 100 opponent possessions. Turnover Rate rises slightly with Round Reached (with medians 17.8, 17.2, 17.6, 17.95, and 19.1), meaning that teams that make it further in the tournament turn the ball over at a somewhat higher rate. This is surprising because turnovers create more opportunities for opponents to score, so one would expect more successful teams to have better ball-handling skills. On the other hand, there is an expected positive relationship between Steal Rate and Round Reached, meaning that teams that make it further in the tournament record more steals per 100 opponent possessions. The overall increase in averages (18.8, 19.4, 19.5, 19.1, and 20) suggests that teams with more success in the tournament are better able to force turnovers and capitalize on easier baskets in transition.

This final box plot depicts the Adjusted Tempo for underdogs based on the round reached in the tournament. Adjusted Tempo refers to a team’s estimated number of possessions for a full 40-minute game. Although there is only a slight downward trend in the medians of each round (67.5, 66.9, 66.8, 67.45, and 65.6 from R64 to F4), we can see that the range of tempos gets smaller and smaller, especially compared to the other boxplots shown. Part of this narrowing is expected simply because fewer teams reach each later round, but it could also indicate that an Adjusted Tempo around 66 is a sweet spot: 66 possessions per 40 minutes is neither too slow nor too fast for a team, enabling it to consistently create shot opportunities without rushing plays or turning the ball over.
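Possessions are not recorded directly in a box score, so tempo is usually built on an estimate. The sketch below uses one commonly cited approximation for college basketball (field goal attempts minus offensive rebounds plus turnovers plus 0.475 times free throw attempts); the 0.475 weight and the sample totals are assumptions for illustration, and the dataset’s adjusted figure also corrects for opponents, which this does not.

```python
def estimate_possessions(fga, orb, tov, fta, ft_weight=0.475):
    """Approximate possessions from box-score totals (ft_weight is an assumed constant)."""
    return fga - orb + tov + ft_weight * fta

def tempo_per_40(possessions, minutes_played=40):
    """Scale a possession count to a 40-minute game."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return 40.0 * possessions / minutes_played

# Invented single-game totals for illustration.
poss = estimate_possessions(fga=58, orb=10, tov=12, fta=20)
regulation_tempo = tempo_per_40(poss)                  # game ended in regulation
overtime_tempo = tempo_per_40(poss, minutes_played=45)  # same totals over 45 minutes

print(round(poss, 1), round(regulation_tempo, 1), round(overtime_tempo, 1))
```

Scaling to 40 minutes keeps overtime games from looking artificially fast.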

Conclusion and Next Steps:

From the box plots created using statistics of lower-seeded March Madness teams between 2013 and 2019, we can see trends in certain statistics that suggest they carry greater importance. Underdogs with more success in the tournament had better Adjusted Offensive Efficiencies, Adjusted Defensive Efficiencies, Offensive Rebound Rates, and Steal Rates, which suggests that these are among the skills most critical to beating higher-seeded teams in March Madness. One statistic worth looking at in the future is Free Throw Percentage, to delve further into the discrepancy between ADJOE/ADJDE and Offensive/Defensive Effective Field Goal Percentage.

In the future, I would like to look at data on the direct matchups in which lower-seeded teams upset higher-seeded teams. Is there a trend or pattern on both the winning and losing side of these matchups that helps underdogs move forward in the tournament? This would help determine whether the statistics a team excels in during the game are consistent with its strengths all season. In addition, it would be interesting to look at the specific performances of top players on both teams to examine further reasons for these upsets. Cinderella stories and upsets are more common in March Madness than in nearly any other sports playoff, and every March fans across the country try to pick the correct upsets and create the first-ever perfect bracket. There is always uncertainty when it comes to college basketball, but can statistics beyond what was examined in this article be used to help predict upsets and Cinderella stories consistently?