# What Makes a Master?

### By: **Erik Chen and Brendan Zytowski**

**Introduction**

The Masters Tournament, traditionally held in the first week of April, is one of the most important events of every golf season. Its prestige comes from its long history and pedigree, as well as the signature green jacket awarded to the winner. Notably, the Masters is the only major held at the same golf course every year. That makes it especially valuable for professional players to understand which aspects of the game matter most to winning the tournament.

In general, different golf courses favor different types of players. For example, Torrey Pines’ South Course, the longest course on the PGA Tour at 7698 yards, can play easier for longer hitters. Another example is Harbor Town Golf Links, where narrow fairways lined with tall trees reward accuracy off the tee.

In this article, we use advanced golf statistics such as Strokes Gained: Putting (SG: Putting) and Strokes Gained: Tee to Green (SG: TTG) to analyze which stats are the strongest indicators of success in the tournament. Strokes Gained is a relatively recent group of metrics that explains which areas of a golfer’s game contribute the most to their score. For example, suppose a player on a par 4 drives into the fairway, hits a poor approach shot to just off the green, and then holes out for birdie. The player gained a stroke against par, but each shot contributed differently to that score. The drive and the chip contributed positively, while the approach shot contributed negatively. By convention, the player's Strokes Gained: Off the Tee (SG: OTT) and Strokes Gained: Around the Green (SG: ARG) were positive for the hole, while their Strokes Gained: Approach (SG: APR) was negative. Tracking these metrics over 72 holes per tournament, for nearly 40 tournaments a season, gives a strong indication of which players excel in which aspects of the game.
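To make the par-4 example concrete, here is a minimal sketch of the per-shot calculation, assuming the usual definition of strokes gained for a shot: expected strokes to hole out before the shot, minus expected strokes after it, minus the one stroke taken. The expected-stroke values in `EXPECTED` are made up for illustration and are not real ShotLink baselines.

```python
# Hypothetical expected strokes to hole out from each position.
# These numbers are illustrative assumptions, not real baseline data.
EXPECTED = {
    "tee": 4.0,             # par-4 tee shot
    "fairway": 2.9,         # after a good drive
    "just_off_green": 2.5,  # after a poor approach
    "holed": 0.0,           # ball is in the hole
}


def strokes_gained(start, end):
    """Expected strokes before the shot, minus expected strokes after, minus 1."""
    return EXPECTED[start] - EXPECTED[end] - 1.0


shots = [
    ("SG: OTT", "tee", "fairway"),
    ("SG: APR", "fairway", "just_off_green"),
    ("SG: ARG", "just_off_green", "holed"),
]

results = {label: strokes_gained(start, end) for label, start, end in shots}
total = sum(results.values())

for label, value in results.items():
    print(f"{label}: {value:+.2f}")
print(f"Total strokes gained on the hole: {total:+.2f}")
```

With these assumed baselines, the drive and the hole-out come out positive and the approach comes out negative, and the three values sum to the one stroke gained on the hole.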

It is important to note that ShotLink, the system that records advanced metrics for the PGA Tour, is not present at the Masters, so our analysis focuses on each player's performance leading up to the Masters in each season.

**Strokes Gained: Tee to Green**

One important metric in golf is Strokes Gained: Tee to Green (SG: TTG), which measures a player’s performance off the tee, approaching the green, and around the green relative to the field. SG: TTG is effectively an overall view of how a player compares to every other tour player in every category except putting. Dominant players commonly have a high SG: TTG because they tend to drive the ball farther and more accurately, land approach shots closer to the hole, and scramble to save par more often than players with a lower SG: TTG.

This boxplot shows the last 12 seasons’ rankings in SG: TTG coming into the Masters. Each year’s Masters winner is marked in green. What stands out is how excellent these champions are in SG: TTG: every one of them except Charl Schwartzel was *at least* one standard deviation above the mean coming into the tournament. Furthermore, five of the champions were more than two standard deviations above the mean. Bubba Watson led the PGA Tour in SG: TTG heading into his first victory, while Jordan Spieth and Sergio Garcia each ranked fourth on Tour heading into theirs. On the golf course, these players are the best at hitting greens in regulation (GIR), and they consistently hit the ball closest to the hole on both approach shots and recovery shots.
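The "standard deviations above the mean" comparison is a z-score against the field. A minimal sketch follows, assuming the comparison uses the sample standard deviation of the field's season-to-date SG: TTG. The `field_sg_ttg` values and the `winner_sg_ttg` figure are hypothetical and are not taken from the data behind the boxplot.

```python
from statistics import mean, stdev


def z_score(value, field):
    """Number of sample standard deviations that `value` sits above the field mean."""
    return (value - mean(field)) / stdev(field)


# Hypothetical season-to-date SG: TTG values for a small field (illustrative only).
field_sg_ttg = [-1.2, -0.8, -0.5, -0.3, 0.0, 0.1, 0.4, 0.6, 0.9, 1.8]
winner_sg_ttg = 1.8  # hypothetical eventual winner, also included in the field

z = z_score(winner_sg_ttg, field_sg_ttg)
print(f"Winner z-score: {z:.2f}")
print("At least one SD above the mean:", z >= 1.0)
print("More than two SDs above the mean:", z > 2.0)
```

Whether to use the sample or population standard deviation is a judgment call. With a full-season field of well over a hundred players, the two give nearly the same answer.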

Augusta National Golf Club, where the Masters is held every year, is recognized as one of the toughest courses in the United States because of the dense pine woods surrounding its severely sloped fairways and its small but fast greens. As a result, Augusta favors players who can find the fairway regularly, because the greens are very difficult to approach, especially when scrambling from the woods or from bunkers. Masters champions often have a strong recovery game, meaning they excel at hitting great shots in difficult situations to give themselves a chance to save par or even make birdie. These players tend to have a high SG: TTG because they rarely drop strokes and are able to save strokes in situations where their peers would falter. For example, Bubba Watson hit arguably the greatest shot in Masters history in a sudden-death playoff against Louis Oosthuizen (“Oos-tay-zen”) on the 10th hole, hooking the ball an incredible forty yards to within fifteen feet of the hole.

Another example is Tiger Woods’ famed chip-in on hole 16 in 2005, which gave him a two-stroke lead over Chris DiMarco en route to his **fourth** Green Jacket. Notice how Tiger aims toward the far edge of the green and plays his ball *into* the slope, allowing the ball to make a full turn and feed directly into the hole. These are the types of shots that Augusta encourages and that Masters champions have to pull off, or at least come close to pulling off. Generally, Masters champions need to be among the best in the world, if not the best, in the SG: TTG category.

**Strokes Gained: Putting**

For most amateur and professional golfers, putting is very different from ball striking because the two require different skills and mindsets. In ball striking, power is an important attribute because it leaves players with shorter shots. A longer shot is usually harder than a shorter one because the greater distance demands more accuracy. On some holes, specifically par 5s, a player with more power has an immense advantage because they can reach the green in 2 strokes, which sets up easy birdies.

However, once a player is putting, power is no longer necessary and is sometimes even detrimental. Instead, players rely on finesse and touch, as well as skill at reading the greens. Putting and ball striking are therefore quite dissimilar, and being a good putter correlates only weakly with being a good ball striker.

Strokes Gained: Putting is a statistic that compares the actual number of putts a player takes with the expected number of putts needed. The expected number reflects how many putts a scratch player would need to hole out from the same position. The statistic is useful for judging how good a putter a given player is.

Many people assume that putting is the most important part of the game. However, the statistics from the Masters Tournament indicate that this is not the case. Across the last 12 Masters, there is little indication that being an excellent putter is essential for winning the tournament. Of the 12 winners, 5 had negative SG: Putting, which means they were actually losing strokes on the greens relative to the rest of the field. In addition, only 3 winners had SG: Putting numbers at least 1 standard deviation above the mean. All of this suggests that putting might not be as important as we think. However, we are only looking at the performances of previous winners, which is not necessarily the best way to understand the attributes the Masters requires. For example, the 2nd-place through 10th-place finishers could all be above-average putters, but they are not included in this analysis.

From what we have seen so far, it seems that players should place more emphasis on SG: Tee to Green than on SG: Putting, since most winners have SG: TTG stats above the competition while their SG: Putting stats are much more mixed. However, SG: TTG is the combination of 3 different SG statistics (Off the Tee, Approach, and Around the Green), so it encompasses far more of the game than SG: Putting does. The finding that SG: TTG correlates more strongly with success should therefore not be too surprising.

To test this idea more formally, we use a one-sample t-test for the mean on each year’s Masters winner. After performing 12 t-tests on SG: Putting, we conclude that 8 of the 12 winners have a result that is statistically significantly greater than 0. For our other statistic, SG: Tee to Green, all 12 of the 12 winners have a result that is statistically significantly greater than 0, which supports our idea that SG: TTG is a better predictor of success at the Masters.
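As a rough sketch of one such test, the snippet below computes a one-sample t statistic against a null mean of 0 and compares it with a one-sided critical value. It assumes the sample is a single winner's per-event SG: TTG values leading up to the Masters, which the article does not spell out. The `winner_sg_ttg` numbers are hypothetical, and the critical value is hard-coded for this sample size (7 degrees of freedom, one-sided, 5% level).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0=0.0):
    """t statistic for H0: population mean == mu0, using the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


# Hypothetical per-event SG: TTG values for one winner (illustrative only).
winner_sg_ttg = [1.2, 0.8, 2.1, -0.3, 1.5, 0.9, 1.7, 0.6]

# One-sided critical value of Student's t at alpha = 0.05 with n - 1 = 7 degrees of freedom.
T_CRIT_DF7_ALPHA05 = 1.895

t_stat = one_sample_t(winner_sg_ttg)
significant = t_stat > T_CRIT_DF7_ALPHA05

print(f"t statistic: {t_stat:.2f}")
print("Mean significantly greater than 0 at the 5% level:", significant)
```

A statistics library would report an exact p-value instead of relying on a tabulated critical value. The hard-coded threshold here only keeps the sketch dependency-free.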

**Predictive Power**

We have seen that, among previous Masters winners, a high SG: TTG is a strong indicator of success, whereas a high SG: Putting is a weaker one. To test the importance of each category, we fit a logistic regression model that classifies players as either top-10 finishers at the Masters or outside the top 10.

We find that, taken together, these statistics cannot be the only metrics used to predict a player’s success at the Masters: their p-values are all greater than 0.05, and the model has a pseudo R-squared of -3.242. A pseudo R-squared this negative means that the ratio of the log-likelihood of our full model to the log-likelihood of the model under the null hypothesis is very large. That would be a good sign if both log-likelihood values were positive, but since both are negative, it indicates a very poor fit. In context, our model is so poor that a horizontal line would follow the trend of the data better than our model does. To visualize this, we can construct a confusion matrix, which displays the number of true negatives, false positives, false negatives, and true positives in a 2-by-2 grid, and we can plot a receiver operating characteristic (ROC) curve.
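The relationship between the log-likelihood ratio and the pseudo R-squared can be sketched as follows, assuming the McFadden definition (one minus the ratio of the full model's log-likelihood to the null model's), which matches the description above. The two log-likelihood values are hypothetical, chosen only so that the result reproduces the reported -3.242. The article does not give the actual values.

```python
def mcfadden_pseudo_r2(ll_full, ll_null):
    """McFadden's pseudo R-squared: 1 - (log-likelihood of full model / log-likelihood of null model)."""
    return 1.0 - ll_full / ll_null


# Hypothetical log-likelihoods (illustrative only). Both are negative, as is
# typical for a classifier, and the full model's value is further from zero,
# meaning it fits worse than the null model.
ll_null = -100.0
ll_full = -424.2

ratio = ll_full / ll_null
r2 = mcfadden_pseudo_r2(ll_full, ll_null)

print(f"Log-likelihood ratio (full / null): {ratio:.3f}")
print(f"Pseudo R-squared: {r2:.3f}")
```

Under this definition, any ratio above 1 produces a negative pseudo R-squared, meaning the fitted model assigns the data a lower likelihood than the intercept-only model does.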

Our confusion matrix tells us that, out of 871 observations in our test data, our model produced 834 true negatives, 2 false positives, 35 false negatives, and 0 true positives. Coupled with our ROC curve, this shows that the model's high accuracy rate (roughly 96% according to our model’s score) is misleading: the model labels nearly every entry as a negative and fails to recognize any player who went on to finish in the top ten at the Masters. This suggests that class imbalance is the problem. So few players finish in the top ten each year relative to the field that the data contain very few positives, which makes it difficult for the model to learn to identify them.
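The counts above are enough to check the accuracy figure and to see why it is misleading. The short sketch below derives accuracy, precision, and recall from the four reported counts and compares them with a trivial baseline that predicts "outside the top 10" for every player. The baseline comparison is our own illustration and is not part of the original model output.

```python
# Counts reported in the confusion matrix above.
tn, fp, fn, tp = 834, 2, 35, 0

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Guard against division by zero when a class is never predicted or never present.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

# Illustrative baseline: always predict "outside the top 10".
# It is correct for every actual negative, i.e. tn + fp observations.
baseline_accuracy = (tn + fp) / total

print(f"Observations: {total}")
print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"Always-negative baseline accuracy: {baseline_accuracy:.3f}")
```

The recall of zero is the number that matters here: the model never identifies an actual top-ten finisher, and a classifier that always predicts the negative class would score slightly higher on accuracy.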

**Conclusion**

