By: Joshua Sujo
The Masters, the PGA Championship, the US Open, and The Open. These are the four most important events in the golfing year. Winning one of these tournaments cements a golfer’s place in history; without one, even a successful career has a gaping hole. What does it take to win one of these tournaments? In this article, we will use results from past majors to find the skills needed to win each of the four. Using those results, we will predict the 2024 major champions!
Golf’s Four Major Championships
Golf’s four majors are held on different courses in different locations. In addition, each major is run by its own organization, which sets up its course in its own way. The type of course and the course setup play a big role in deciding the eventual winner.
Masters
The Masters is run by Augusta National Golf Club and is the only major held on the same course every year. The course is known for its lightning-fast, steeply sloped greens, lush green fairways, and the azaleas and dogwood trees surrounding the holes, while the tournament is famous for its green jacket and its prestigious history of past champions. Since it is held on the same course every year, the players know what to expect, and experienced players tend to have a slight advantage.
PGA Championship
The PGA Championship is run by the Professional Golfers’ Association of America (PGA of America). The PGA of America chooses the course from a small group of well-regarded courses in the US, including Valhalla and Oak Hill. The PGA Championship is known for having the strongest field of the year, bringing together the best players from around the world.
US Open
The US Open is run by the United States Golf Association (USGA) and is known for being the hardest test in golf. The course is also chosen from a small group of well-regarded courses in the US, including Pebble Beach and Pinehurst. The USGA is infamous for presenting extremely long golf courses with narrow fairways, thick rough, and hard, fast greens.
The Open Championship
The Open Championship (also called The Open or the British Open) is the oldest and arguably the most prestigious golf tournament in the world. It is run by the R&A, the organization that governs the Rules of Golf together with the USGA. It was established in 1860 and is played on a rotation of courses in Scotland, England, and Northern Ireland. The tournament is known for its historic courses, thick fescue, and extreme weather conditions.
Because the majors differ in style, certain players perform much better in some majors than in others. For example, Nick Faldo, a great English player of the 1980s and 1990s, won 3 Masters and 3 Opens but never won the other two majors. Brooks Koepka, one of the best current players, has won 3 PGA Championships and 2 US Opens but has had less success in the other two majors. In this article, we will look at data from the past 10 years to determine the type of player each major suits best.
Strokes Gained System
Each golf hole is made up of a combination of longer and shorter shots that add up to the score for the hole. However, the same score can be achieved in many different ways. Because of this, golf statisticians created the Strokes Gained (SG) system to quantify the value of each shot. Please see our previous article, written by Max, for an in-depth description of the Strokes Gained system. In summary, the Strokes Gained system lets us rate how good a golfer is in each part of the game (e.g., Off the Tee, Approach, Around the Green, Putting, and Total).
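To make the idea concrete, here is a minimal Python sketch of the standard per-shot formula: a shot’s strokes gained equals the expected strokes to hole out before the shot, minus the expected strokes to hole out after it, minus one for the shot itself. The baseline values below are illustrative assumptions, not the official PGA Tour baseline, and the names `BASELINE`, `expected_strokes`, and `strokes_gained` are hypothetical.

```python
# Illustrative baseline: expected strokes to hole out from each situation.
# These numbers are assumptions for the example, not official PGA Tour values.
BASELINE = {
    ("tee", 400): 3.99,      # 400 yards out, on the tee
    ("fairway", 150): 2.98,  # 150 yards out, in the fairway
    ("green", 20): 1.87,     # 20 feet, on the green
    ("green", 3): 1.04,      # 3 feet, on the green
}
HOLED = None  # sentinel: ball is in the hole, 0 strokes remaining


def expected_strokes(lie):
    """Expected strokes to hole out from `lie` (0 once the ball is holed)."""
    return 0.0 if lie is HOLED else BASELINE[lie]


def strokes_gained(start, end):
    """SG for one shot = E[start] - E[end] - 1 (the stroke just played)."""
    return expected_strokes(start) - expected_strokes(end) - 1


# A par 4 played in 4 strokes: drive, approach, and two putts.
shots = [
    (("tee", 400), ("fairway", 150)),
    (("fairway", 150), ("green", 20)),
    (("green", 20), ("green", 3)),
    (("green", 3), HOLED),
]
per_shot = [strokes_gained(start, end) for start, end in shots]
for (start, _), sg in zip(shots, per_shot):
    print(f"{start}: {sg:+.2f}")
print(f"hole total: {sum(per_shot):+.2f}")
```

Because the per-shot values telescope, the hole total always equals the starting baseline minus the strokes actually taken. That is why SG can be split into categories (Off the Tee, Approach, and so on) and still add up to SG: Total.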
Data Collection
Strokes Gained data for all PGA Tour players from the past 20 years was collected from pgatour.com/stats. This includes all of the statistical categories, such as putting and driving. Unfortunately, it does not include players who played on other tours, such as the LIV Tour, the European Tour, or the DP World Tour.
Major results from 2014-2023 were also collected from pgatour.com. Each player’s SG value in a major round was computed by subtracting the player’s score from the field’s average score for that round, so a positive value means the player beat the field. This puts courses of varying difficulty on the same scale.
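The round-level calculation can be sketched in plain Python as follows. This assumes the input is a list of (player, round, strokes) rows; the data and the function names `round_strokes_gained` and `average_major_sg` are hypothetical. In this sketch the field average is taken over everyone who played that round, so players who miss the cut simply drop out of rounds 3 and 4.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical round scores at one major: (player, round, strokes).
scores = [
    ("Player A", 1, 68), ("Player B", 1, 72), ("Player C", 1, 76),
    ("Player A", 2, 70), ("Player B", 2, 69), ("Player C", 2, 71),
]


def round_strokes_gained(rows):
    """SG per (player, round) = field average for that round - player's score.
    Positive values mean the player beat the field that day."""
    by_round = defaultdict(list)
    for _, rnd, strokes in rows:
        by_round[rnd].append(strokes)
    field_avg = {rnd: mean(vals) for rnd, vals in by_round.items()}
    return {(player, rnd): field_avg[rnd] - strokes for player, rnd, strokes in rows}


def average_major_sg(round_sg):
    """Average each player's SG over the rounds they actually played."""
    per_player = defaultdict(list)
    for (player, _), sg in round_sg.items():
        per_player[player].append(sg)
    return {player: mean(vals) for player, vals in per_player.items()}


sg = round_strokes_gained(scores)
print(average_major_sg(sg))
```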
The SG and Major datasets were merged to create a single dataset for each major that we could analyze.
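The article does not say which keys were used for the merge. A natural choice, assumed here, is to join each major’s results to the same player’s PGA Tour season stats for that year. The inner join below uses hypothetical players and numbers, and the helper `merge_inner` is illustrative only.

```python
# Hypothetical PGA Tour season averages, keyed by (player, season).
season_sg = {
    ("Player A", 2023): {"OTT": 0.8, "APP": 1.1, "ARG": 0.2, "PUTT": 0.3},
    ("Player B", 2023): {"OTT": -0.1, "APP": 0.4, "ARG": 0.5, "PUTT": 0.6},
    ("Player D", 2023): {"OTT": 0.3, "APP": 0.2, "ARG": 0.1, "PUTT": 0.0},
}
# Hypothetical average SG per round at one major, keyed by (player, year).
major_sg = {
    ("Player A", 2023): 1.9,
    ("Player B", 2023): -0.4,
    ("Player E", 2023): 0.7,  # e.g. a player with no PGA Tour season stats
}


def merge_inner(stats, results):
    """Inner join on (player, year): keep only keys present in both datasets."""
    merged = []
    for player, year in sorted(stats.keys() & results.keys()):
        key = (player, year)
        merged.append({"player": player, "year": year, **stats[key],
                       "major_sg": results[key]})
    return merged


rows = merge_inner(season_sg, major_sg)
for row in rows:
    print(row)
```

Player D (no major result) and Player E (no season stats) both drop out of the merged rows, which is the same missing-data issue that affects LIV Tour players.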
What does the data say?
Correlations
Using the data, we want to find which SG category is most important for performing well at each major. Let’s look at the US Open as an example.
To measure how important driving is for performing well at the US Open, we can compute the correlation between players’ AVG SG:OTT (average Strokes Gained: Off the Tee over the entire season) and their results at the US Open. A higher correlation means that being better Off the Tee tends to translate into better US Open results, and a lower correlation means it matters less.
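A minimal sketch of that calculation, using the standard Pearson correlation coefficient. The numbers are made up for illustration, and treating the US Open “result” as average strokes gained per round (rather than finishing position) is an assumption; with finishing position, where lower is better, a useful skill would show up as a negative correlation.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical: each player's season-average SG: Off the Tee, and the same
# players' average strokes gained per round at the US Open.
sg_ott = [0.9, 0.5, 0.2, -0.1, -0.4]
us_open_sg = [1.2, -0.3, 0.8, 0.1, -0.9]
print(round(pearson(sg_ott, us_open_sg), 3))
```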
Since so many factors affect golf performance each week, predicting golf victories is very difficult. For instance, no player has won 3 golf majors in a single year in the past 20 years. In contrast, over the same period a tennis player has won 3 of the 4 tennis majors in a year 8 times. This variability tends to produce very weak correlations.
Since it is difficult to say on its own whether a correlation coefficient such as 0.298 is strong or weak, we instead compare the correlations across the different majors to draw conclusions.
This graph shows the correlation coefficients between Strokes Gained (Total, Off the Tee, Approach, Around the Green, Putting) and each major’s results.
In the first group (SG: Total), the strongest correlation is with the US Open. This means that, over the past 10 years, the US Open has tended to bring the best players to the top of the leaderboard.
In the other categories, Off the Tee matters most for the PGA Championship and US Open, Approach matters most for the Masters, Around the Green matters about equally in all four majors, and Putting matters most for the US Open.
Predicting 2024
Using this data from the past 10 years, we can now make predictions for the 2024 majors. A linear regression model was used with Strokes Gained (Off the Tee, Approach, Around the Green, and Putting) as the input variables and Major Strokes Gained as the output variable. This creates a model with 4 numerical input variables and 1 numerical output variable. The model was trained on the results from 2014-2023. Here are the model coefficients for each major:
These coefficients tell a similar story to the correlation coefficients, but they are adjusted for the magnitude of each variable.
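The article does not state which library was used to fit the regression. As a dependency-free sketch, ordinary least squares can be solved through the normal equations. The function name `fit_ols` is hypothetical, and the synthetic data is generated from known coefficients so the fit can be checked; none of it is real tour data.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y,
    where A is X with a leading column of 1s for the intercept.
    Returns [intercept, coef_1, ..., coef_k]."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear inputs or too few rows")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


# Synthetic training data: (SG:OTT, SG:APP, SG:ARG, SG:PUTT) per player-season,
# with the target built from known coefficients so the fit can be checked.
X = [
    [0.9, 0.5, 0.1, 0.2],
    [0.2, 1.1, -0.3, 0.4],
    [-0.4, 0.3, 0.6, -0.2],
    [0.5, -0.2, 0.2, 0.7],
    [0.1, 0.8, 0.4, -0.5],
    [-0.3, -0.6, -0.1, 0.3],
    [0.7, 0.0, -0.5, 0.1],
]
true_coefs = [0.1, 0.5, 0.8, 0.3, 0.4]  # intercept, OTT, APP, ARG, PUTT
y = [true_coefs[0] + sum(c * x for c, x in zip(true_coefs[1:], row)) for row in X]

coefs = fit_ols(X, y)
print([round(c, 3) for c in coefs])
```

In practice, a library routine such as `numpy.linalg.lstsq` is numerically more robust than forming the normal equations directly.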
Now that we have the model, we can apply it to SG data from the 2024 season to make this year’s major predictions.
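Applying a fitted linear model is a weighted sum: the intercept plus each season SG average times its coefficient, after which players are sorted into power rankings. The coefficients, player stats, and the helpers `predict` and `power_rankings` below are hypothetical placeholders, not the fitted values from the article’s tables.

```python
# Hypothetical fitted model for one major: [intercept, OTT, APP, ARG, PUTT].
coefs = [0.05, 0.6, 0.7, 0.3, 0.4]

# Hypothetical 2024 season-average SG per player: (OTT, APP, ARG, PUTT).
season_2024 = {
    "Player A": (1.0, 1.2, 0.3, 0.4),
    "Player B": (0.2, 0.6, 0.5, 0.9),
    "Player C": (0.8, 0.1, -0.2, 0.0),
}


def predict(coefs, features):
    """Predicted average strokes gained per round at the major."""
    intercept, *weights = coefs
    return intercept + sum(w * x for w, x in zip(weights, features))


def power_rankings(coefs, stats):
    """Sort players from highest to lowest predicted SG per round."""
    preds = {player: predict(coefs, f) for player, f in stats.items()}
    return sorted(preds.items(), key=lambda item: item[1], reverse=True)


for rank, (player, pred) in enumerate(power_rankings(coefs, season_2024), start=1):
    print(f"{rank:>2}. {player:<10} {pred:+.2f}")
```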
Masters Results
These are the Masters “power rankings” for 2024. The prediction variable is each player’s expected average strokes gained per round at the Masters, based on their statistics throughout the season. A higher value indicates a better predicted chance of success.
Fortunately, since the 2024 Masters has already been played, each player’s actual result is also shown in the POS column. The model placed three of the top finishers, Scheffler, Åberg, and Morikawa, within its top 7.
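One simple way to score predictions like these is to count how many actual top finishers land inside the model’s top K. The helper `hits_in_top` and the player labels below are hypothetical. When several players tie for a position, their order in the leaderboard list is arbitrary, which can change the count.

```python
def hits_in_top(predicted_order, actual_order, top_actual, top_predicted):
    """Return players who finished in the actual top `top_actual` and were
    also ranked inside the model's top `top_predicted`."""
    model_top = set(predicted_order[:top_predicted])
    return [player for player in actual_order[:top_actual] if player in model_top]


# Hypothetical orderings: model power rankings vs. the final leaderboard.
model_rank = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
leaderboard = ["P1", "P9", "P4", "P10", "P3"]
hits = hits_in_top(model_rank, leaderboard, top_actual=3, top_predicted=7)
print(f"{len(hits)} of the top 3 finishers were in the model's top 7: {hits}")
```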
PGA Championship Results
These are the 2024 PGA Championship predictions. Scottie Scheffler is having one of the greatest seasons in PGA Tour history and leads a multitude of statistical categories, so expect him to rank number 1 in every model’s predictions.
The winner, Xander Schauffele, is ranked second by the model, and 6 of the top-12 finishers also appear in the model’s top 10.
US Open Results
These are the top 15 players in the 2024 US Open predictions. The US Open was played while this article was being written.
The model coefficients for the US Open (see above) put a heavy emphasis on Driving (Off the Tee). It is likely no coincidence that the top two finishers this year were Bryson DeChambeau and Rory McIlroy, two of the greatest drivers in the history of the game.
Open Championship Results
These are my 2024 Open Championship Power Rankings. Don’t be surprised if Scheffler, Schauffele, Matsuyama, McIlroy, or Morikawa lift the Claret Jug in July!
Overall Results
While all four models generally output the best overall players as the strongest candidates to win, each model has differences that favor certain strengths. Let’s compare the model results across the different majors to highlight these differences.
A few things to notice:
For the top golfers (Scheffler, Schauffele, McIlroy), the model outputs higher predictions for the PGA Championship and US Open than for the other two majors. This suggests that players who have been strongest throughout the season are more likely to perform well in these two majors.
The Open Championship favors Strokes Gained: Around the Green (ATG) much more than the other majors. Hideki Matsuyama and Webb Simpson, the top two ATG players this year, are projected to perform much better in the Open. In contrast, Viktor Hovland, one of the worst ATG players, is projected to perform much worse in the Open.
The US Open favors Strokes Gained: Putting much more than the other majors. Denny McCarthy, one of the best putters this year, is predicted to perform better in the US Open than the other majors. On the other hand, Justin Thomas, one of the worst putters on tour, is projected to do much worse in the US Open.
Overall, while the top players are projected to perform the best in all the majors, each major favors certain strengths that can benefit some players over others.
Limitations and Future Steps
One of the major limitations of this model is the missing data for current players on the LIV Tour and DP World Tour and former players on the European Tour. Since players on different tours play different events, their SG values cannot be compared directly. In addition, the 2024 predictions use only 2024 PGA Tour data. As a result, this year’s US Open champion, Bryson DeChambeau, was not in the model’s predictions because he plays on the LIV Tour.
In the future, we could also collect SG statistics from the LIV Tour and DP World Tour and feed them into the model. The result would be separate LIV Tour and DP World Tour power rankings.
Another limitation is the varying strength of field across PGA Tour events. Since SG is measured against the average score at each event, a player’s SG value depends heavily on the strength of the other players in the tournament. For example, a mediocre player who mostly plays tournaments with weaker fields may have higher SG values than a similar player who plays tournaments with stronger fields.
To address this issue, the OWGR (Official World Golf Ranking) can be used to determine the strength of the field, which can be incorporated into the SG model.
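The article leaves the details of a field-strength adjustment open. One simple possibility, sketched here purely as an assumption, is to express each event’s field strength in strokes per round relative to a reference field and add it to the raw SG. The strength values and the helpers `adjusted_sg` and `season_averages` are invented for illustration, and how to derive the values from OWGR is not shown.

```python
from statistics import mean

# Hypothetical field strength in strokes per round relative to a reference
# field (positive = stronger field). Deriving these from OWGR is not shown.
FIELD_STRENGTH = {"Weak Event": -0.5, "Strong Event": 0.8}


def adjusted_sg(raw_sg, event):
    """Put a raw SG value (measured against that event's field) on a common
    scale by adding the field's strength."""
    return raw_sg + FIELD_STRENGTH[event]


def season_averages(rounds):
    """Return (raw, adjusted) season-average SG from (event, raw_sg) pairs."""
    raw = mean(sg for _, sg in rounds)
    adj = mean(adjusted_sg(sg, event) for event, sg in rounds)
    return raw, adj


# Player P's raw average looks better only because P faced weaker fields.
players = {
    "Player P": [("Weak Event", 0.9), ("Weak Event", 0.7)],
    "Player Q": [("Strong Event", 0.3), ("Strong Event", 0.5)],
}
for name, rounds in players.items():
    raw, adj = season_averages(rounds)
    print(f"{name}: raw {raw:+.2f}, adjusted {adj:+.2f}")
```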
One way to improve this model is to include a recency bias for each player’s Strokes Gained. Golf is largely about current form, and players who are hot can go on a streak of high finishes. By weighting recent events more heavily, the model could better predict who will perform well in upcoming tournaments.
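One common way to add a recency bias is an exponentially weighted average, where each older event counts a fixed fraction less than the one after it. In this sketch the function name `recency_weighted_sg` is hypothetical, and the default decay factor of 0.9 is an arbitrary choice that would need tuning.

```python
def recency_weighted_sg(sg_by_event, decay=0.9):
    """Weighted average of SG values ordered oldest -> newest.
    The most recent event gets weight 1, the one before it `decay`,
    then decay**2, and so on. `decay` is a hypothetical tuning parameter."""
    if not sg_by_event:
        raise ValueError("need at least one event")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    n = len(sg_by_event)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * sg for w, sg in zip(weights, sg_by_event)) / sum(weights)


# Same plain average (+0.5) per event, opposite trends.
heating_up = [-0.5, 0.0, 0.5, 1.0, 1.5]
cooling_off = [1.5, 1.0, 0.5, 0.0, -0.5]
print(recency_weighted_sg(heating_up), recency_weighted_sg(cooling_off))
```

With `decay=1.0` every event counts equally and the function reduces to a plain, unweighted average.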
Conclusion
Strokes Gained data and major results reveal different correlations between certain SG categories and certain majors. For example, players who are strong Off the Tee tend to perform better at the US Open and PGA Championship, which fits the narrow fairways and thick rough these tournaments are known for. Using a simple machine learning model (linear regression), past data and correlations can be used to predict which players will perform better in upcoming majors.