Bruin Sports Analytics

# Player Efficiency Rating in the AUDL: Developing an Impact Metric for Ultimate Frisbee

### By: Ian Geertsen

One of the most interesting and challenging aspects of sports analytics is developing single-number metrics, or metrics that attempt to look at a player holistically by boiling their value down to just one number. One such example is John Hollinger's __Player Efficiency Rating__, or PER, a per-minute rating system that uses box score statistics to analyze and compare NBA players. As you can see __here__, the calculations this system makes are far from simple, although it is actually one of the least complex impact metrics of its kind; other advanced impact metrics like __LEBRON__ or __RAPTOR__ (funny names, I know) take into account things like player tracking, play-by-play data, on-off data, and role adjustments, and even make adjustments for luck. I personally find these and similar plus-minus basketball metrics fascinating, and they inspired me to attempt what I do not believe anyone has yet done: to develop a similar, yet comparatively rudimentary, impact metric for ultimate frisbee.

This project looks at data from the 2019 AUDL season. I would have loved to do a similar analysis using data from the club circuit, but unfortunately the necessary statistics simply aren't compiled. First, I will walk you through how I came up with the equation for my version of PER and why I included each piece of the system; then we will look at how the metric views the impact and performance of AUDL players. The end result should give us a per-point, single-value metric which quantifies overall performance and, ideally, provides accurate ratings.

**The Equation**

### Box Component

First, we will calculate unadjusted PER, or uPER: uPER will be calculated based on aggregate data from the entirety of the season, and will later be adjusted to a per-point value and normalized. My equation, like many single-number impact metrics, relies heavily on box components; box statistics are simply data that can be easily tallied and presented in a box score, so for the NBA think points, assists, rebounds, steals, blocks, etc. For the purposes of this project, though, the ultimate box stats I incorporated are goals, assists, blocks, throwaways, drops, stalls, and callahans (catching the disc in the opponent's end zone while playing defense). Not all of these actions carry equal weight, however.

Ultimate frisbee is unique as a team sport in that at least two players must work together to score (with the exception of the callahan, but let's be reasonable here). While one player might contribute more to scoring a goal than another, at the end of the day, to score a goal on offense someone needs to throw you the disc. Goals and assists also occur at a much higher frequency than the rest of the available box statistics (the average AUDL player records about twice as many goals as blocks, for instance), and anyone who has played the game knows that simply scoring a goal or throwing an assist does not automatically make you the most impactful player on the field for that possession. So how should we value the act of scoring a goal or throwing an assist? Let's compare this to an action such as getting a block; while the actions and positions of your teammates can certainly help you get a D, only one player will block the disc. This also brings up the more philosophical question of which is more important, a score or a turnover? You might be tempted to say a goal, but I would actually beg to differ. When ultimate is played at a high level, such as in the AUDL, the team with the disc is generally expected to score; this is why the aggregate plus-minus for all players on offensive lines in 2019 was +17,239.

Let's imagine that player A makes a great deep cut and player B throws a huck for the goal. If we replace player A with a less-talented player X, let's say that he never gets open on the cut. Presuming that player B holsters (doesn't throw the disc), the team on offense still keeps possession, and is still statistically likely to score eventually during the point. Now let's say the same team is on defense, and player A makes a great layout D. If we replace player A with player X, who does not get the block, we go from gaining possession of the disc (and being favored to score) to remaining without possession (and being favored to get scored on). Obviously it isn't always so black and white, but because of everything mentioned above, I would assert that blocks, drops, and throwaways are on average more impactful actions than goals and assists. Taking all of that into account, here is what our equation looks like so far:

*Unadjusted PER box component = (0.5)goals + (0.5)assists - (0.75)drops - (0.75)throwaways + (0.75)blocks - (0.75)stalls + (1.0)Callahans*

We aren't done here yet, though. While the box component of the metric provides a large source of input, I didn't want the box statistics to overwhelm the other aspects of the metric, so to lessen the impact of the box component I raised each of these statistics to the three-fourths power, as you can see here:

*Unadjusted PER box component = (0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75*
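To make the weighting concrete, here is a minimal Python sketch of the box component; the function and argument names are my own illustration, not official AUDL field names:

```python
def box_component(goals, assists, drops, throwaways, blocks, stalls, callahans):
    """Box component of unadjusted PER: weighted season totals, each
    damped to the 3/4 power to soften the influence of raw volume."""
    def damp(n):
        return n ** 0.75  # counts are nonnegative, so this is well defined
    return (0.5 * damp(goals) + 0.5 * damp(assists)
            - 0.75 * damp(drops) - 0.75 * damp(throwaways)
            + 0.75 * damp(blocks) - 0.75 * damp(stalls)
            + 1.0 * damp(callahans))
```

Note that the damping is applied to each raw count before the weight, so a player with 16 goals contributes 0.5 * 16^0.75 = 4 points, not 0.5 * 16 = 8.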

### Passing Component

All of the statistics that I analyzed above either result in a turnover or a score; while these results clearly have a large impact on the game, it is also important that we look at what happens in between and leading up to these actions, namely the accuracy and volume with which players move the disc around the field. It takes precise throws, sharp cuts, and quick decision making to get the disc down the field, yet if a player doesn't throw an assist or catch a goal, he could throw and catch the disc 100 times without it being recorded by traditional box statistics. To place value on the key actions necessary to advance the disc, we will look at the number of completions, completion percentage, number of catches, and catch percentage.

There are two main things that make this exercise difficult: first, determining the relative importance of passing volume and passing quality, and second, balancing a player's completion percentage (expressed as a decimal) with the high number of throws and catches made throughout the season (reaching well over 500 in a few cases). It is worth noting that, at this point in the analysis, we are dealing with aggregate statistics over the course of the entire season; the metric will be adjusted to a per-point value later in the progression. To make sense of the problems brought up above, let's look at a specific player. In the 2019 AUDL season, Pawel Janas of the Chicago Wildfire threw more than any other player, throwing the disc 752 times. He, like many other handlers, threw the disc at a much higher clip than the rest of his teammates: Tommy Gallagher was second on the team with 463 throws, and nobody else managed to break 260. This clearly demonstrates the problem that makes generating a single-number metric for ultimate so difficult: weighing the difference in output between different roles. When it comes to things like scoring, assisting, and volume passing, offensive-line players have a distinct advantage, while d-line players have an innate advantage when it comes to blocks and avoiding turnovers. While I have attempted to account for this in part by weighing blocks and turns as more important than goals and assists, offensive players are still going to have the advantage in this regard. But is that necessarily a bad thing? Ultimate is an offensively weighted game, and if o-line players are scoring more and contributing more to winning, then perhaps it's alright that they are given an advantage. I don't have concrete answers to these questions, but it's important to know that I had these ideas in mind while I shaped this formula.
It is also extremely difficult to quantify the actions and intangibles that make up great defense; if a player plays great defense on a cutter and the offensive player doesn't get the disc, the defender most certainly has helped his team, although this won't be reflected in the box score.

Getting back to the passing component of my metric: while players like Janas who serve as primary handlers certainly contribute a large amount to the success of their offenses, I have doubts that this contribution is directly proportional. What I mean is that while Janas on average throws and catches the disc five times more than another player, that does not necessarily make him five times more valuable. To help balance out this statistic and remove a bit of the variance, I decided to raise completions and catches to the three-fourths power each, bringing all the values closer together and thereby putting less emphasis on the statistic. With completion and catch percentage, though, I wanted to do the opposite. The accuracy with which you throw and catch the disc is extremely important, but because these stats are expressed as percentages, the variation between players is quite low; the difference between completion rates of 0.95 and 0.75 is numerically much smaller than the difference between 400 and 300 throws, although I would posit that the former difference is far more important. I wanted to widen the variation of this statistic and in that way give it a larger weight, which I did by raising completion percentage and catch percentage to the third power each. So this is what the passing component of the metric looks like so far:

*Unadjusted PER passing component = (completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0)*

Because the number of completions and catches is so high compared to the other box statistics we looked at previously, we still have some work to do to prevent this portion of the metric from overpowering the box component. My solution to this problem was rather simple: multiply all of this by 0.05, making the values of the passing component of uPER much more similar to those of the box component. So now here's what the formula looks like:

*Unadjusted PER passing component = 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0))*
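As a sketch of how these pieces interact (again with illustrative names, and with percentages expressed as decimals, per the discussion above):

```python
def passing_component(completions, completion_pct, catches, catch_pct):
    """Passing component of unadjusted PER: volume damped to the 3/4 power,
    accuracy (a decimal in [0, 1]) cubed to widen its spread, scaled by 0.05."""
    return 0.05 * ((completions ** 0.75) * (completion_pct ** 3)
                   + (catches ** 0.75) * (catch_pct ** 3))
```

Cubing rewards accuracy: 0.95^3 is about 0.857 while 0.75^3 is about 0.422, so the gap between those two throwers more than doubles.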

And here's what the whole uPER equation looks like:

*Unadjusted PER = (0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75 + 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0))*

### Plus-minus Component

Plus-minus data for ultimate functions largely the same way it does in other sports, with one fundamental difference: the presence of offensive and defensive lines. The probability of scoring is drastically different on an o-line than on a d-line, and since plus-minus deals exclusively in whether or not you score a goal, o-line and d-line plus-minus should be treated very differently. Even the greatest d-lines, for instance, are likely to come out negative in plus-minus evaluations because of the nature of the sport; it's a lot harder to score when you start on defense, clearly. So how should we go about this? For starters, I decided to separate the plus-minus data for each player's o-line and d-line points and compare each to the respective league average. To do this, I needed each player's average o-line and d-line plus-minus values, which I can find by taking a player's o-line and d-line plus-minus over the entire season and dividing each by offensive and defensive points played, respectively. All in all, by comparing a player's plus-minus to league-average o-line and d-line values, we should eliminate the inherent advantage that o-line players are given when dealing with plus-minus values. This is what it looks like so far:

*Unadjusted PER plus-minus component = (o-line plus-minus / o-line pts played) - [league avg (o-line plus-minus / o-line pts played)] + (d-line plus-minus / d-line pts played) - [league avg (d-line plus-minus / d-line pts played)]*

It is interesting to note that the league average of d-line plus-minus divided by d-line points played is negative, as the expected value of a d-line point is negative; this means we are actually adding the league average, because we are subtracting a negative value. As of now, our equation weights the o-line and d-line plus-minus components evenly, but this should obviously not be the case. We need to weight these components according to how many points the player actually played on each line, something we can easily do by multiplying the o-line portion by offensive points played and the d-line portion by defensive points played.

The plus-minus component of this PER metric is unique in that it is the only portion that deals not with individual data but with team statistics. This is both a benefit and a curse: it captures the intangibles that box statistics cannot, but the data is also influenced by the six other players on the point. This gives a big advantage to players on good teams, which is not ideal for a metric that is attempting to evaluate players as objectively as possible. Still, the whole purpose of metrics like this is to evaluate the value a player brings to their team, and measuring team performance when that player is on the field is a simple yet effective way of doing just that. I have one last piece to add to the equation, though; because of this inherent flaw in the statistic, and in order to bring the average values and variation of this component closer to those of the other components, I decided to multiply everything in this equation by 0.1. Now let's take a look at how our formula is shaping up:

*Unadjusted PER plus-minus component = 0.1(o-line pts played) * (o-line plus-minus / o-line pts played - [league avg (o-line plus-minus / o-line pts played)]) + 0.1(d-line pts played) * (d-line plus-minus / d-line pts played - [league avg (d-line plus-minus / d-line pts played)])*

The last piece of this puzzle is more logistical than anything: what to do with the players who played no o-line or no d-line points? As the formula involves dividing by points played, I need to replace these zeros with a nonzero value; and because I am also multiplying by points played, it does not matter what this substitute value is, since that portion of the equation is being multiplied by zero anyway. To rectify this, I added a simple if statement, as you can see below. Kind of confusing, I know, but here's what that looks like on paper:

*Unadjusted PER plus-minus component = 0.1(o-line pts played) * (o-line plus-minus / (IF o-line pts played > 0, THEN o-line pts played, ELSE 1) - [league avg (o-line plus-minus / o-line pts played)]) + 0.1(d-line pts played) * (d-line plus-minus / (IF d-line pts played > 0, THEN d-line pts played, ELSE 1) - [league avg (d-line plus-minus / d-line pts played)])*
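A Python sketch of this component, with the zero-points guard written as max(pts, 1) (equivalent to the if statement above, since a player with zero points has the whole term multiplied by zero anyway); the league-average rates are assumed to be precomputed and passed in:

```python
def plus_minus_component(o_pm, o_pts, d_pm, d_pts, lg_avg_o_rate, lg_avg_d_rate):
    """Plus-minus component of unadjusted PER: each line's per-point
    plus-minus compared to the league-average rate for that line,
    weighted by points played on that line and scaled by 0.1."""
    o_term = 0.1 * o_pts * (o_pm / max(o_pts, 1) - lg_avg_o_rate)
    d_term = 0.1 * d_pts * (d_pm / max(d_pts, 1) - lg_avg_d_rate)
    return o_term + d_term
```

A pure o-line player contributes nothing through the d-line term, and vice versa, so specialists are only judged against the average for the line they actually play.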

And finally, here is what our entire uPER equation looks like at this point:

*Unadjusted PER = (0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75 + 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0)) + 0.1(o-line pts played) * (o-line plus-minus / (IF o-line pts played > 0, THEN o-line pts played, ELSE 1) - [league avg (o-line plus-minus / o-line pts played)]) + 0.1(d-line pts played) * (d-line plus-minus / (IF d-line pts played > 0, THEN d-line pts played, ELSE 1) - [league avg (d-line plus-minus / d-line pts played)])*

### Pulling Component

While much smaller in importance than the rest of the components mentioned above, pulling is a unique skill which plays a role in winning and, therefore, must be evaluated. Pulling is essentially a football kickoff but for ultimate; the team starting on defense throws the disc to the offensive team, trying to get the disc as far down the field and with the most hang time possible while keeping it in bounds. Much as with the passing component, the first thing we must do is to balance the quantity of pulls taken with the quality of those pulls. For quantity, we can look at pulling in terms of how many pulls are taken in addition to the proportion of points on which you pull. Again, very similarly to passing, I do not believe that the relationship between quantity of pulls and value is directly proportional, so a player who pulls twice as much is not necessarily twice as valuable. To account for this, we can take total pulls and set the number to the one-fourth power, in addition to multiplying this number by the proportion of pulls to points played.

While measuring the volume of pulls seems relatively straightforward, how should we go about measuring their quality? Pull quality has two general, easily traceable characteristics: whether it lands inbounds, and its hang time. We know players' average pull hang times, but because pulling is such a small part of the game (and to ensure that players who pull only rarely aren't given an advantage) we must make some adjustments. Most players' average pull hang times range from four to eight seconds, but to lower this value, and the value of the pulling component in general, I decided to subtract 3.0 from each player's average pull hang time. This makes the pulling component less important relative to the other components, although because I used subtraction instead of division, the distances between players remain the same; basically, I wanted to lower all values without decreasing the variation. We run into another problem, though, with low-volume pullers. Some players pulled only a handful of times throughout the season, which can lead to some wildly inflated hang times. To counteract this, I decided to include a threshold of more than 20 pulls; players who do not meet this threshold over the course of a season are contributing very little to their teams by pulling, and I didn't want the formula to place high value on players who pull at very low volume but with good results. Players who do not meet the 20-pull threshold have their hang time values cut in half. So here's what the pulling component formula looks like as of now:

*Unadjusted PER pulling component = (# pulls / pts played) * (# pulls)^(1/4) + (IF # pulls > 20, THEN avg hang time - 3.0, ELSE 0.5(avg hang time - 3.0))*

Finally, we need to incorporate whether or not the pulls land inbounds. At first this is simple enough: we just need to add in the pulling accuracy percentage, or the number of inbounds pulls divided by the total number of pulls thrown. We again have to decide how to weight pulling accuracy against the other component of overall pulling quality, which is hang time. Even after subtracting 3.0, pull hang time is likely to be expressed as a number somewhere between 1 and 5, while pulling accuracy is expressed as a percentage. Because pulling accuracy is given a smaller numerical value but is likely the more important aspect of pulling, I decided to raise the pulling accuracy percentage to the third power. This spreads out the variation between players, and by doing so adds comparative value for the best pullers and detracts value from the worst. Finally, because some players do not throw any pulls over the course of the season, we must insert an if statement to ensure that the equation runs smoothly (so that it doesn't divide by zero). Here's what all of this put together looks like:

*Unadjusted PER pulling component = ((# pulls / pts played) * (# pulls)^(1/4)) + (IF # pulls > 20, THEN avg hang time - 3.0, ELSE 0.5(avg hang time - 3.0)) * (# inbounds pulls / (IF # pulls > 0, THEN # pulls, ELSE 1))^3*
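Here is the whole pulling component sketched in Python (names are illustrative; avg_hang_time is in seconds):

```python
def pulling_component(pulls, pts_played, inbounds_pulls, avg_hang_time):
    """Pulling component of unadjusted PER: a volume term plus a quality
    term. Hang time is shifted down by 3.0 (and halved below the 20-pull
    threshold), then multiplied by the cubed in-bounds rate; the max(pulls, 1)
    divisor mirrors the if statement guarding against division by zero."""
    volume = (pulls / pts_played) * pulls ** 0.25 if pts_played > 0 else 0.0
    hang = avg_hang_time - 3.0 if pulls > 20 else 0.5 * (avg_hang_time - 3.0)
    accuracy = (inbounds_pulls / max(pulls, 1)) ** 3
    return volume + hang * accuracy
```

For a player with no pulls, the cubed accuracy term is zero, so the hang-time term drops out entirely and only the (zero) volume term remains.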

Finally, we can look at the formula for unadjusted PER in its entirety! First, though, we divide everything we have by total points played, converting uPER from an aggregate measurement to a per-point one.

*Unadjusted PER = (1 / points played) * {*

*(0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75*

*+ 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0))*

*+ 0.1(o-line pts played) * (o-line plus-minus / (IF o-line pts played > 0, THEN o-line pts played, ELSE 1) - [league avg (o-line plus-minus / o-line pts played)])*

*+ 0.1(d-line pts played) * (d-line plus-minus / (IF d-line pts played > 0, THEN d-line pts played, ELSE 1) - [league avg (d-line plus-minus / d-line pts played)])*

*+ (# pulls / pts played) * (# pulls)^(1/4)*

*+ (IF # pulls > 20, THEN avg hang time - 3.0, ELSE 0.5(avg hang time - 3.0)) * (# inbounds pulls / (IF # pulls > 0, THEN # pulls, ELSE 1))^3 }*

Now just a few more details and we can get into the results. First and foremost, my dataset includes many players who played only a handful of points; since this metric ends up being a per-point valuation, this can lead to some very inflated results. To fix this problem, I put in place a threshold of at least 30 points played, meaning that only players who played 30 or more points are included. This goes for calculations of league averages within the metric, too. And finally, I wanted to normalize the data so that the league average for the PER metric is 15; we can do this by taking each player's uPER, dividing by the league average of uPER, and multiplying by 15. Normalizing in this way cleans the data up a little and makes it much more palatable, as the average PER value would otherwise be a small decimal after the per-point adjustment. Here is what that looks like:

*PER = uPER * (15 / league avg uPER) *
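Putting these last two steps together, here is a sketch of the cutoff and normalization, assuming the per-point uPER values have already been computed into a dict keyed by player (the exact threshold semantics, 30 versus more than 30, is an implementation detail):

```python
def per_ratings(uper, pts_played, threshold=30):
    """Drop players below the points-played cutoff, then rescale the
    remaining per-point uPER values so the league average is exactly 15."""
    kept = {p: u for p, u in uper.items() if pts_played[p] >= threshold}
    league_avg = sum(kept.values()) / len(kept)
    return {p: u * 15 / league_avg for p, u in kept.items()}
```

Because the rescaling is a single multiplicative factor, it changes no player's rank; it only moves the league average to 15.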

**The Results**

For starters, let's take a look at how my metric evaluates the top players in the game; more specifically, the top 25 players:

**Top 25 Players According to PER **

Feel free to access the entirety of this data at any time __here__, also linked at the bottom of the article. While there are some familiar names on this list, one of the things that stands out most about my new metric's ranking is the lack thereof. Because my ultimate PER is an estimate of per-point value, the door is left open for players who played a small number of points but had a high impact to be rated very highly, a pattern exemplified by the top of this list. Three of the top five players on this list (Nethercutt, Cashman, and Nelson) didn't clear 100 points on the season, and the former two both played fewer than 50 points. While I made a point of not over-emphasizing the impact of volume when making the PER formula, these results show that I may have gone too far in this direction, as players who have an unquestionably large impact are often rated lower than players who performed well in a much smaller sample. This is exemplified by the placement of Ben Jagt, named MVP of the AUDL after a monster season, at just 23rd on this list. Out of the top 25 players, eight played fewer than 100 points; raise the points-played threshold and remove these players, however, and this is what the top 25 would look like:

**Top 25 Players with 100-point Threshold**

While still far from perfect, the rankings provided by this list seem slightly more in touch with reality than the 30-point-threshold list. Going back to my original list, though, let's break down why each player was ranked where they were based on their performance in each of the four components of the metric:

**Breakdown of Top-25 by Component**

As you can see, no one player dominated all four categories, or even more than one: no player ranked in the top ten in multiple components. Three of the highest-ranked players, Nethercutt, Snider, and Nelson, achieved this by performing very well in the box or passing components; this is what we would expect, as these two components were given the largest weighting. However, this does not hold true for the third-ranked player on our list, Pat Cashman. Who? Yes. Cashman's valuation this high on the list is mainly due to his highly valued pulling, and at the end of the day his presence illustrates a mistake, or a flaw in the matrix if you will. While he also performs well in the box component, Cashman actually gets as much value out of his pulling as his box score, making him an outlier and showing that my rating system isn't perfect after all; much to everyone's surprise, I'm sure. I could continue with this pattern of analysis down the list, but I feel it's more important to understand how each component of the metric compares in weighting and importance to the others, something that I hope the following graphics will help illustrate:

These histograms show buckets, or ranges of values, on the x-axis and how many players fall into each bucket on the y-axis. Let's look at the pulling component graphic as an example; the range of -1.25 to 0.63 includes so many players because most players never pull, or do so very rarely. These histograms reveal some very interesting patterns; for one, the extreme right-tailed nature of the passing component shows us that most players are grouped together in the four-to-nine-point range, while a few outlying players reach all the way into the twenties. This is exactly the kind of pattern I would've expected from the passing component, as it makes sense that a few primary o-line handlers would perform far better than the rest of the league in this category. In contrast, the box component histogram is more centered and left-tailed, largely because a number of players finished deep in the negative. Given the nature of this component (players can lose points for throwaways, drops, and stalls) this also makes some sense. Looking at these two components, we can see that the mode of the box histogram is around 12, while that of the passing histogram is around 6. But because of the left-tailed nature of the box component and the right-tailedness of the passing component, these two portions of the metric end up having very close averages, as you can see:

**Data on each PER Component**

The table above shows us that the box and passing components of the metric are given the most weight, as I hoped they would be when making the formula. It also shows that the box and plus-minus components are the most variable, which makes sense given the wide spread of positive and negative values coming from each. This table demonstrates some of the flaws in my metric as well, though. First of all, the negative average for the plus-minus component is a little disconcerting; this is likely a result of my removing nearly 150 players who did not meet the points-played criteria from the dataset, although it does reveal a flaw, as the average of the plus-minus portion should in theory have been equal or closer to zero. The fact that the maximum value of the pulling component was 35.98, higher than that of the box component, illustrates another problem with my formula. This value belonged to David Baer, who played only 3 games and 38 points. All in all, four players scored values of over 20 from the pulling component of the metric; it's good that there weren't more, but that's four too many.

Players like the aforementioned Baer are able to take advantage of the per-point nature of this metric to climb up the rankings, but what if we look at this data from a different perspective instead? We can do this if we sort by unnormalized, aggregated uPER data, or data that values players not on a per-point basis but rather on volume across a whole season. This gives the sheer number of points played much more impact, but it also leads to an interesting set of rankings:

**Aggregate uPER Rankings**

First off, it is worth noting how the per-point and aggregate rankings differ from a team perspective. The per-point rankings have four of their top 25 players coming from the Madison Radicals, three from the DC Breeze, and three from the Raleigh Flyers. The Radicals finished just 6-6, good for 12th in the league and fifth in the Midwest Division, while the Breeze were tied for eighth and third in their division and the Flyers finished tied for second and first in their division. The aggregate rankings, however, have four players from NY Empire, four from the SD Growlers, and three from the DC Breeze and Indianapolis Alleycats each. New York Empire dominated the season, going 15-0 and winning the championship, while the Growlers finished tied for second and first in their division and the Alleycats finished with a record tied for fourth and also won their division. The teams which the aggregate ranking had the most players from, NY Empire and the SD Growlers, each had just one player on the per-point rankings. Overall, the per-point rankings included just seven of their top 25 players from the four division-winning teams, while the aggregate rankings had 13 of their top 25 players coming from one of these four teams.

Aggregate PER also features a top 25 entirely made up of players who played more offensive points than defensive ones, while the original per-point rankings include five mainly d-line players, mostly a product of the list including players it probably shouldn't. We can also look at the data from a positional point of view: the per-point PER top 25 is made up of 12 cutters, six hybrids, and seven handlers as classified by the AUDL, while the aggregate rankings are comprised of seven cutters, 13 hybrids, and five handlers. This gives the per-point top 25 a 1.71 ratio of cutters to handlers, while the aggregate players show a 1.40 ratio. The fact that there are more defensive players and more cutters in the original top 25 than in the aggregate 25 again points to the fact that there may be weaker players on the per-point list that shouldn't be there.

While this aggregated list appears to be comprised of better players and seems more accurate than the per-point rankings, is there any way we can test this? As a matter of fact, there is. For one, we can compare both my original rankings and aggregate rankings to the plus-minus statistic provided by __UltiAnalytics__. Their formula for plus-minus is as follows: +1 for a goal, +1 for an assist, +1 for a D, -1 for a drop, -1 for a passer turnover (throwaway, stalled, misc. penalty), +2 for a callahan (+1 for the D and +1 for the goal), and -1 for being callahaned. While I would argue that this isn't the most effective way to rank and analyze players (obviously, that's why I'm doing this, after all), it provides a useful tool of comparison. Using all 601 eligible AUDL players, comparing my PER rankings with UltiAnalytics' plus-minus gives a covariance of 143.97 and a correlation coefficient of 0.621. This demonstrates a moderate correlation between PER and UltiAnalytics' plus-minus, but when we make the same comparison using my aggregated uPER we see a covariance of 93.60 and a correlation coefficient of 0.892, a much stronger correlation. This correlation does come with some caveats, though, as both UltiAnalytics' plus-minus and my aggregated uPER are directly influenced by the number of points played. To demonstrate this, here are the average numbers of points played by the top 25 players from my original list, my aggregated list, and UltiAnalytics' plus-minus: original, 151.70 points played; aggregated, 270.84 points played; UltiAnalytics' plus-minus, 270.04 points played.
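For the curious, the comparison statistics above are standard Pearson machinery, which can be sketched in a few lines of Python (I use population covariance here; whether that matches the exact convention behind the numbers above is an assumption):

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance between two equal-length lists of ratings."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation coefficient between two rating lists."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))
```

Unlike covariance, the correlation coefficient is scale-free, which is why 0.621 versus 0.892 is the more meaningful comparison between the two rating systems.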

Another way that we can test the accuracy of my lists is by comparing them to the 2019 AUDL All-Star and All-AUDL teams. These teams are comprised of the game's top players, and making one of these teams provides a player with an objective stamp of approval. So let's see how some of the AUDL's top players are evaluated by PER:

**All-AUDL First Team**

With an average PER rank of just 64.71, PER ranks these high-level players much worse than aggregate uPER, which has their average rank at 13.29. In fact, six out of the seven players were within the top six in aggregate uPER, showing that the aggregate model seems to be much more in line with the general consensus.

**All-AUDL Second Team**

While not as extreme, a similar story is told by the All-AUDL second team's data. The average PER rank came in slightly higher at 66.86, while the average aggregate uPER rank was only 17.43. The fact that the average ranks of both increased from the first team to the second, though, speaks to the validity of both measures.

**AUDL All-Stars**

Once again, we see a similar pattern in the All-Star player data. The average PER rank for these players was 123.84, while the average aggregate rank was 17.80. Although it seems clear that All-Star and All-AUDL selections are more in line with our aggregate than our per-point data, we can still test in other ways too:

**Aggregate uPER Rankings**

The aggregate data has 14 All-Stars and 12 All-AUDL players, making these team selections significantly more in line with the aggregate rankings than the per-point rankings; the original rankings included just four All-Stars and four All-AUDL players. This certainly suggests that the aggregate system produces more accurate rankings, but I still have questions as to how valid this form of measurement actually is. Because we do not divide by points played when analyzing the aggregate data, it is heavily influenced by the number of points played. We saw this earlier, when the top 25 players from the aggregate data averaged nearly 120 more points played than the top 25 players of the per-point PER data. It is true that the aggregate data highly values players in large part due to volume, but so do we! While this pattern points to possible flaws in my metric, it may also point to flaws in the way we compare and conceptualize players, as we often discount players who perform with high efficiency in smaller samples. On the other hand, durability and consistency over time are very valuable characteristics in ultimate, and in sports in general, and the players in the aggregate top 25 likely played more points in part because they are simply better players. Ultimately, it is really up to everyone to decide what they view as more valuable: quality performance in a small sample, or slightly lower performance in a larger sample.

I hope this article has helped you think about the game of ultimate and the ways in which we evaluate athletes a little differently; I know writing it has for me. And while the rating system I've compiled is far from perfect, I hope there will be many more systems in the future better designed to help us analyze and compare ultimate frisbee players. And if you've somehow managed to read this far but don't play ultimate already, I hope this piece has encouraged and enticed you to get on the field! That disc isn't going to throw itself.

*Sources: basketball-reference.com, theaudl.com, ultianalytics.com, ultiworld.com, wikipedia.com *

*Data: *__https://docs.google.com/spreadsheets/d/1_iNyn3makGo7_yxOCTQeytc83AipCnf14YQcS__