Bruin Sports Analytics

# Player Efficiency Rating in the AUDL: Developing an Impact Metric for Ultimate Frisbee

### By: Ian Geertsen

One of the most interesting and challenging aspects of sports analytics is developing single-number metrics, or metrics that attempt to look at a player holistically by boiling their value down to just one number. One such example is John Hollinger's __Player Efficiency Rating__, or PER, a per-minute rating system that uses box score statistics to analyze and compare NBA players. As you can see __here__, the calculations this system makes are far from simple, although it is actually one of the least complex impact metrics of its kind; other advanced impact metrics like __LEBRON__ or __RAPTOR__ (funny names, I know) take into account things like player tracking, play-by-play data, on-off data, and role adjustments, and even make adjustments for luck. I personally find these and similar plus-minus basketball metrics fascinating, and they inspired me to attempt what I do not believe anyone has yet done: to develop a similar, yet comparatively rudimentary, impact metric for ultimate frisbee.

This project looks at data from the 2019 AUDL season. I would have loved to do a similar analysis using data from the club circuit, but unfortunately the necessary statistics simply aren't compiled. First, I will walk you through how I came up with the equation for my version of PER and why I included each piece of the system; then we will look at how the metric views the impact and performance of AUDL players. The end result should give us a per-point, single-value metric which quantifies overall performance and, ideally, provides accurate ratings.

**The Equation**

### Box Component

First, we will calculate unadjusted PER, or uPER: uPER will be calculated based on aggregate data from the entirety of the season, and will later be adjusted to a per-point value and normalized. My equation, like many single-number impact metrics, relies heavily on box components; box statistics are simply data that can be easily tallied and presented in a box score, so for the NBA think points, assists, rebounds, steals, blocks, etc. For the purposes of this project, though, the ultimate box stats I incorporated are goals, assists, blocks, throwaways, drops, stalls, and callahans (catching the disc in the opponent's end zone while playing defense). Not all of these actions carry equal weight, however.

Ultimate frisbee is unique as a team sport in that at least two players must work together to score (with the exception of the callahan, but let's be reasonable here). While one player might contribute more to scoring a goal than another, at the end of the day, to score a goal on offense someone needs to throw you the disc. Goals and assists also occur at a much higher frequency than the rest of the available box statistics (the average AUDL player records about twice as many goals as blocks, for instance), and anyone who has played the game knows that simply scoring a goal or throwing an assist does not automatically make you the most impactful player on the field for that possession. So how should we value the act of scoring a goal or throwing an assist? Let's compare this to an action such as getting a block; while the actions and positions of your teammates can certainly help you get a D, only one player will block the disc. This also brings up the more philosophical question of which is more important, a score or a turnover? You might be tempted to say a goal, but I would actually beg to differ. When ultimate is played at a high level, such as in the AUDL, the team with the disc is generally expected to score; this is why the aggregate plus-minus for all players on offensive lines in 2019 was +17,239.

Let's imagine that player A makes a great deep cut and player B throws a huck for the goal. If we replace player A with a less-talented player X, let's say that he never gets open on the cut. Presuming that player B holsters (doesn't throw the disc), the team on offense still keeps possession, and is still statistically likely to score eventually during the point. Now let's say the same team is on defense, and player A makes a great layout D. If we replace player A with player X, who does not get the block, we go from gaining possession of the disc (and being favored to score) to remaining without possession (and being favored to get scored on). Obviously it isn't always so black and white, but because of everything mentioned above, I would assert that blocks, drops, and throwaways are on average more impactful actions than goals and assists. Taking all of that into account, here is what our equation looks like so far:

*Unadjusted PER box component = (0.5)goals + (0.5)assists - (0.75)drops - (0.75)throwaways + (0.75)blocks - (0.75)stalls + (1.0)Callahans*

We aren't done here yet, though. While the box component of the metric provides a large source of input, I didn't want the box statistics to overwhelm the other aspects of the metric, so to lessen the impact of the box component I raised each of these statistics to the three-fourths power, as you can see here:

*Unadjusted PER box component = (0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75*
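To make the weighting concrete, here is a minimal Python sketch of the box component; the function and argument names are my own illustration, not official AUDL field names:

```python
def box_component(goals, assists, drops, throwaways, blocks, stalls, callahans):
    """Box component of unadjusted PER: weighted season totals, each
    damped to the 3/4 power to soften the influence of raw volume."""
    def damp(n):
        return n ** 0.75  # counts are nonnegative, so this is well defined
    return (0.5 * damp(goals) + 0.5 * damp(assists)
            - 0.75 * damp(drops) - 0.75 * damp(throwaways)
            + 0.75 * damp(blocks) - 0.75 * damp(stalls)
            + 1.0 * damp(callahans))
```

Note that the damping is applied to each raw count before the weight, so a player with 16 goals contributes 0.5 * 16^0.75 = 4 points, not 0.5 * 16 = 8.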

### Passing Component

All of the statistics that I analyzed above either result in a turnover or a score; while these results clearly have a large impact on the game, it is also important that we look at what happens in between and leading up to these actions, namely the accuracy and volume with which players move the disc around the field. It takes precise throws, sharp cuts, and quick decision making to get the disc down the field, yet if a player doesn't throw an assist or catch a goal, he could throw and catch the disc 100 times without it being recorded by traditional box statistics. To place value on the key actions necessary to advance the disc, we will look at the number of completions, completion percentage, number of catches, and catch percentage.

There are two main things that make this exercise difficult: first, determining the relative importance of passing volume and passing quality, and second, balancing a player's completion percentage (expressed as a decimal) with the high number of throws and catches made throughout the season (reaching well over 500 in a few cases). It is worth noting that, at this point in the analysis, we are dealing with aggregate statistics over the course of the entire season; the metric will be adjusted to a per-point value later in the progression. To make sense of the problems brought up above, let's look at a specific player. In the 2019 AUDL season, Pawel Janas of the Chicago Wildfire threw more than any other player, throwing the disc 752 times. He, like many other handlers, threw the disc at a much higher clip than the rest of his teammates: Tommy Gallagher was second on the team with 463 throws, and nobody else managed to break 260. This clearly demonstrates the problem that makes generating a single-number metric for ultimate so difficult: weighing the difference in output between different roles. When it comes to things like scoring, assisting, and volume passing, offensive-line players have a distinct advantage, while d-line players have an innate advantage when it comes to blocks and avoiding turnovers. While I have attempted to account for this in part by weighing blocks and turns as more important than goals and assists, offensive players are still going to have the advantage in this regard. But is that necessarily a bad thing? Ultimate is an offensively weighted game, and if o-line players are scoring more and contributing more to winning, then perhaps it's alright that they are given an advantage. I don't have concrete answers to these questions, but it's important to know that I had these ideas in mind while I shaped this formula.
It is also extremely difficult to quantify the actions and intangibles that make up great defense; if a player plays great defense on a cutter and the offensive player doesn't get the disc, the defender most certainly has helped his team, although this won't be reflected in the box score.

Getting back to the passing component of my metric: while players like Janas who serve as primary handlers certainly contribute a large amount to the success of their offenses, I have doubts that this contribution is directly proportional. What I mean is that while Janas on average throws and catches the disc five times more than another player, that does not necessarily make him five times more valuable. To help balance out this statistic and remove a bit of the variance, I decided to raise completions and catches to the three-fourths power each, bringing all the values closer together and thereby putting less emphasis on the statistic. With completion and catch percentage, though, I wanted to do the opposite. The accuracy with which you throw and catch the disc is extremely important, but because these stats are expressed as percentages, the variation between players is quite low; the difference between completion rates of 0.95 and 0.75 is numerically much smaller than the difference between 400 and 300 throws, although I would posit that the former difference is far more important. I wanted to widen the variation of this statistic and in that way give it a larger weight, which I did by raising completion percentage and catch percentage to the third power each. So this is what the passing component of the metric looks like so far:

*Unadjusted PER passing component = (completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0)*

Because the number of completions and catches is so high compared to the other box statistics we looked at previously, we still have some work to do to prevent this portion of the metric from overpowering the box component. My solution to this problem was rather simple: multiply all of this by 0.05, making the values of the passing component of uPER much more similar to those of the box component. So now here's what the formula looks like:

*Unadjusted PER passing component = 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0))*
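As a sketch of how these pieces interact (again with illustrative names, and with percentages expressed as decimals, per the discussion above):

```python
def passing_component(completions, completion_pct, catches, catch_pct):
    """Passing component of unadjusted PER: volume damped to the 3/4 power,
    accuracy (a decimal in [0, 1]) cubed to widen its spread, scaled by 0.05."""
    return 0.05 * ((completions ** 0.75) * (completion_pct ** 3)
                   + (catches ** 0.75) * (catch_pct ** 3))
```

Cubing rewards accuracy: 0.95^3 is about 0.857 while 0.75^3 is about 0.422, so the gap between those two throwers more than doubles.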

And here's what the whole uPER equation looks like:

*Unadjusted PER = (0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75 + 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0))*

### Plus-minus Component

Plus-minus data for ultimate functions largely the same way it does in other sports, with one fundamental difference: the presence of offensive and defensive lines. The probability of scoring is drastically different on an o-line than on a d-line, and since plus-minus deals exclusively in whether or not you score a goal, o-line and d-line plus-minus should be treated very differently. Even the greatest d-lines, for instance, are likely to come out negative in plus-minus evaluations because of the nature of the sport; it's a lot harder to score when you start on defense, clearly. So how should we go about this? For starters, I decided to separate the plus-minus data for each player's o-line and d-line points and compare each to the respective league average. To do this, I needed each player's average o-line and d-line plus-minus values, which I can find by taking a player's o-line and d-line plus-minus over the entire season and dividing each by offensive and defensive points played, respectively. All in all, by comparing a player's plus-minus to league-average o-line and d-line values, we should eliminate the inherent advantage that o-line players are given when dealing with plus-minus values. This is what it looks like so far:

*Unadjusted PER plus-minus component = (o-line plus-minus / o-line pts played) - [league avg (o-line plus-minus / o-line pts played)] + (d-line plus-minus / d-line pts played) - [league avg (d-line plus-minus / d-line pts played)]*

It is interesting to note that the league average of d-line plus-minus divided by d-line points played is negative, as the expected value of a d-line point is negative; this means we are actually adding the league average, because we are subtracting a negative value. As of now, our equation weights the o-line and d-line plus-minus components evenly, but this should obviously not be the case. We need to weight these components according to how many points the player actually played on each line, something we can easily do by multiplying the o-line portion by offensive points played and the d-line portion by defensive points played.

The plus-minus component of this PER metric is unique in that it is the only portion that deals not with individual data but with team statistics. This is both a benefit and a curse: it captures the intangibles that box statistics cannot, but the data is also influenced by the six other players on the point. This gives a big advantage to players on good teams, which is not ideal for a metric that is attempting to evaluate players as objectively as possible. Still, the whole purpose of metrics like this is to evaluate the value a player brings to their team, and measuring team performance when that player is on the field is a simple yet effective way of doing just that. I have one last piece to add to the equation, though; because of this inherent flaw in the statistic, and in order to bring the average values and variation of this component closer to those of the other components, I decided to multiply everything in this equation by 0.1. Now let's take a look at how our formula is shaping up:

*Unadjusted PER plus-minus component = 0.1(o-line pts played) * (o-line plus-minus / o-line pts played - [league avg (o-line plus-minus / o-line pts played)]) + 0.1(d-line pts played) * (d-line plus-minus / d-line pts played - [league avg (d-line plus-minus / d-line pts played)])*

The last piece of this puzzle is more logistical than anything: what to do with the players who played no o-line or no d-line points? As the formula involves dividing by points played, I need to replace these zeros with a nonzero value; and because I am also multiplying by points played, it does not matter what this substitute value is, since that portion of the equation is being multiplied by zero anyway. To rectify this, I added a simple if statement, as you can see below. Kind of confusing, I know, but here's what that looks like on paper:

*Unadjusted PER plus-minus component = 0.1(o-line pts played) * (o-line plus-minus / (IF o-line pts played > 0, THEN o-line pts played, ELSE 1) - [league avg (o-line plus-minus / o-line pts played)]) + 0.1(d-line pts played) * (d-line plus-minus / (IF d-line pts played > 0, THEN d-line pts played, ELSE 1) - [league avg (d-line plus-minus / d-line pts played)])*
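A Python sketch of this component, with the zero-points guard written as max(pts, 1) (equivalent to the if statement above, since a player with zero points has the whole term multiplied by zero anyway); the league-average rates are assumed to be precomputed and passed in:

```python
def plus_minus_component(o_pm, o_pts, d_pm, d_pts, lg_avg_o_rate, lg_avg_d_rate):
    """Plus-minus component of unadjusted PER: each line's per-point
    plus-minus compared to the league-average rate for that line,
    weighted by points played on that line and scaled by 0.1."""
    o_term = 0.1 * o_pts * (o_pm / max(o_pts, 1) - lg_avg_o_rate)
    d_term = 0.1 * d_pts * (d_pm / max(d_pts, 1) - lg_avg_d_rate)
    return o_term + d_term
```

A pure o-line player contributes nothing through the d-line term, and vice versa, so specialists are only judged against the average for the line they actually play.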

And finally, here is what our entire uPER equation looks like at this point:

*Unadjusted PER = (0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75 + 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0)) + 0.1(o-line pts played) * (o-line plus-minus / (IF o-line pts played > 0, THEN o-line pts played, ELSE 1) - [league avg (o-line plus-minus / o-line pts played)]) + 0.1(d-line pts played) * (d-line plus-minus / (IF d-line pts played > 0, THEN d-line pts played, ELSE 1) - [league avg (d-line plus-minus / d-line pts played)])*

### Pulling Component

While much smaller in importance than the rest of the components mentioned above, pulling is a unique skill which plays a role in winning and, therefore, must be evaluated. Pulling is essentially a football kickoff but for ultimate; the team starting on defense throws the disc to the offensive team, trying to get the disc as far down the field and with the most hang time possible while keeping it in bounds. Much as with the passing component, the first thing we must do is to balance the quantity of pulls taken with the quality of those pulls. For quantity, we can look at pulling in terms of how many pulls are taken in addition to the proportion of points on which you pull. Again, very similarly to passing, I do not believe that the relationship between quantity of pulls and value is directly proportional, so a player who pulls twice as much is not necessarily twice as valuable. To account for this, we can take total pulls and set the number to the one-fourth power, in addition to multiplying this number by the proportion of pulls to points played.

While measuring the volume of pulls seems relatively straightforward, how should we go about measuring their quality? Pull quality has two general, easily traceable characteristics: whether it lands inbounds, and its hang time. We know players' average pull hang times, but because pulling is such a small part of the game (and to ensure that players who pull only rarely aren't given an advantage) we must make some adjustments. Most players' average pull hang times range from four to eight seconds, but to lower this value, and the value of the pulling component in general, I decided to subtract 3.0 from each player's average pull hang time. This makes the pulling component less important relative to the other components, although because I used subtraction instead of division, the distances between players remain the same; basically, I wanted to lower all values without decreasing the variation. We run into another problem, though, with low-volume pullers. Some players pulled only a handful of times throughout the season, which can lead to some wildly inflated hang times. To counteract this, I decided to include a threshold of more than 20 pulls; players who do not meet this threshold over the course of a season are contributing very little to their teams by pulling, and I didn't want the formula to place high value on players who pull at very low volume but with good results. Players who do not meet the 20-pull threshold have their hang time values cut in half. So here's what the pulling component formula looks like as of now:

*Unadjusted PER pulling component = (# pulls / pts played) * (# pulls)^(1/4) + (IF # pulls > 20, THEN avg hang time - 3.0, ELSE 0.5(avg hang time - 3.0))*

Finally, we need to incorporate whether or not the pulls land inbounds. At first this is simple enough: we just need to add in the pulling accuracy percentage, or the number of inbounds pulls divided by the total number of pulls thrown. We again have to decide how to weight pulling accuracy against the other component of overall pulling quality, which is hang time. Even after subtracting 3.0, pull hang time is likely to be expressed as a number somewhere between 1 and 5, while pulling accuracy is expressed as a percentage. Because pulling accuracy is given a smaller numerical value but is likely the more important aspect of pulling, I decided to raise the pulling accuracy percentage to the third power. This spreads out the variation between players, and by doing so adds comparative value for the best pullers and detracts value from the worst. Finally, because some players do not throw any pulls over the course of the season, we must insert an if statement to ensure that the equation runs smoothly (so that it doesn't divide by zero). Here's what all of this put together looks like:

*Unadjusted PER pulling component = ((# pulls / pts played) * (# pulls)^(1/4)) + (IF # pulls > 20, THEN avg hang time - 3.0, ELSE 0.5(avg hang time - 3.0)) * (# inbounds pulls / (IF # pulls > 0, THEN # pulls, ELSE 1))^3*
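Here is the whole pulling component sketched in Python (names are illustrative; avg_hang_time is in seconds):

```python
def pulling_component(pulls, pts_played, inbounds_pulls, avg_hang_time):
    """Pulling component of unadjusted PER: a volume term plus a quality
    term. Hang time is shifted down by 3.0 (and halved below the 20-pull
    threshold), then multiplied by the cubed in-bounds rate; the max(pulls, 1)
    divisor mirrors the if statement guarding against division by zero."""
    volume = (pulls / pts_played) * pulls ** 0.25 if pts_played > 0 else 0.0
    hang = avg_hang_time - 3.0 if pulls > 20 else 0.5 * (avg_hang_time - 3.0)
    accuracy = (inbounds_pulls / max(pulls, 1)) ** 3
    return volume + hang * accuracy
```

For a player with no pulls, the cubed accuracy term is zero, so the hang-time term drops out entirely and only the (zero) volume term remains.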

Finally, we can look at the formula for unadjusted PER in its entirety! First, though, we divide everything we have by total points played, converting uPER from an aggregate measurement to a per-point one.

*Unadjusted PER = (1 / points played) * {*

*(0.5)goals^0.75 + (0.5)assists^0.75 - (0.75)drops^0.75 - (0.75)throwaways^0.75 + (0.75)blocks^0.75 - (0.75)stalls^0.75 + (1.0)Callahans^0.75*

*+ 0.05((completions^0.75 * completion%^3.0) + (catches^0.75 * catch%^3.0))*

*+ 0.1(o-line pts played) * (o-line plus-minus / (IF o-line pts played > 0, THEN o-line pts played, ELSE 1) - [league avg (o-line plus-minus / o-line pts played)])*

*+ 0.1(d-line pts played) * (d-line plus-minus / (IF d-line pts played > 0, THEN d-line pts played, ELSE 1) - [league avg (d-line plus-minus / d-line pts played)])*

*+ (# pulls / pts played) * (# pulls)^(1/4)*

*+ (IF # pulls > 20, THEN avg hang time - 3.0, ELSE 0.5(avg hang time - 3.0)) * (# inbounds pulls / (IF # pulls > 0, THEN # pulls, ELSE 1))^3 }*

Now just a few more details and we can get into the results. First and foremost, my dataset includes many players who played only a handful of points; since this metric ends up being a per-point valuation, this can lead to some very inflated results. To fix this problem, I put in place a threshold of at least 30 points played, meaning that only players who played 30 or more points are included. This goes for calculations of league averages within the metric, too. And finally, I wanted to normalize the data so that the league average for the PER metric is 15; we can do this by taking each player's uPER, dividing by the league average of uPER, and multiplying by 15. Normalizing in this way cleans the data up a little and makes it much more palatable, as the average PER value would otherwise be a small decimal after the per-point adjustment. Here is what that looks like:

*PER = uPER * (15 / league avg uPER) *
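Putting these last two steps together, here is a sketch of the cutoff and normalization, assuming the per-point uPER values have already been computed into a dict keyed by player (the exact threshold semantics, 30 versus more than 30, is an implementation detail):

```python
def per_ratings(uper, pts_played, threshold=30):
    """Drop players below the points-played cutoff, then rescale the
    remaining per-point uPER values so the league average is exactly 15."""
    kept = {p: u for p, u in uper.items() if pts_played[p] >= threshold}
    league_avg = sum(kept.values()) / len(kept)
    return {p: u * 15 / league_avg for p, u in kept.items()}
```

Because the rescaling is a single multiplicative factor, it changes no player's rank; it only moves the league average to 15.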

**The Results**

For starters, let's take a look at how my metric evaluates the top players in the game; more specifically, the top 25 players:

**Top 25 Players According to PER **

Feel free to access the entirety of this data at any time __here__, also linked at the bottom of the article. While there are some familiar names on this list, one of the things that stands out most about my new metric's ranking is the lack thereof. Because my ultimate PER is an estimate of per-point value, the door is left open for players who played a small number of points but had a high impact to be rated very highly, a pattern exemplified by the top of this list. Three of the top five players on this list (Nethercutt, Cashman, and Nelson) didn't clear 100 points on the season, and the former two both played fewer than 50 points. While I made a point of not over-emphasizing the impact of volume when making the PER formula, these results show that I may have gone too far in this direction, as players who have an unquestionably large impact are often rated lower than players who performed well in a much smaller sample. This is exemplified by the placement of Ben Jagt, named MVP of the AUDL after a monster season, at just 23rd on this list. Out of the top 25 players, eight played fewer than 100 points; raise the points-played threshold and remove these players, however, and this is what the top 25 would look like:

**Top 25 Players with 100-point Threshold**

While still far from perfect, the rankings provided by this list seem slightly more in touch with reality than the 30-point-threshold list. Going back to my original list, though, let's break down why each player was ranked where they were based on their performance in each of the four components of the metric:

**Breakdown of Top-25 by Component**

As you can see, no one player dominated all four categories, or even more than one: no player ranked in the top ten in multiple components. Three of the highest-ranked players, Nethercutt, Snider, and Nelson, achieved this by performing very well in the box or passing components; this is what we would expect, as these two components were given the largest weighting. However, this does not hold true for the third-ranked player on our list, Pat Cashman. Who? Yes. Cashman's valuation this high on the list is mainly due to his highly valued pulling, and at the end of the day his presence illustrates a mistake, or a flaw in the matrix if you will. While he also performs well in the box component, Cashman actually gets as much value out of his pulling as his box score, making him an outlier and showing that my rating system isn't perfect after all; much to everyone's surprise, I'm sure. I could continue with this pattern of analysis down the list, but I feel it's more important to understand how each component of the metric compares in weighting and importance to the others, something that I hope the following graphics will help illustrate:

These histograms show buckets, or ranges of values, on the x-axis and how many players fall into each bucket on the y-axis. Let's look at the pulling component graphic as an example; the range of -1.25 to 0.63 includes so many players because most players never pull, or do so very rarely. These histograms reveal some very interesting patterns; for one, the extreme right-tailed nature of the passing component shows us that most players are grouped together in the four-to-nine-point range, while a few outlying players reach all the way into the twenties. This is exactly the kind of pattern I would've expected from the passing component, as it makes sense that a few primary o-line handlers would perform far better than the rest of the league in this category. In contrast, the box component histogram is more centered and left-tailed, largely because a number of players finished deep in the negative. Given the nature of this component (players can lose points for throwaways, drops, and stalls) this also makes some sense. Looking at these two components, we can see that the mode of the box histogram is around 12, while that of the passing histogram is around 6. But because of the left-tailed nature of the box component and the right-tailedness of the passing component, these two portions of the metric end up having very close averages, as you can see:

**Data on each PER Component**

The table above shows us that the box and passing components of the metric are given the most weight, as I hoped they would be when making the formula. It also shows that the box and plus-minus components are the most variable, which makes sense given the wide spread of positive and negative values coming from each. This table demonstrates some of the flaws in my metric as well, though. First of all, the negative average for the plus-minus component is a little disconcerting; this is likely a result of my removing nearly 150 players who did not meet the points-played criteria from the dataset, although it does reveal a flaw, as the average of the plus-minus portion should in theory have been equal or closer to zero. The fact that the maximum value of the pulling component was 35.98, higher than that of the box component, illustrates another problem with my formula. This value belonged to David Baer, who played only 3 games and 38 points. All in all, four players scored values of over 20 from the pulling component of the metric; it's good that there weren't more, but that's four too many.

Players like the aforementioned Baer are able to take advantage of the per-point nature of this metric to climb up the rankings, but what if we look at this data from a different perspective instead? We can do this if we sort by unnormalized, aggregated uPER data, or data that values players not on a per-point basis but rather on volume across a whole season. This gives the sheer number of points played much more impact, but it also leads to an interesting set of rankings:

**Aggregate uPER Rankings**

First off, it is worth noting how the per-point and aggregate rankings differ from a team perspective. The per-point rankings have four of their top 25 players coming from the Madison Radicals, three from the DC Breeze, and three from the Raleigh Flyers. The Radicals finished just 6-6, good for 12th in the league and fifth in the Midwest Division, while the Breeze were tied for eighth and third in their division and the Flyers finished tied for second and first in their division. The aggregate rankings, however, have four players from NY Empire, four from the SD Growlers, and three from the DC Breeze and Indianapolis Alleycats each. New York Empire dominated the season, going 15-0 and winning the championship, while the Growlers finished tied for second and first in their division and the Alleycats finished with a record tied for fourth and also won their division. The teams which the aggregate ranking had the most players from, NY Empire and the SD Growlers, each had just one player on the per-point rankings. Overall, the per-point rankings included just seven of their top 25 players from the four division-winning teams, while the aggregate rankings had 13 of their top 25 players coming from one of these four teams.

Aggregate PER also features a top 25 entirely made up of players who played more offensive points than defensive ones, while the original per-point rankings include five mainly d-line players, mostly a product of the list including players it probably shouldn't. We can also look at the data from a positional point of view: the per-point PER top 25 is made up of 12 cutters, six hybrids, and seven handlers as classified by the AUDL, while the aggregate rankings are comprised of seven cutters, 13 hybrids, and five handlers. This gives the per-point top 25 a 1.71 ratio of cutters to handlers, while the aggregate players show a 1.40 ratio. The fact that there are more defensive players and more cutters in the original top 25 than in the aggregate 25 again points to the fact that there may be weaker players on the per-point list that shouldn't be there.

While this aggregated list appears to be comprised of better players and seems more accurate than the per-point rankings, is there any way we can test this? As a matter of fact, there is. For one, we can compare both my original rankings and aggregate rankings to the plus-minus statistic provided by __UltiAnalytics__. Their formula for plus-minus is as follows: +1 for a goal, +1 for an assist, +1 for a D, -1 for a drop, -1 for a passer turnover (throwaway, stalled, misc. penalty), +2 for a callahan (+1 for the D and +1 for the goal), and -1 for being callahaned. While I would argue that this isn't the most effective way to rank and analyze players (obviously, that's why I'm doing this, after all), it provides a useful tool of comparison. Using all 601 eligible AUDL players, comparing my PER rankings with UltiAnalytics' plus-minus gives a covariance of 143.97 and a correlation coefficient of 0.621. This demonstrates a moderate correlation between PER and UltiAnalytics' plus-minus, but when we make the same comparison using my aggregated uPER we see a covariance of 93.60 and a correlation coefficient of 0.892, a much stronger correlation. This correlation does come with some caveats, though, as both UltiAnalytics' plus-minus and my aggregated uPER are directly influenced by the number of points played. To demonstrate this, here are the average numbers of points played by the top 25 players from my original list, my aggregated list, and UltiAnalytics' plus-minus: original, 151.70 points played; aggregated, 270.84 points played; UltiAnalytics' plus-minus, 270.04 points played.
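For the curious, the comparison statistics above are standard Pearson machinery, which can be sketched in a few lines of Python (I use population covariance here; whether that matches the exact convention behind the numbers above is an assumption):

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance between two equal-length lists of ratings."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Pearson correlation coefficient between two rating lists."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))
```

Unlike covariance, the correlation coefficient is scale-free, which is why 0.621 versus 0.892 is the more meaningful comparison between the two rating systems.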

Another way that we can test the accuracy of my lists is by comparing them to the 2019 AUDL All-Star and All-AUDL teams. These teams are comprised of the game's top players, and making one of these teams provides a player with an objective stamp of approval. So let's see how some of the AUDL's top players are evaluated by PER:

**All-AUDL First Team**

With an average PER rank of just 64.71, PER ranks these high-level players much worse than aggregate uPER, which has their average rank at 13.29. In fact, six out of the seven players were within the top six in aggregate uPER, showing that the aggregate model seems to be much more in line with the general consensus.

**All-AUDL Second Team**

While not as extreme, a similar story is told by the All-AUDL second team's data. The average PER rank came in slightly higher at 66.86, while the average aggregate uPER rank was only 17.43. The fact that the average ranks of both increased from the first team to the second, though, speaks to the validity of both measures.

**AUDL All-Stars**

Once again, we see a similar pattern in the All-Star player data. The average PER rank for these players was 123.84, while the average aggregate rank was 17.80. Although it seems clear that All-Star and All-AUDL selections are more in line with our aggregate than our per-point data, we can still test in other ways too:

**Aggregate uPER Rankings**

The aggregate data has 14 All-Stars and 12 All-AUDL players, making these team selections significantly more in line with the aggregate rankings than the per-point rankings; the original rankings included just four All-Stars and four All-AUDL players. This certainly suggests that the aggregate system produces more accurate rankings, but I still have questions as to how valid this form of measurement actually is. Because we do not divide by points played when analyzing the aggregate data, it is heavily influenced by the number of points played. We saw this earlier, when the top 25 players from the aggregate data averaged nearly 120 more points played than the top 25 players of the per-point PER data. It is true that the aggregate data highly values players in large part due to volume, but so do we! While this pattern points to possible flaws in my metric, it may also point to flaws in the way we compare and conceptualize players, as we often discount players who perform with high efficiency in smaller samples. On the other hand, durability and consistency over time are very valuable characteristics in ultimate, and in sports in general, and the players in the aggregate top 25 likely played more points in part because they are simply better players. Ultimately, it is really up to everyone to decide what they view as more valuable: quality performance in a small sample, or slightly lower performance in a larger sample.

I hope this article has helped you think about the game of ultimate and the ways in which we evaluate athletes a little differently; I know writing it has for me. And while the rating system I've compiled is far from perfect, I hope there will be many more systems in the future better designed to help us analyze and compare ultimate frisbee players. And if you've somehow managed to read this far but don't play ultimate already, I hope this piece has encouraged and enticed you to get on the field! That disc isn't going to throw itself.

*Sources: basketball-reference.com, theaudl.com, ultianalytics.com, ultiworld.com, wikipedia.com *

*Data: *__https://docs.google.com/spreadsheets/d/1_iNyn3makGo7_yxOCTQeytc83AipCnf14YQcS__