Bruin Sports Analytics
Running over the Ages
By: Ryan Dunker
Running is a sport like no other. It was at the core of the ancient Olympic Games, first held in 776 BCE, which grew to include four different kinds of running events. The physical act of running dates back even further, to early human ancestors who ran from predators or after prey while hunting.
Running as a sport has a rich history as well. The classic tale of the messenger running from Marathon to Athens gave rise to only one facet of running: the marathon. There is so much more that one can explore in running, but that is not the focus of this article.
In this article, I dive into historical data on both Olympic running and competitive running. Running has clearly progressed over the past century, but how does that progress actually show up when looking at times, age, country of origin, and gender?
[*Note: The historical data analyzed for this article did not contain any information on weight, height, BMI, etc. Such variables are definitely a major field of interest, but they are outside the scope of this article.]
To learn more about the history of running, I collected and analyzed data on running events for each Olympics dating back to 1896. I also collected and analyzed data containing the top 1000 times in numerous track events for both male and female athletes.
[*Note: The data focuses on track-based events. These include: 100 meters; 200 meters; 400 meters; 800 meters; 1,500 meters; 5,000 meters; 10,000 meters; half marathon; and the marathon. There are no entries in the Olympic data set for the years 1940 and 1944 because those Games were cancelled during World War II.]
From this graphic, it is clear that the 100 meter sprint has progressed dramatically over the past century. The 1896 gold medalist, Thomas Burke, clocked in at 12 seconds. Fast forward 116 years, and Usain Bolt clocked in at 9.63 seconds to take the gold at the 2012 Olympics. A 2.37 second gap might not seem like a lot, but races like these can be decided by hundredths of a second, so a drop of roughly 20% from the 1896 winning time is immense.
For example, the difference between winning a gold medal and a bronze medal in the 100 meters at the 2004 Olympics was two one-hundredths of a second. Furthermore, the difference between gold and silver was one one-hundredth of a second. You can see this minuscule difference in Figure 1 under the year 2004, where the bronze marker covers both the gold and the silver. It becomes evident that in a 100 meter race a fraction of a second can decide the outcome.
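As a quick check on the arithmetic above, here is a minimal Python sketch. The two winning times (12.0 and 9.63 seconds) come from the text. The 2004 podium times (9.85, 9.86, 9.87) are assumed from the official results, which match the gaps quoted above, and are not read from the article's data set. The function names are illustrative. Note that the percentage depends on the baseline chosen: relative to the 1896 time the drop is about 19.75%, while the symmetric difference relative to the midpoint of the two times is about 21.9%.

```python
def percent_drop(old: float, new: float) -> float:
    """Percentage decrease from old to new, relative to old."""
    return (old - new) / old * 100


def percent_difference(a: float, b: float) -> float:
    """Symmetric percentage difference, relative to the midpoint of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100


burke_1896 = 12.0  # gold-medal time quoted in the text (seconds)
bolt_2012 = 9.63   # gold-medal time quoted in the text (seconds)

print("gap (s):", round(burke_1896 - bolt_2012, 2))
print("drop vs 1896 (%):", round(percent_drop(burke_1896, bolt_2012), 2))
print("symmetric difference (%):", round(percent_difference(burke_1896, bolt_2012), 2))

# 2004 men's 100 m podium times in seconds (assumed from official results).
gold, silver, bronze = 9.85, 9.86, 9.87
gap_to_silver = round(silver - gold, 2)  # rounded to hide float noise
gap_to_bronze = round(bronze - gold, 2)
print("gold to silver (s):", gap_to_silver)
print("gold to bronze (s):", gap_to_bronze)
```

Rounding the gaps to two decimals matters here because binary floating point cannot represent values like 9.87 exactly.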
Below is an image created by the New York Times that depicts the difference in time and distance for several runners throughout history:
Another interesting takeaway from the graphic above is that the spread between medalists has steadily shrunk toward zero. As mentioned above, in 2004 gold and bronze were separated by a minuscule 0.02 seconds. The 1896 race, by contrast, clearly did not have a narrow finish. Today every race is so tight that high-tech cameras and sensors are needed to separate the finishers. Overall, one might argue that the margin for error for Olympic athletes is slimming every year.
It is important to note that after 2004 the difference in times did begin to open up again. Interestingly, Usain Bolt made his Olympic debut at the 2004 Games, and it was the only Olympic race in his career that he lost. From then on, he dominated Olympic sprints until his retirement in 2017.
The graphic above was built from the dataset consisting of the top 1,000 times for each event. It should be noted that most entries in this dataset are likely to be recent, since race times today are much faster than they were 100+ years ago.
The figure above presents the mean time in seconds for males in the 200m race for ages 16 to 38. As you can see, the points again trace a roughly parabolic shape, but it's interesting that the mean times for ages 32 and 33 sit well below the rest. It's also worth mentioning that the y-axis spans only 0.3 seconds in total, which shows how close these runners are to each other.
[*Note: There were no male entries for age 37. Hence, no point is plotted for that age.]
On the women’s side, the takeaways are very similar to their male counterparts. Once again, it is intriguing to see that athletes in their late 30s are on average still very fast and can compete at such high levels. Doesn’t this seem unreasonable?
Taking a closer look, the list of the top thousand 200m times has 138 entries from 23 year olds, while ages 38 and 39 have only a single entry each. This shows how taking the average can be misleading across age groups: a mean built from one elite performance is not comparable to a mean built from 138. Only elite racers can compete at the older ages, and it is very impressive for them to appear among the top 1000 times ever recorded.
In this figure, I switch up the approach and look solely at the fastest time for each age. From here one can clearly see that sprinters are fastest from their early 20s to their late 20s.
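The contrast between the per-age mean and the per-age fastest time can be sketched in a few lines of Python. The entries below are hypothetical values invented for illustration, not rows from the data set, and `summarize_by_age` is an illustrative helper rather than the code used for the figures. The point is only to show how a single entry at an older age can produce a mean that looks better than the mean of a well-populated age group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (age, time in seconds) entries, NOT taken from the data set.
entries = [
    (23, 19.80), (23, 20.10), (23, 20.20), (23, 20.30),
    (24, 19.90), (24, 20.15), (24, 20.25),
    (38, 20.05),  # a single elite entry at an older age
]


def summarize_by_age(rows):
    """Return {age: (entry count, mean time, fastest time)}."""
    by_age = defaultdict(list)
    for age, time in rows:
        by_age[age].append(time)
    return {
        age: (len(times), round(mean(times), 3), min(times))
        for age, times in sorted(by_age.items())
    }


summary = summarize_by_age(entries)
for age, (count, avg, fastest) in summary.items():
    print(f"age {age}: n={count}, mean={avg:.3f}, fastest={fastest:.2f}")
```

In this toy sample the lone 38 year old has a lower mean than the 23 year olds, even though the fastest 23 year old is clearly quicker. Reporting the entry count next to each mean makes that kind of artifact easy to spot.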
Notice the upward-opening parabolic curvature of Figure 2. It supports the notion that runners have a so-called “prime”, a span of time where they are at peak performance before age and fatigue take over. With this said, I would estimate that a male sprinter reaches peak performance between the ages of 22 and 25. This is obviously very speculative, and many factors other than age go into it.
[*Note: The world record for males in the 200M is 19.19 seconds and was set by a 23-year-old Usain Bolt at the 2009 World Championships]
The figure above is very similar to Figure 3 presented earlier; however, one can see clear differences in the overall shape and spread. For example, the shape is a more distinct parabola than in the earlier figure, reinforcing the idea that athletes have prime years in their careers. In this example, I would estimate that the early to late 20s are the optimal years for peak performance.
When considering the notion of prime years, it makes sense to connect the points on this graph and follow the line to the end. You’ll see that times start high and then progressively get lower until they hit a plateau or minimum. After this, the times begin to climb once more and level off near the range where they started.
[*Note: The world record for females in the 200M is 21.34 seconds and was set by a 28-year-old Florence Griffith-Joyner at the 1988 Summer Olympics]
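The "prime" estimates in this article are read off the figures by eye. One way to make such an estimate more systematic, offered here only as a sketch and not as the method used in the article, is to fit a parabola to the fastest time per age and take the age at its vertex. The helper names `det3` and `estimate_peak_age` are illustrative, and the sample data is synthetic, generated from an exact parabola with its minimum at age 25.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def estimate_peak_age(ages, times):
    """Least-squares fit of time = a*u^2 + b*u + c, with u = age - mean age,
    returning the age at the vertex (minimum) of the fitted parabola."""
    center = sum(ages) / len(ages)
    u = [age - center for age in ages]  # centering keeps the sums well scaled
    s1 = sum(u)
    s2 = sum(v ** 2 for v in u)
    s3 = sum(v ** 3 for v in u)
    s4 = sum(v ** 4 for v in u)
    t0 = sum(times)
    t1 = sum(v * t for v, t in zip(u, times))
    t2 = sum(v * v * t for v, t in zip(u, times))

    # Normal equations M @ [a, b, c] = rhs, solved with Cramer's rule.
    m = [[s4, s3, s2], [s3, s2, s1], [s2, s1, len(u)]]
    rhs = [t2, t1, t0]
    d = det3(m)
    a = det3([[rhs[r]] + m[r][1:] for r in range(3)]) / d
    b = det3([[m[r][0], rhs[r], m[r][2]] for r in range(3)]) / d
    if a <= 0:
        raise ValueError("fitted curve has no minimum")
    return center - b / (2 * a)


# Synthetic data: an exact parabola with its minimum at age 25.
ages = list(range(18, 31))
times = [20.0 + 0.01 * (age - 25) ** 2 for age in ages]
peak = estimate_peak_age(ages, times)
print(f"estimated peak age: {peak:.1f}")
```

With real data the fit would be noisier, and age groups with very few entries would deserve less weight, so the vertex should be treated as a rough guide rather than a precise answer.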
Let's take a break from sprints and look at distance track events:
For this figure, I investigated the fastest times in the 1500M race for males by age. Notice that the times are plotted in seconds, which is somewhat unconventional for middle-distance events like the 1500M.
The takeaways from Figure 5 are very similar to the insights discussed above. When looking at the general shape of the figure I’d estimate the “prime” of middle distance runners to be early to mid 20s.
Once again, Figure 6 reinforces this notion of an athlete's prime, or years of peak performance. Something quite interesting in Figure 6 is the first few entries, which show three female athletes aged 20 or younger who came extremely close to the world record.
To put this into perspective, the fastest 17, 18, and 20 year olds all have faster times than 85% of the athletes from other age groups.
To investigate country of origin, I looked at the top 9 countries for each class of running. I also included an “other” category to account for the remaining countries, since some classifications have 60+ unique countries.
I divided the running events into three main categories: Sprints, Middle-Distance, and Long-Distance. For sprints, I included the following running events: 100m, 200m, and the 400m. For middle-distance, I included the 800m and 1500m. For long-distance, I included the 5000m, 10000m, half-marathon, and marathon.
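The grouping described above amounts to a simple lookup table. A minimal sketch follows. The event labels such as `"half-marathon"` are assumptions about how events might be named, and the actual data set may spell them differently.

```python
# Event labels are assumed spellings; category names follow the text.
EVENT_CATEGORY = {
    "100m": "Sprints",
    "200m": "Sprints",
    "400m": "Sprints",
    "800m": "Middle-Distance",
    "1500m": "Middle-Distance",
    "5000m": "Long-Distance",
    "10000m": "Long-Distance",
    "half-marathon": "Long-Distance",
    "marathon": "Long-Distance",
}


def categorize(event: str) -> str:
    """Map an event label to its category, ignoring case and outer whitespace.

    Raises KeyError for labels that are not in the table.
    """
    return EVENT_CATEGORY[event.strip().lower()]


print(categorize("200m"))
print(categorize("Marathon"))
```

Failing loudly on an unknown label, rather than defaulting to some category, helps catch spelling variants in the raw data before they skew the proportions.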
In Figure 8, the top countries for producing Long-Distance runners are presented. It is very intriguing to see that Kenya accounts for 41.4% of the top long-distance runners around the world. Additionally, neighboring Ethiopia contributes 22.4% of the best long-distance runners. Jointly, that is an astonishing 63.8% of the best long-distance runners in the world.
[*Note: A total of 66 unique countries have produced top-tier long-distance runners. The ninth-ranked country was Portugal, with 148 runners out of 8,010 entries]
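The "top 9 plus other" breakdown can be sketched with Python's `collections.Counter`. The country codes in `sample` are a hypothetical toy list, not the real entries, and `top_n_shares` is an illustrative helper. The last two lines use only numbers quoted in the text: the combined Kenya and Ethiopia share works out to 63.8%, and Portugal's 148 of 8,010 entries is about 1.85%.

```python
from collections import Counter


def top_n_shares(countries, n=9):
    """Return [(label, percent)] for the n most common countries,
    followed by an 'Other' bucket for everything else."""
    counts = Counter(countries)
    total = sum(counts.values())
    top = counts.most_common(n)
    shares = [(name, 100 * count / total) for name, count in top]
    other = total - sum(count for _, count in top)
    if other:
        shares.append(("Other", 100 * other / total))
    return shares


# Hypothetical toy sample of country codes, NOT the real data.
sample = ["KEN"] * 5 + ["ETH"] * 3 + ["USA"] + ["JPN"]
shares = top_n_shares(sample, n=2)
for label, pct in shares:
    print(f"{label}: {pct:.1f}%")

# Arithmetic on figures quoted in the text.
kenya_plus_ethiopia = round(41.4 + 22.4, 1)
portugal_share = round(100 * 148 / 8010, 2)
print(kenya_plus_ethiopia, portugal_share)
```

Keeping the "Other" bucket explicit means the shares always sum to 100%, which makes a long tail of 60+ countries visible instead of silently dropped.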
Figure 9 depicts the top 9 countries, plus an “other” category, by their share of middle-distance runners ranking in the top 1000 of their respective events. There were 67 unique countries with elite middle-distance runners, which is part of the reason the “other” category accounts for such a high proportion. As in Figure 8, Kenya has the largest share of middle-distance runners, at 21.9% of all runners.
Russia and the United States also rank near the top of the leaderboard, with 13.5% and 7.9% of runners respectively.
In Figure 10, we dive into the sprinting classification of running events. As you can see, the United States outshines all other countries, with nearly 40% of the best sprinters hailing from the US. Jamaica, home to the fastest man ever, also ranks high on the leaderboard, with 17.4% of all sprinters having Jamaica as their country of origin.
[*Note: There are 73 unique countries with sprinters among the best in their respective events. It should be noted that this is the highest number of unique countries across all classifications of running events.]
This article sought to explore the evolution of running and the many factors that go into elite running. Our analysis provided concrete evidence for two commonly known facts: that runners have gotten faster over time, and that many runners have a “prime”, or an age range for peak performance. We also found that a few countries produce an overwhelming number of the top runners across a wide range of running events.
Diving in further, we found that the winning time in the Olympic 100 meter sprint has decreased by roughly 20% since 1896, which is an astonishing improvement. Similarly, we found that different events have certain ages where athletes are most likely to be at peak performance.
In terms of country of origin, it's evident that for event categories like long-distance and middle-distance, countries from the African continent have dominated the playing field. Conversely, the USA has dominated all other countries when it comes to sprinting events, but it is important to note that more countries are represented in this category than in the others.
All in all, it was a fascinating investigation into the world of running. I explored new ideas and different topics in order to uncover meaningful insights about the evolution of running. There is so much more analysis that can be done and so many other variables to consider, but for now we can be satisfied knowing that running has come a long way over the ages.
Sources: https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results, https://www.kaggle.com/jayrav13/olympic-track-field-results, https://www.kaggle.com/jguerreiro/running