By Vedant Sahu • 29 May 2019 • 10 min read
With the Indian Premier League wrapped up and the ICC World Cup just on the horizon, cricket fever is reaching its peak. As more and more people turn to cricket for entertainment over the summer, those newly introduced to the sport are unlikely to find it easy to follow. Cricket is a complicated game to begin with, and one notorious component of it eludes even most seasoned fans: the Duckworth-Lewis-Stern (DLS) method.
At its heart, cricket is all about scoring more runs than the opponent. While Test cricket has more flexibility in how this can be achieved, limited-overs cricket (ODIs and Twenty-20s) has a more straightforward objective: you have a certain number of overs available and you try to score as many runs as you can. The DLS method comes in when an interruption, most likely rain, prevents the match from being completed in its entirety. In such a scenario, the number of overs that one or both teams will face is reduced so that the game can finish in time, and the DLS method is used to adjust the score that the team batting second requires to win.
The DLS method is based on the principle that in a limited overs match, a team has two ‘resources’ available to them. The first one is the number of overs that they still have to face and the second is the number of wickets that they have left. It is easy to see that a team’s final score is strongly dependent on how many resources they have left at a particular stage of the game. Obviously, a team with more overs left will face more deliveries, and hence will have more chances to score. As for the wickets remaining, a team with more wickets can afford to take more risks and bat more aggressively, hence boosting their score.
These two points manifest themselves in a resource table, which gives the percentage of resources that a team has left at different points of the game. Another peculiarity of cricket shows up here: in all international and first-class matches, the ‘Professional Edition’ of the system is used, which is not publicly available. Because of this black box in the DLS calculations, I’m restricted to the ‘Standard Edition’ for my analysis. The difference between the two is that the ‘Professional Edition’ takes into account the score of the team batting first and makes slight adjustments to the resource table accordingly, while the ‘Standard Edition’ does not. The ‘Standard Edition’ showed huge deviations in high-scoring games, and the switch to the ‘Professional Edition’ has helped fix this issue.
The DLS method was formulated to be used for 50-over games, but with the inception of the now incredibly popular Twenty-20 (T20) format, the method has been used in this format of the game as well. While it was devised considering the moderate scoring rates in 50-over games, the fact that the same formulation is being used for the much more explosive 20-over format does raise some concerns and this is exactly what I hope to investigate in this article. The relative importance of wickets and overs remaining was calibrated for the 50-over format and critics often point out that wickets are relatively less valuable in the shorter format of the game, making the DLS method less accurate. The ‘Standard Edition’ resource table scaled for a T20 game is shown below.
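One plausible way to scale a 50-over resource table down to 20 overs is to divide every entry by the resources available with 20 overs left and no wickets lost, so that the start of a T20 innings counts as 100%. The sketch below assumes that approach. The table excerpt, the name `RESOURCES_50` and the function `scale_to_t20` are hypothetical, and the percentages are placeholders rather than values from the official table.

```python
# Hypothetical excerpt of a 50-over resource table: percentage of resources
# remaining, keyed by (overs_left, wickets_lost). These numbers are
# placeholders for illustration only, NOT values from the official table.
RESOURCES_50 = {
    (20, 0): 56.0,
    (20, 5): 38.0,
    (10, 0): 32.0,
    (10, 5): 27.0,
    (5, 0): 17.0,
    (5, 5): 16.0,
}


def scale_to_t20(table_50, innings_overs=20):
    """Rescale so a full T20 innings (20 overs left, 0 wickets lost) is 100%."""
    full = table_50[(innings_overs, 0)]
    return {
        key: round(100.0 * value / full, 1)
        for key, value in table_50.items()
        if key[0] <= innings_overs  # keep only states reachable in a T20
    }


t20_table = scale_to_t20(RESOURCES_50)
print(t20_table[(20, 0)])  # start of the innings, rescaled to 100.0
print(t20_table[(10, 0)])  # halfway point with no wickets lost
```

Because every entry is divided by the same constant, the relative weighting of wickets and overs is unchanged, which is exactly the property that critics of the T20 usage question.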
The DLS Method is based around this straightforward equation:
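In the publicly documented Standard Edition, for the common case where the team batting second has fewer resources than the team batting first, the equation is usually stated as below. The symbols $S_1$, $R_1$ and $R_2$ are labels introduced here for illustration: $S_1$ is the first team's score, and $R_1$ and $R_2$ are the resource percentages available to the first and second teams.

$$
\text{Team 2's par score} = S_1 \times \frac{R_2}{R_1}
$$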
The par score is the number of runs that the team batting second has to score in order to tie the game; the target for a win is one run more. For instance, a game between Rising Pune Supergiants and Kolkata Knight Riders during IPL 2016 was interrupted by rain at the end of the 18th over of Rising Pune Supergiants’ innings, and as a result Kolkata Knight Riders could only bat for 9 overs. Their target was reduced to 66. This is slightly higher than the target of 61 that the ‘Standard Edition’ would have produced.
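As a minimal sketch of how a Standard Edition target could be computed, the function below scales the first team's score by the ratio of the two teams' resources and adds one run to the rounded-down par score. The function name and the input percentages are hypothetical; real values would come from the resource table, and the case where the chasing team has more resources than the team batting first is deliberately left out.

```python
import math


def standard_edition_target(team1_score, r1, r2):
    """Revised target for the chasing team when it has fewer resources.

    r1, r2: percentage of resources available to team 1 and team 2.
    Only the case 0 < r2 <= r1 is handled in this sketch.
    """
    if not 0 < r2 <= r1:
        raise ValueError("this sketch only covers 0 < r2 <= r1")
    par = team1_score * r2 / r1   # runs needed to tie
    return math.floor(par) + 1    # one more than the rounded-down par to win


# Illustrative inputs only: team 1 scored 160 with 100% of its resources,
# and team 2 is left with a hypothetical 60% of resources.
target = standard_edition_target(160, 100.0, 60.0)
print(target)  # 97
```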
Looking at a dataset of over 6,000 Twenty-20 matches played between 2003 and 2017, we can see that the split between games won by the team batting first and games won by the team fielding first is fairly even. This holds true even for games that were interrupted by rain and required the use of the DLS method.
But when we focus just on the games played in the Indian Premier League between 2008 and 2018, we see a huge disparity in the split. While the games played under normal conditions are more or less equally distributed between teams batting first and teams batting second, in the matches where the DLS method came into play the team fielding first won an alarming 74% of the time. It is important to note that this is based on a small sample of merely 19 games, but the presence of this divide does point towards a systemic issue in the DLS method, which fails to account for the explosiveness and rapidly fluctuating nature of a T20 game.
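To put the small sample in perspective, an exact binomial test asks how often a split at least this lopsided would appear if both sides were equally likely to win. This is a sketch under assumptions: the count of 14 wins is inferred from 74% of 19 games rather than taken from the data, and the function name is made up for illustration.

```python
from math import comb


def two_sided_binomial_p(wins, games, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed one under win probability p."""
    pmf = [comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(games + 1)]
    observed = pmf[wins]
    # Small tolerance so that ties in probability are counted despite rounding.
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))


# 14 wins out of 19 is an inferred count (74% of 19), not a figure from the data.
p_value = two_sided_binomial_p(14, 19)
print(round(p_value, 3))  # 0.064
```

Under this assumption, a split this uneven would turn up in roughly 6% of 19-game samples even with a perfectly fair method, which is why the sample-size caveat matters.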
The original Duckworth-Lewis Method used a score of 245 as the average score in a 50-over game in its formulation. This translates to a run rate (average runs scored per over) of 4.90. Looking at the first innings scores for Indian Premier League games, we see that the average runs scored is 162.11, which translates to a run rate of 8.11. This is significantly higher than what the norm was when Duckworth and Lewis came up with their formulation. This does give some validity to the argument made about the number of overs remaining being a significantly more valuable resource in a T20 game than it is in its 50-over counterpart. The boxplot below shows just how high the scores in T20 games can get.
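The run-rate comparison above is simple arithmetic, and a quick check makes the gap explicit. The helper name `run_rate` is only for illustration; the inputs are the figures quoted in the text.

```python
def run_rate(runs, overs):
    """Average runs scored per over."""
    return runs / overs


odi_rate = run_rate(245, 50)      # average score assumed in the 50-over formulation
ipl_rate = run_rate(162.11, 20)   # average IPL first-innings score quoted above

# Prints both rates and how many times faster IPL sides score per over.
print(f"{odi_rate:.2f} {ipl_rate:.2f} {ipl_rate / odi_rate:.2f}")
```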
To get a better sense of whether the resource table fits a T20 game well, I used data from the IPL games to see what final score the DLS method predicted for teams batting first, based on the resources they had left at the halfway point of their innings, i.e. after 10 overs. The scores achieved by teams batting second have been ignored because their strategy is often dictated by the target. For instance, a team chasing a lower target might choose to play conservatively and score more slowly than it normally would. Hence, second-innings scores don’t provide an accurate representation of a team’s use of its resources.
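One way such a halfway projection could be made, sketched here as an assumption rather than as the exact procedure used in the analysis, is to treat the runs scored so far as the product of the resources already used and to scale them up to 100%. The function name and the example inputs are hypothetical.

```python
def projected_final_score(score_so_far, resources_remaining_pct):
    """Scale the current score up to a full innings' worth of resources.

    resources_remaining_pct: percentage of resources still available,
    as read from a (T20-scaled) resource table.
    """
    used = 100.0 - resources_remaining_pct
    if used <= 0:
        raise ValueError("no resources have been used yet")
    return score_so_far * 100.0 / used


# Hypothetical example: 80 runs after 10 overs with 60% of resources left.
print(projected_final_score(80, 60.0))  # 200.0
```

Since the resources remaining depend on wickets lost as well as overs left, two teams on the same score after 10 overs can receive very different projections.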
For a fair and accurate system, one would expect the predicted scores to be close to the actual scores, and the error (actual score - predicted score) to have a mean near 0 along with a small standard deviation. While the scatterplot looks fairly evenly distributed, a closer look shows that the DLS method underestimated the final score in a staggering 59.39% of the games. The histogram, which shows the distribution of the error, looks roughly Gaussian, but it is easy to see that it is not centered on 0. The mean of the error is 4.44 runs, which is fairly small, but the standard deviation comes out to a disappointing 26.29 runs. These predictions are based on the Standard Edition, and even though the Professional Edition reportedly provides better results, this does point towards systemic errors in the DLS method vis-à-vis its use in T20 games.
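The three summary figures discussed above (mean error, standard deviation, and share of underestimated games) can be computed along these lines. The scores in the example are made up for illustration, the function name is hypothetical, and the choice of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, pstdev


def error_summary(actual, predicted):
    """Summarise prediction errors, defined as actual score minus predicted score."""
    errors = [a - p for a, p in zip(actual, predicted)]
    underestimated = sum(1 for e in errors if e > 0)  # prediction was too low
    return {
        "mean_error": mean(errors),
        "std_error": pstdev(errors),  # population standard deviation (assumed)
        "share_underestimated": underestimated / len(errors),
    }


# Made-up scores for four innings, purely to show the calculation.
actual = [180, 150, 165, 140]
predicted = [170, 155, 150, 140]
summary = error_summary(actual, predicted)
print(summary)
```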
Another criticism that the DLS method faces is that it fails to account for the individuality of the players. The loss of a wicket is treated the same in terms of resources lost, regardless of which player was actually dismissed. This is not at all representative of what transpires over the course of an actual game. An in-form, experienced batsman is likely to do far more damage, and both teams are aware of this. The dismissal of a set batsman who has been at the crease and performing well can turn the game completely on its head, whereas the wicket of someone who isn’t at the top of his game probably won’t have as deep an impact. So the fact that the DLS method assigns equal weight to each batsman is slightly troubling.
Looking at the most recent season of the Indian Premier League, we can see that David Warner, who was the highest run scorer in the tournament, was responsible for an impressive 35.5% of his team’s total score in the 12 games that he played. Similarly, Kings XI Punjab’s K.L. Rahul alone scored 24.4% of the 2429 runs that the team accumulated over their 14 games. Given the enormous contributions that these players made to their teams, it does seem unfair that the DLS method assigns a weight of only a few percent to their wickets. A similar argument could be made for bowlers. Jasprit Bumrah’s brilliant bowling at the death was a key factor in Mumbai Indians’ winning performances, so the fact that his over is treated as just as valuable as that of any other bowler is another considerable discrepancy in the DLS method.
The DLS Method is certainly an improvement on its predecessors, but that does not excuse its inherent issues. With so much data being tracked, building a system that incorporates batsmen’s strike rates, bowlers’ economy rates, and their averages is becoming more and more feasible. At least at the highest level of the game, making use of the numerous statistical tools and the plethora of available player and match data seems a logical step. Granted, this would make the system even less transparent and harder to understand, but the stakes are so high that it surely warrants an attempt. Cricket is a sport where the tide can turn in a matter of a few deliveries, so it is near impossible to account for each and every scenario. However, that does not mean that the current system cannot be improved. An overhaul is needed, but for now, just hope that the rain gods are kind when you watch your favorite team play their next game.
Further Reading: Check out this M.Sc thesis by Harsha Perera focusing on the Duckworth-Lewis Method in T20 Cricket. Keep in mind that this was published in 2011, before Professor Stern became the custodian of the method.
Hopefully I will have updated my GitHub, so that you can check out the analysis that I did and get the code for the plots that I made.