Just How Good Is Magnus Carlsen, Really?
By: Nilay Patel
Magnus Carlsen is arguably the greatest chess player to have ever lived. He holds the record for the highest rating ever achieved under Elo, the rating system used in chess. However, the Elo system is necessarily inflationary; as technology improves and computer engines like Stockfish and Leela increase their already-substantial edge over humans, new levels of play and computer-inspired ideas enter the human game. It was once unthinkable that a computer could beat the world champion in a match, until 1997, when the engine Deep Blue beat Garry Kasparov, the reigning world champion and arguably one of the strongest players ever.
However, chess has not been solved by computers; top grandmasters (GMs) devote their whole lives to the game, and engines are not allowed in any competitive form of human chess. Carlsen is still undeniably a chess wizard, which raises the question: just how good is Magnus Carlsen, really?
There are two ways to think about this: measuring his dominance over the current crop of players (many of whom have peak Elo ratings higher than past world champions), or comparing his dominance of the current era with the dominance of past world champions in their own eras. The analysis focuses on classical chess games (i.e., those played with the classical time control, which has been the predominant time control throughout history and is the most “serious” format), so rapid, blitz, and online games (which are far more volatile and involve less preparation) are excluded.
We can first examine how Magnus Carlsen has fared in tournaments since he ascended to the throne in 2013. Let’s take a look at the average Elo of his opponents in a given tournament compared to the score he achieved in that tournament. The scoring system of a tournament is rather simple: for each win, 1 point is awarded; for each draw, ½; for each loss, 0. The Elo system is also fairly easy to explain: a higher rating indicates a stronger player, and there are technical formulas to determine the expected score of a game between two players given their Elo ratings. After each game, the ratings are adjusted; for example, a higher-rated player who loses to a lower-rated one will have his Elo adjusted downward, while his opponent’s Elo will be adjusted upward. Now, let us take a look at the “strong” tournaments that Magnus has played in; we will define a “strong” tournament as one with an average Elo of at least 2700. For reference, Magnus is currently (as of May 2022) at 2864, 10th place is at 2761, and we have to go down to 37th place to reach a player with Elo 2699. Thus, we are comparing Magnus Carlsen’s performance against other GMs of his caliber.
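The “technical formulas” mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the standard logistic Elo curve and a K-factor of 10 (the value FIDE applies to players who have reached 2400); the function names are made up for this example and are not taken from the original analysis.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) for player A against player B.

    Assumes the standard logistic Elo curve, where a 400-point gap
    corresponds to 10:1 odds.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, expected: float, actual: float, k: float = 10.0) -> float:
    """Rating after one game; k = 10 is an assumption for elite players."""
    return rating + k * (actual - expected)


# Ratings quoted in the text: Carlsen (2864) against the 10th-ranked player (2761).
carlsen, tenth_place = 2864, 2761
expected = expected_score(carlsen, tenth_place)
print(f"Expected score for the 2864 player: {expected:.3f}")

# Rating change for each possible result (1 = win, 0.5 = draw, 0 = loss).
for result, label in ((1.0, "win"), (0.5, "draw"), (0.0, "loss")):
    new_rating = updated_rating(carlsen, expected, result)
    print(f"{label}: {new_rating - carlsen:+.1f} rating points")
```

Because the higher-rated player is expected to score more than ½, a draw costs him rating points under this model, even against a top-10 opponent.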
This data shows something that followers of the chess world will already know: it is incredibly tight at the top, and a lot of draws occur. However, Magnus’s scores tend to hover above ½ most of the time, including putting up some insane performances in high-Elo tournaments. He generally posts scores between 0.5 and 0.8, which is incredibly difficult considering that most classical games are draws in the modern game; even reaching 0.66 means you win one of every three games (and still go undefeated throughout the whole tournament), which, given how hard it is to win just one game at the top level, is an incredible feat. Furthermore, if we standardize the data to fit a linear regression, we see the following:
This data suggests that, over his 9-year reign as world champion, Carlsen has consistently performed at a rating of 2866 in strong classical tournaments (albeit with a weak regression, as R² = 0.113), which is an extremely high rating. For reference, no other player has ever reached that rating even once in their lives (Kasparov comes closest with a peak of 2851, and of currently active players Fabiano Caruana has the highest peak at 2844); this indicates that, in strong tournaments, Magnus Carlsen is consistently better than the next-best player by a long shot (to the extent that no other player has ever come close to Carlsen’s “average” performance). There’s no sign of drop-off either; in 2022, he won the Wijk aan Zee tournament with a tournament performance rating (the Elo rating that would have been expected to produce the same results that he actually got) of 2900.
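The definition of tournament performance rating given above can be turned into a small calculation: find the rating whose total expected score against the field equals the score actually achieved. The sketch below does this by bisection on the logistic Elo curve. This is an assumption made for illustration; FIDE’s official method uses a lookup table on the average opponent rating, so its figures can differ slightly. The field of opponents is hypothetical, not the real Wijk aan Zee line-up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B on the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def performance_rating(opponent_ratings, score, lo=0.0, hi=4000.0, tol=1e-6):
    """Rating whose total expected score against the field equals `score`.

    Solved by bisection, since the total expected score increases
    monotonically with the player's rating.
    """
    if not 0 < score < len(opponent_ratings):
        raise ValueError("performance rating is undefined for 0% or 100% scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid, r) for r in opponent_ratings)
        if total < score:
            lo = mid  # this rating would have scored less, so look higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical 13-round field averaging around 2740 (illustrative numbers only).
field = [2700, 2710, 2720, 2725, 2730, 2735, 2740,
         2745, 2750, 2755, 2760, 2770, 2780]
print(round(performance_rating(field, 9.5)))  # performance for scoring 9.5/13
```

An even score against a field returns a rating close to the field’s average, and every extra half point pushes the performance rating higher.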
However, this does come with some caveats. First, the regression is weak, meaning that chess results, in general, are far less predictable than the fitted line suggests. Second, Carlsen has explicitly stated that his goal is to hit 2900 and become the first player to do so; the regression shows that he still has quite a way to go to reach that mark. Third, he cannot afford even to draw opponents who are much weaker than him; after winning Wijk aan Zee, he lost all of the rating points he had gained there (with 6 wins from 13 games!) in just one draw against a lower-rated Norwegian player. Given how common draws are, this makes climbing any higher very difficult. Nonetheless, Carlsen still crushes his opponents in top tournaments.
There is, however, another way to look at the strength of chess played. When chess engines evaluate any given position, they evaluate it for an advantage for white or black, and to measure it they use the value of a pawn as a unit. For example, if black is in a relatively strong position, even if both sides have the same amount of material, the computer may evaluate the position as -1 (where a negative number is in black’s favor, and a positive one in white’s favor), which says that the position may not currently have material imbalance but black’s position is so strong that black is essentially up a pawn over white. In other words, pawns are used to describe the strength of a position. However, since positions at the top level are pretty close, it’s often best to measure in centipawns, which are 1/100 of a pawn.
Since no player is perfect, every time a player moves, they tend to lose some positional advantage; we call this the centipawn loss for that move. A strong move will have a low centipawn loss (the position might weaken, but not by much, if at all); conversely, a weak move will have a high centipawn loss. Then, throughout the game, we can calculate the average centipawn loss per move for a player, and we call this their average centipawn loss (ACPL); a low value corresponds to a strong player and a high value to a weak player.
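To make the definition concrete, here is a minimal sketch of an ACPL calculation. It assumes we already have a list of engine evaluations in centipawns from White’s point of view, one for the starting position and one after every move, and it measures each move’s loss as the drop in evaluation for the side that moved. Real analysis tools typically compare against the engine’s best move and cap extreme scores, so this is a simplification; the function name and the numbers are invented for illustration.

```python
def average_centipawn_loss(evals):
    """Return the ACPL for each side.

    `evals[0]` is the evaluation before the first move and `evals[i]` the
    evaluation after the i-th half-move, all in centipawns from White's
    perspective (positive favors White, negative favors Black).
    """
    losses = {"white": [], "black": []}
    for ply in range(1, len(evals)):
        before, after = evals[ply - 1], evals[ply]
        if ply % 2 == 1:
            # White just moved: a drop in the evaluation is White's loss.
            losses["white"].append(max(0, before - after))
        else:
            # Black just moved: a rise in the evaluation is Black's loss.
            losses["black"].append(max(0, after - before))
    return {side: (sum(v) / len(v) if v else 0.0) for side, v in losses.items()}


# Made-up evaluations for a five-ply fragment.
evals = [20, 15, 40, 40, 10, -30]
print(average_centipawn_loss(evals))
```

In this fragment White loses 5, 0, and 40 centipawns on his three moves, and Black loses 25 and 0 on his two, so each side’s ACPL is the mean of those values.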
With this in mind, we can return to Magnus and his contemporaries. We have seen how he performs in tournaments against them, but how do they perform when competing in the World Championship cycle? How strong is Magnus’s play in reality when compared to his peers? We can take a look at the Candidates Tournaments (which determine the player who will challenge Magnus for the throne) and the World Chess Championships themselves to gauge the strength of play of Magnus and his contemporaries. Here, the green line represents Magnus in World Championship matches, the black lines represent his challengers in World Championship matches, the green dot represents Magnus in his first Candidates Tournament, and the blue circles represent other Candidates.
We see here that Magnus is consistently near the bottom of the table, a good sign, but he isn’t always the best player in terms of ACPL. Even in his first Candidates Tournament, he was beaten by Vladimir Kramnik on this measure, and in 2016, when he won a World Championship title against Sergey Karjakin, the measure suggests he should have lost! This can be mystifying, but it’s important to note that ACPL is not a perfect measure; in opening lines known to be drawn, ACPL tends to be near zero, as these lines are analyzed by computers to ridiculous depths (in 2021, one game saw both players finish with an ACPL under 3; remember that 0 represents perfect engine play). However, in more explosive and attacking games, regardless of the result, the ACPL tends to be higher, as idealized and safe computer lines are rejected for other lines that may be objectively worse (to an engine) but provide great attacking chances and complications that are really difficult to handle; many of these moves aren’t “bad” but “subpar,” meaning that they are not the top engine recommendation but are still a pretty good option. There is also a very small fraction of moves that humans play that seem to be “invisible” to engines, but these moves are so infrequent (and computer analysis often correctly evaluates such moves once played) that they can be ignored.
Here, we have to look at Magnus’s style of play: he can attack and push when he wants, squeezing water out of stone. He pushes, pushes, and pushes other players, and he often calculates better and faster than his opponents. This lends itself to a style with a potentially higher ACPL, one that still chooses really good moves but opts for more dynamic options that allow him to push for a win (and get good scores in tournaments, as seen earlier) instead of following boring and over-analyzed computer lines to a draw. It’s also important to take note of the drop-off between the Candidates Tournament and the World Chess Championships; there tends to be a lot more analysis going into championships, and having only one opponent while having months of build-up and large teams with seconds analyzing computer lines to 20 or more moves is bound to result in better performance; it’s no coincidence that the challenger tends to perform better in the championship match than every Candidate (which includes the challenger himself).
Furthermore, it’s important to note who beats him on this measure: in 2013, it was Vladimir Kramnik, an ex-champion who dethroned the legendary Garry Kasparov by using a drawing technique in the Berlin Defense and who would be bound to play solid, tight chess; in 2014, it was the same man he unseated in 2013, Viswanathan Anand, whom he then went on to beat in the World Championship match by a margin of +2; and in 2016, it was Sergey Karjakin, yet another solid, defensive, tight player known as the “Minister of Defense.” Thus, ACPL isn’t a perfect measure for analyzing every single player, but for an explosive and dynamic player like Magnus Carlsen, such a low ACPL is a sign of high-quality chess, which is consistently better than that of pretty much every one of his contemporaries.
Thus, it’s pretty clear to see that Magnus Carlsen is very dominant in the contemporary period. However, how does he stack up to past champions? Let’s take a look at past historical Elo ratings:
This graph confirms the inflationary nature of the Elo system, and it shows how the World Champions have fared on Elo tables since the system’s introduction. We note that Kramnik and Anand, while being world-class players in their own right (and two of the players who beat Magnus on ACPL in the previous analysis), were typically not the highest-ranked during their reigns, and that the debate comes down to Fischer, Karpov, and Kasparov. Fischer presents an odd case, forfeiting his title to Karpov in 1975 without a game because of his own controversies. He has a large Elo gap in the table, but his reign was too short for him to be judged in the same way as champions who showed longer-term dominance. Furthermore, the spike in the bottom graph when Carlsen becomes champion is also an incredible indicator of Carlsen’s dominance; he was the top-ranked player when he won, and the spike shows just how much better he was compared to Anand, an upward spike that isn’t seen anywhere else. However, on this evidence, Carlsen’s gap to the next-best player isn’t all that different from Karpov’s and Kasparov’s, and his reign is similar to Karpov’s, although Carlsen probably takes the edge since he has never lost the top ranking since he became champion (while Karpov did). Furthermore, the length of the reign suggests that Carlsen is somewhere in between the dominance that Karpov and Kasparov have shown. Kasparov was champion much longer and won more titles than Carlsen, whose reign is creeping up on Karpov’s while maintaining a more impressive Elo gap to the next-best player.
Here, we place Kasparov on top by virtue of how long he has managed to keep the Elo gap so large and place Carlsen just under him with Karpov a close third. However, this doesn’t account for world champions before Fischer, and we can take a final look at another measure to put this all into context: the ACPL gap in world championships since 1886 (the first recognized world championship). We will ignore games in the championships of Spassky-Fischer and Topalov-Kramnik that were forfeited but will consider those matches as if the problematic games didn’t take place.
Here we see a few pieces of evidence that chess players have gotten better over time (and that Elo should be inflationary): namely, the ACPL decreases, meaning players are playing increasingly accurate moves as time goes on. Indeed, we also see a justification for the previous statement that losing on ACPL doesn’t necessarily mean a worse game: Steinitz, Lasker, Botvinnik, Kasparov, and Carlsen have all won championships while losing on ACPL.
Now, let’s take a closer look at the data. Lasker and Alekhine have notably long reigns, and both often won championships quite comfortably, but it is difficult to compare them to the modern era; as the ACPL for all top players goes down, the gap between two players becomes finer and the best possible ACPL gap also tends to decrease; Lasker’s crazy 17+ ACPL gap is unlikely to ever occur again, for example. We see that Carlsen has some of the lowest ACPL throughout history, solidifying him as one of the best players of all time, but we also see that the ACPL gap is an incredibly fickle data point that is very hard to compare. Lasker’s ACPL was so good that, between him and Capablanca, it took decades for the rest of the world to catch up; however, any modern player would wipe the floor with him, and his opponents were notably terrible as seen on the graph; it’s very difficult to compare across far apart time periods because of how much better chess players have become. Nonetheless, Lasker ranks as an all-time great, but it will be difficult to compare Carlsen to Lasker, who played 100 years before him in a far different era.
It’s also important to note what this data doesn’t mean: it doesn’t mean that Kasparov is a better player than Carlsen just because he was champion for longer and because we’ve ranked Carlsen behind him. Carlsen has reached a higher Elo rating, and chess players tend to get better as time goes on; a current Carlsen could probably beat a prime Garry Kasparov if Kasparov were limited to the tools of his day, even if the current Carlsen is not as “dominant” as Kasparov was.
With that in mind, Carlsen’s reign looks most notably like Kasparov’s (initially) and Anand’s. The graph shows that Carlsen has a way to go to match Kasparov, but is probably just above the likes of Alekhine and Steinitz, and similar to Karpov as noted before. Fischer, also considered a great player, will have to be ignored because his World Championship antics ensure that some of his data is difficult to come by and comparisons are difficult to make.
The main takeaways from the data are that Carlsen is incredibly dominant over top players in the current era and that there is little sign of this changing; his rapid rise is significant, and he has dethroned the previous generation, dominating his peers with a combination of solid chess and dynamic play that allows him to earn large tournament scores in extremely strong fields. At the same time, historically, we can draw a few conclusions. While Lasker’s reign was long and dominant, and he can likely be credited with a new style of chess and with being decades ahead of his peers (whose ACPL was consistently far above his), the drastic improvements in chess and the introduction of engines have significantly lowered the ACPL gap between the champion and the next-best player, leaving viable comparisons only after the Second World War. Indeed, Elo gaps suggest Carlsen is not the new Kasparov but the new Karpov, and while he is certainly the dominant champion that defines the current era (as Kasparov did for his and Karpov for his, unlike Anand and Kramnik), he is far closer to Karpov (although better than Karpov) when it comes to his dominance in Elo rating and ACPL (as their reigns as the top-ranked players and as world champions are pretty similar in length).
However, if he maintains his current level of dominance for, say, 5-10 more years (Kasparov was world champion for 15 years and on top of the Elo charts for over 21, while Magnus has been world champion for 9 years and on top of the Elo charts for just over 11), he could be considered alongside Kasparov, which would answer a question that has been asked since he first came onto the scene. This is no easy feat, and a few obstacles stand in his way: the importance of youth and the increasing intensity of chess favoring younger players, his own disillusionment with being the World Champion, and the ever-accelerating tactical innovations in chess that continually affect the way the 32 pieces move on the 64 squares.