Decoding the Game: Forecasting NBA Champions with Neural Network Algorithms

Bruin Sports Analytics
Sep 16, 2023

By: Jun Yu Chen

Introduction

Navigating the crossroads of machine learning and professional sports forecasting, this article delves into the innovative utilization of Neural Network models for predicting the annual champion of the NBA. We intricately weave the narrative, taking the reader through the step-by-step construction of the Neural Network model, and shedding light on the process of NBA data aggregation and implementation. Our focus then transitions to the real-world application of our finely tuned model. Its impressive prediction accuracy, as evidenced in the successful forecasting of this year's top contenders and ultimate champion, attests to the potency of Neural Network models. This exploratory exercise illuminates the potential of such predictive models and opens up an avenue for future enhancements. The profound insights garnered from our research hold the potential to further refine and advance the sphere of sports forecasting.

Background

The National Basketball Association, commonly known as the NBA, is the premier men's professional basketball league in the world. The NBA consists of 30 teams divided into two conferences: the Eastern Conference and the Western Conference. Teams are allowed to trade players until a set deadline each season, and a team's performance is greatly affected by the players on its roster. A typical NBA season is divided into two parts: the regular season and the playoffs. Each team plays 82 games in the regular season, and the top eight teams from each conference qualify for the playoffs, where they compete in a series of best-of-seven elimination rounds to determine the conference champions. The Eastern Conference champion and the Western Conference champion then face off in the NBA Finals to compete for the league championship.

Methodology

Predicting the NBA champion is, in essence, a binary classification problem in which 0 signifies losing and 1 denotes winning the championship. To solve it, we built a customized neural network model in Python using the TensorFlow package.

Arguably, the most vital determinant of a neural network's performance, apart from its architecture, is the quality of data on which it's trained. To predict NBA champions, we brainstormed and incorporated various performance factors that substantially influence a team's season outcome. We adopted an innovative approach, dividing the game statistics into two frameworks: regular-season performance and playoff performance. Although a team's regular-season performance largely indicates if it is a title contender or not, teams with decent win records may falter in the playoffs due to several reasons, such as a lack of experienced players and heightened defensive intensity. Therefore, we sourced both regular season statistics from BasketballDataset and the most recent playoff season statistics to obtain a more comprehensive measure of a team's dynamic performance. Furthermore, we split the regular season data into pre-trade and post-trade segments, based on each NBA season's trade deadline, recognizing that game-changing player trades could significantly affect a team's standing. For instance, the Mavericks this year saw their rank fall from 5th to 11th after acquiring Kyrie Irving at the trade deadline, causing them to miss the playoffs.

Current-season performance is not the whole picture: teams with a history of championship victories or recent successes often stand a better chance of securing the championship. To account for a team's relative strength and performance beyond the current season, we used the Elo rating method, following "NBAPrediction." An Elo rating tracks a team's performance over the years by updating the scores of both the home and the away team after each game, and it carries part of the previous years' performance into the next season through reweighting. This technique aligns closely with what we have learned in class. To show how Elo ratings behave and evolve over time, we present a plot of the Elo ratings of NBA teams across multiple seasons.
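The Elo mechanics described above can be sketched in a few lines of Python. This is a minimal illustration, not the exact procedure from the cited source: the K-factor of 20, the 100-point home-court bonus, the league mean of 1500, and the 75% carryover are assumed values, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that side A beats side B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(home, away, home_won, k=20.0, home_adv=100.0):
    """Return updated (home, away) ratings after one game.

    k and home_adv are assumed values, not taken from the article.
    """
    exp_home = expected_score(home + home_adv, away)
    result = 1.0 if home_won else 0.0
    delta = k * (result - exp_home)
    # Elo is zero-sum: what the home team gains, the away team loses.
    return home + delta, away - delta


def carry_over(rating, mean=1500.0, keep=0.75):
    """Regress a rating toward the league mean between seasons (assumed weights)."""
    return keep * rating + (1.0 - keep) * mean
```

Calling `update_elo` once per game in chronological order, and `carry_over` once per team between seasons, yields a rating history of the kind the plot describes.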

After a comprehensive process of data collection, cleansing, averaging, and computation, we compiled our final dataset, incorporating all necessary input features for our neural network. Additional data come from "NBATradeDeadlineHistory," "BasketballReferenceAllStar," "NBAAllDefensiveTeamHistory," and "NBAChampionsList." The features range from basic box score statistics, such as three-point and field goal percentages, to more sophisticated metrics like efficiency scores and recurring ratings like the Elo score. The features used in our models are as follows:

team_name
season_id
win_loss_pre
field_goal_percentage_pre
three_point_percentage_pre
free_throw_percentage_pre
rebounds_pre
blocks_pre
turnovers_pre
personal_fouls_pre
win_loss_post
field_goal_percentage_post
three_point_percentage_post
free_throw_percentage_post
rebounds_post
steals_post
blocks_post
turnovers_post
personal_fouls_post
winner
mvp_number
all_defensive
playoff_field_goal_percentage
playoff_three_point_percentage
playoff_two_point_percentage
playoff_free_throw_percentage
playoff_rebounds
playoff_steals
playoff_blocks
playoff_turnovers
playoff_personal_fouls
playoff_efficiency
pre_trade_efficiency
post_trade_efficiency
elo_score
all_star_number
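As a quick consistency check, the list above has 36 columns. One reading that matches the (33,) input shape mentioned later is that team_name and season_id act as identifiers and winner is the label, which leaves 33 numeric inputs. The sketch below encodes that reading; the split into identifiers, label, and features is an assumption, not something the list states.

```python
# All columns exactly as listed in the article.
COLUMNS = [
    "team_name", "season_id",
    "win_loss_pre", "field_goal_percentage_pre", "three_point_percentage_pre",
    "free_throw_percentage_pre", "rebounds_pre", "blocks_pre", "turnovers_pre",
    "personal_fouls_pre",
    "win_loss_post", "field_goal_percentage_post", "three_point_percentage_post",
    "free_throw_percentage_post", "rebounds_post", "steals_post", "blocks_post",
    "turnovers_post", "personal_fouls_post",
    "winner", "mvp_number", "all_defensive",
    "playoff_field_goal_percentage", "playoff_three_point_percentage",
    "playoff_two_point_percentage", "playoff_free_throw_percentage",
    "playoff_rebounds", "playoff_steals", "playoff_blocks", "playoff_turnovers",
    "playoff_personal_fouls", "playoff_efficiency",
    "pre_trade_efficiency", "post_trade_efficiency", "elo_score", "all_star_number",
]

# Assumed roles: two identifier columns and one binary label.
ID_COLUMNS = ["team_name", "season_id"]
LABEL_COLUMN = "winner"

FEATURE_COLUMNS = [c for c in COLUMNS if c not in ID_COLUMNS and c != LABEL_COLUMN]
```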

Neural Network Architecture

We constructed two models in total. The first model incorporated only regular season data, while the second model integrated both regular season and playoff data. The structure of both models is nearly identical. The methodology and customized architecture of the neural network (model 2) are presented below in detail:

Input layer

The input layer is a fully connected layer with 256 neurons and a specified input shape of (33,), corresponding to the 33 input features we preprocessed. It uses the Rectified Linear Unit (ReLU) activation function. ReLU is a common choice for hidden layers in a neural network because it lets the model learn complex, non-linear patterns and mitigates the vanishing gradient problem, which can occur during backpropagation in deep networks.

Hidden layers

The first hidden layer has 128 neurons and also uses the ReLU activation function. The second hidden layer has 64 neurons, again with ReLU activation. These layers are responsible for learning and capturing the complexity of the data through the transformation of inputs from the previous layer into a space where the output can be computed.

Output layer

This layer has a single neuron with a sigmoid activation function. The sigmoid function is chosen because it squashes the input values between 0 and 1, making it suitable for the binary classification problem we are solving here, predicting either losing or winning the championship.
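To make the output layer concrete, here is a small pure-Python sketch of the sigmoid function and of the log loss (binary cross-entropy) that a sigmoid output is usually paired with. The function names and the clipping constant `eps` are illustrative choices, not part of the original model code.

```python
import math


def sigmoid(z):
    """Map any real-valued input to the interval (0, 1) in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Log loss for one example with label y_true in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))
```

A confident correct prediction, such as a probability of 0.9 for a true champion, costs far less than a confident wrong one, which is what pushes the network toward well-calibrated probabilities.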

Model compilation

The model is compiled using binary cross-entropy as the loss function and Adam as the optimizer. Binary cross-entropy is a suitable loss function for binary classification problems. The Adam optimizer is a variant of stochastic gradient descent known for its efficiency. It keeps running estimates of the first and second moments of each parameter's gradient and uses them to scale the step size for each parameter individually, so the effective learning rate adapts automatically as training proceeds.
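The architecture and compilation settings described above could be expressed with the Keras API in TensorFlow roughly as follows. This is a reconstruction from the prose, not the original project code: the function name `build_model` is hypothetical, and Adam is left at its default settings because the article does not state the initial learning rate.

```python
import tensorflow as tf


def build_model(n_features=33):
    """Feed-forward binary classifier: 256 -> 128 -> 64 -> 1 (sigmoid)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",            # default Adam hyperparameters (assumption)
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
```

With 33 inputs this network has 49,921 trainable parameters, which is large relative to a dataset of a few hundred team-seasons and helps explain why overfitting is a concern.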

Learning Rate Decay

We also implemented a learning rate scheduler, which addresses the trade-off between making quick progress and maintaining stability during training. Initially, a higher learning rate facilitates rapid progress, but it can cause overshooting and instability later on. Decay gradually reduces the learning rate, enabling more stable convergence towards a minimum of the loss function.
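The article does not give the exact decay rule, so the following is only one plausible schedule: hold the learning rate for the first few epochs, then shrink it exponentially. The names `lr_schedule`, `hold_epochs`, and `decay_rate`, and their values, are assumptions. A function with this `(epoch, lr)` signature is the form that the Keras `LearningRateScheduler` callback accepts.

```python
import math


def lr_schedule(epoch, lr, hold_epochs=10, decay_rate=0.1):
    """Keep lr constant for hold_epochs, then multiply it by exp(-decay_rate) each epoch."""
    if epoch < hold_epochs:
        return lr
    return lr * math.exp(-decay_rate)


# Simulate the 50 training epochs to see how the rate evolves.
lr = 1e-3  # assumed starting rate
history = []
for epoch in range(50):
    lr = lr_schedule(epoch, lr)
    history.append(lr)
```

In this simulation the rate stays at its starting value for ten epochs and then falls by roughly 10% per epoch, which lets early training move quickly while later updates become progressively smaller.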

Training and Evaluation

The model is trained for 50 epochs on the training data with a batch size of 32. The validation data is also used to evaluate the model at the end of each epoch. The model's performance is then evaluated on the unseen test data, and the loss and accuracy of the model on the test set are printed out.

Class weights

We also implemented a cost-sensitive training method to address the imbalanced dataset. The class weights argument gives more weight to under-represented classes during training, helping to balance the influence of each class on the model's learning. We return to this with more context in the evaluation section.

Model Performance and Evaluation

The performance of both Model 1 and Model 2 was remarkably high, producing a test loss of 0.154 and a test accuracy of 0.953. These results initially appeared almost too favorable. To gain more insight into the training process, we plotted the loss and accuracy curves for both models. Here, we display the curves for Model 1 as a representative example.

The training curve exhibited a gradual decrease, signifying the model was learning and adjusting its weights to minimize the loss function. Conversely, the validation curve remained relatively flat, suggesting the model was not improving its performance on unseen data. Despite both curves achieving low values of loss, the sizable gap between the training and validation loss signified potential overfitting, a phenomenon where the model memorizes the training data and fails to generalize to new, unseen data. This pattern was mirrored in the accuracy curve.

To understand the origin of this discrepancy, we generated a confusion matrix, a table that breaks a classifier's predictions down into correct and incorrect counts for each class. The confusion matrix highlighted a significant problem: while the model was excellent at predicting losing teams, it struggled to identify winning teams, producing many false negatives and classifying only one of the winning teams correctly. This can largely be attributed to the imbalanced nature of our dataset, where the ratio of winning to losing teams was approximately 1:30. As a result, our neural network developed a bias toward predicting losses.
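A small pure-Python sketch shows why accuracy alone is misleading at this level of imbalance. The toy season below is illustrative data, not the project's test set: with one champion among 30 teams, a model that predicts "loss" for every team is right 29 times out of 30 yet never finds the winner.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = champion)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(c):
    return (c["tp"] + c["tn"]) / sum(c.values())


def recall(c):
    """Share of actual champions that the model identified."""
    positives = c["tp"] + c["fn"]
    return c["tp"] / positives if positives else 0.0


# Toy season: 30 teams, one champion, and a model that always predicts "loss".
y_true = [1] + [0] * 29
y_pred = [0] * 30
counts = confusion_counts(y_true, y_pred)
```

Here `accuracy(counts)` is about 0.967 while `recall(counts)` is 0, so recall on the winning class is the more informative number for this task.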

We addressed this issue by implementing the cost-sensitive training approach, which assigns a higher penalty for misclassifying the minority class. This increased the weight attributed to the minority class, causing our network to focus more on the patterns exhibited by winning teams. For instance, in Model 1, we implemented class weights of {class 0: 0.524, class 1: 11.096}. This adjustment proved beneficial, leading to a revised model that successfully classified all winning teams.
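The weights quoted above are consistent with the common "balanced" heuristic, weight = n_samples / (n_classes * class_count), which is also what scikit-learn's `class_weight="balanced"` option computes. The article does not say how the weights were derived, so treat that as an assumption. The class counts below (551 and 26) are hypothetical, chosen only because they reproduce the quoted values.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical training labels: 551 non-champions and 26 champions.
labels = [0] * 551 + [1] * 26
weights = balanced_class_weights(labels)
rounded = {cls: round(w, 3) for cls, w in weights.items()}
```

A dictionary of this form can be passed to Keras through the `class_weight` argument of `model.fit`, which scales each example's contribution to the loss by the weight of its class.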

After adopting this approach for both models, we compared Model 1 and Model 2. The latter, which incorporated playoff data, proved to be more capable, achieving a lower test loss (0.151) and a higher test accuracy (0.919) than Model 1. This outcome aligns with our initial hypothesis that playoff data could be more insightful, capturing performance indicators such as experience and pressure handling, which regular season data might overlook.

To further assess the effectiveness of Model 2, we investigated the loss curve, accuracy curve, confusion matrix, and the Receiver Operating Characteristic (ROC) curve. As demonstrated in the confusion matrix below, Model 2 correctly classifies all the winning teams and, importantly, generates fewer false positives than Model 1. This improvement highlights the effectiveness of Model 2 in discerning the true winners while minimizing misclassification.

The loss curve illustrated an optimal fit; the training loss gradually reduced and stabilized at a low point, indicating the model was learning. Importantly, the validation loss followed a similar trajectory, decreasing in parallel with the training loss. This pattern demonstrated the model's capacity to generalize effectively to unseen data, a crucial aspect in preventing overfitting.

The area under the ROC curve (AUC), which provides an aggregate measure of performance across all possible classification thresholds, amounted to an impressive 0.97. This indicated excellent model performance, notably surpassing the baseline AUC of 0.5, the equivalent of random guessing.
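One way to build intuition for AUC is its ranking interpretation: it equals the probability that a randomly chosen positive example (a champion) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name is illustrative, and the quadratic pairwise loop is meant for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

Because it depends only on how examples are ranked, AUC is not inflated by class imbalance in the way plain accuracy is, which makes it a useful complement to the confusion matrix here.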

2022-2023 NBA Prediction and Feature Importance

As the ultimate test of our model, we asked it to forecast the 2022-2023 NBA champion. The top eight teams identified by our model matched the real-world performance, with six progressing to the conference semi-finals. The Boston Celtics, Denver Nuggets, and Miami Heat emerged as the top contenders, closely mirroring the actual outcomes of the season. Both the Miami Heat and Denver Nuggets advanced to the NBA Finals, while the Boston Celtics made it to the conference finals. The Denver Nuggets went on to win the title, which matches our forecast.

Furthermore, we examined the importance of various factors in predicting the NBA champion. Using a Gradient Boosting Classifier, we ranked our input features according to their contribution to that classifier's predictions and plotted the resulting feature importances. The ranking offers valuable insights into the sport's dynamics and the critical factors in the game of basketball.
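A feature ranking of this kind can be obtained with scikit-learn's `GradientBoostingClassifier`, whose fitted `feature_importances_` attribute gives one normalized score per input column. The sketch below runs on synthetic data with made-up feature names, so it demonstrates only the mechanics and says nothing about which NBA features actually matter.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real feature matrix (hypothetical names).
feature_names = ["strong_signal", "weak_signal", "noise_1", "noise_2"]
X = rng.normal(size=(400, 4))
# The label depends mostly on the first column and slightly on the second.
y = (X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=400) > 1.0).astype(int)

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# Pair each feature with its importance and sort from most to least important.
ranking = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
```

Note that these importances describe the gradient boosting model, not the neural network, so they are best read as a guide to which inputs carry signal rather than as an explanation of the network's decisions.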

Limitations and Improvements

In addition to the strategies already implemented in our model, several other methodologies could be employed to bolster its predictive capabilities. First, we could enhance the handling of imbalanced datasets through resampling techniques, such as upsampling the minority class, thereby increasing its representation in the sample. Further, methods like Synthetic Minority Over-sampling Technique (SMOTE) could be deployed to synthesize new examples from the minority class. A continued commitment to data collection would also supplement our model's training set, allowing it to better generalize to new scenarios and reduce the risk of overfitting.
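The simplest of these resampling ideas, random upsampling of the minority class, can be sketched in pure Python as follows. The function name and the toy data are illustrative. SMOTE differs in that it interpolates new synthetic points between minority neighbors instead of duplicating existing rows.

```python
import random


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate random minority rows until both classes have the same count."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx:
        raise ValueError("no minority examples to resample")
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = majority_idx + minority_idx + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Toy season: 30 teams with one champion (row 0).
rows = [[i] for i in range(30)]
labels = [1] + [0] * 29
new_rows, new_labels = random_oversample(rows, labels)
```

Resampling should be applied to the training split only, after the train/validation/test split, so that duplicated champions do not leak into the evaluation data.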

Secondly, we recognized that our model simplifies team performance around the trade deadline. In reality, crucial player trades don't always align with this deadline, and their impacts can resonate at different points throughout the season. Incorporating more precise indicators of when these trades occur could enhance our model's understanding of their effects and improve its predictive performance.

Thirdly, while our model considers a range of playoff factors, there remains room for enhancement in this area. One aspect that could be considered more thoroughly is player injury. Such incidents can drastically affect a team's performance and subsequent game outcomes, particularly in the playoffs. For instance, in the 2018-2019 NBA Finals the Golden State Warriors lost their star players Klay Thompson and Kevin Durant to severe injuries, which substantially weakened the team and contributed to this superstar roster losing the championship series. Incorporating a feature that records significant player injuries could enrich our model's understanding of the impact these events have on a team's championship prospects.

Lastly, it is crucial to acknowledge that a certain degree of unpredictability and stochasticity is inherent in any sport, including the NBA. These elements of randomness often reflect unexpected outcomes that can significantly deviate from statistical predictions. To better capture this uncertainty, our model could be augmented with probabilistic modeling or simulation techniques. For instance, we could adopt a Bayesian neural network approach, which would provide a probability distribution over many plausible models and weights rather than a single-point estimate.

Conclusion

Our project shows that neural networks can predict the annual NBA champion with remarkable accuracy. Two models were constructed, one using only regular season data and the other integrating playoff data as well. Both models showed high accuracy beyond our expectations, and the integration of playoff data in the second model further improved performance. The practical test conducted on the 2022-2023 NBA season demonstrated the model's efficacy, as its predictions closely mirrored the actual outcomes. From a data science perspective, the predictive accuracy of our neural networks could be improved by handling imbalanced datasets more thoroughly, representing player trades more precisely, and using stochastic modeling to account for inherent uncertainty. From a basketball perspective, accounting for player injuries, which strongly influence a team's overall strength, is another promising direction. Evaluating feature importance provided valuable insights into the factors most influential in determining NBA champions. Moreover, the methodology and selected features in this article could be generalized to other sports analyses and predictions.

Special thanks to Yichen Wang, Zhekai Zheng, and Harry Tong whose invaluable contributions were instrumental in bringing this work to fruition.