Predicting Football Touchdowns with Machine Learning
Last Updated on November 17, 2023 by Editorial Team
Author(s): Max Charney
Originally published on Towards AI.
Football. An American pastime that unites fans across the nation. With an average of 16.7 million viewers per game and 113 million Super Bowl LVII viewers, the sport is clearly beloved by many. I created a machine-learning model to break down and analyze the game. Let's dive into it.
Firstly, we should recognize the key player on any football team's offense: the quarterback. This player distributes the ball to teammates in the hopes of gaining yards or, better yet, scoring a touchdown (find the game's basic rules here). Touchdowns reward teams with the most points out of all scoring options, and although they are difficult to achieve, they're typically prioritized on offense. What if we could analyze the factors that lead to touchdowns and predict which quarterbacks will do best in upcoming seasons?
Data. There's a lot of football data. I found play-by-play CSV files from as far back as 1999 in this GitHub repository (including key players' names, yards gained, passes completed, etc.). Such an extensive amount of data calls for… machine learning!
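To make the data step concrete, here is a minimal sketch of how play-by-play rows could be rolled up into season totals per quarterback. It uses only Python's standard library and a tiny made-up sample instead of the real files. The column names (`season`, `passer`, `yards_gained`, and so on) are assumptions for illustration and may not match the headers in the repository's CSVs.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up sample standing in for one play-by-play file.
# Column names are assumptions; check them against the real CSV header.
SAMPLE_CSV = """season,passer,yards_gained,pass_attempt,complete_pass,interception,sack,touchdown
2021,QB A,12,1,1,0,0,0
2021,QB A,35,1,1,0,0,1
2021,QB A,0,1,0,1,0,0
2021,QB B,-7,0,0,0,1,0
2021,QB B,8,1,1,0,0,1
"""

STAT_COLUMNS = ["yards_gained", "pass_attempt", "complete_pass",
                "interception", "sack", "touchdown"]


def season_totals(csv_file):
    """Sum each stat column per (season, passer) pair."""
    totals = defaultdict(lambda: dict.fromkeys(STAT_COLUMNS, 0.0))
    for row in csv.DictReader(csv_file):
        key = (int(row["season"]), row["passer"])
        for column in STAT_COLUMNS:
            # float() tolerates values written as "12" or "12.0"
            totals[key][column] += float(row[column])
    return dict(totals)


totals = season_totals(io.StringIO(SAMPLE_CSV))
for (season, passer), stats in sorted(totals.items()):
    print(season, passer, stats)
```

With a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an opened file handle, and missing values would need handling before the `float()` call.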
Features. Before making the machine learning model, I had to figure out which features correlate most with touchdowns (there are 372 features in the data set!). By testing and graphing various factors, I found that the five that correlate most with touchdowns are, in order of decreasing correlation, yards gained, completed passes, pass attempts, interceptions, and sacks. It is important to remember that correlation does not equal causation. For example, if a quarterback has thrown more interceptions, that could simply indicate that they played and threw a lot, which would also raise their number of touchdowns. Also, while rushing is a primary means of scoring for some quarterbacks, this data looks at every quarterback in the league. This means that some outliers might score more touchdowns without passing as much because they score through other methods, such as rushing.
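The feature ranking described above can be sketched with a plain Pearson correlation. This is an illustration under assumptions, not the exact analysis behind the article: the numbers are made up, and the feature names are placeholders for whatever the real columns are called.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up season totals for five quarterbacks (not real NFL data).
features = {
    "yards_gained":  [4800, 4100, 3600, 2900, 1500],
    "complete_pass": [410, 380, 330, 290, 150],
    "interception":  [12, 9, 14, 10, 6],
    "sack":          [30, 25, 41, 38, 20],
}
touchdowns = [38, 30, 24, 19, 8]

# Rank features by the strength (absolute value) of their correlation.
ranking = sorted(
    ((name, pearson(values, touchdowns)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name:14s} r = {r:+.3f}")
```

Sorting by absolute value matters because a strongly negative correlation is just as informative to a model as a strongly positive one.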
Year to year. Clearly, within a single season, certain quarterback stats correlate with that season's touchdowns. However, would their stats from the previous season correlate with their touchdowns too? In other words, could a quarterback's stats from one season indicate their performance in the next? I needed to know, as this is critical to predicting which quarterbacks will succeed in future seasons. After graphing touchdowns against previous-season stats, I discovered that there still is a correlation! This time, previous-season touchdowns can also be used as a feature, since they correlate with current-season touchdowns as well. Unfortunately, the correlation isn't as strong as before, but it still means that we can progress to the next step: machine learning.
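Comparing one season with the next means pairing each quarterback's touchdowns with that same quarterback's stats from the year before. Here is one way that pairing could look, as a sketch with made-up totals and hypothetical field names:

```python
# Hypothetical per-season totals keyed by (season, quarterback).
season_stats = {
    (2020, "QB A"): {"yards_gained": 4300, "touchdown": 33},
    (2021, "QB A"): {"yards_gained": 4800, "touchdown": 38},
    (2022, "QB A"): {"yards_gained": 4100, "touchdown": 29},
    (2021, "QB B"): {"yards_gained": 3100, "touchdown": 20},
    (2022, "QB B"): {"yards_gained": 3500, "touchdown": 24},
    (2022, "QB C"): {"yards_gained": 2800, "touchdown": 17},  # rookie
}


def lagged_rows(stats):
    """Pair each season's touchdowns with the same player's previous-season stats."""
    rows = []
    for (season, player), current in sorted(stats.items()):
        previous = stats.get((season - 1, player))
        if previous is None:
            continue  # no previous season on record, so there is no input row
        rows.append({
            "player": player,
            "season": season,
            "prev_yards": previous["yards_gained"],
            "prev_touchdowns": previous["touchdown"],
            "touchdowns": current["touchdown"],
        })
    return rows


rows = lagged_rows(season_stats)
for row in rows:
    print(row)
```

Note that players without a previous season (rookies, or anyone who missed a year) drop out of the table entirely, which is one reason predictions for them are harder.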
Machine Learning. By utilizing machine learning, we can predict which quarterbacks will do well in the upcoming football season. Linear regression, a term you might remember from math class, is a data analysis technique that predicts the value of unknown data (touchdowns) by using other related and known data values (the features we decided on earlier). I created the linear regression model using a train/test split (simple explanation): the model is fit on one part of the data and evaluated on a held-out part it has never seen. I also tested a random forest model, but the linear regression model had better results.
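For readers who want to see the idea in code, below is a deliberately simplified sketch: a single-feature least-squares line fit with a train/test split, written with only the standard library. The actual model uses several features and was most likely built with a library, so treat the helper names (`train_test_split`, `fit_line`) and the data as illustrative assumptions rather than the real implementation.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Made-up (previous-season yards, current-season touchdowns) pairs.
data = [(1500, 9), (2200, 14), (2900, 18), (3300, 22), (3600, 23),
        (3900, 27), (4100, 28), (4400, 31), (4800, 36), (5000, 35)]

train, test = train_test_split(data)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Evaluate only on the held-out rows the model never saw during fitting.
for prev_yards, actual in test:
    predicted = slope * prev_yards + intercept
    print(f"prev yards {prev_yards}: predicted {predicted:.1f}, actual {actual}")
```

Extending this to several features means solving for one coefficient per feature instead of a single slope, which is what a library linear regression does under the hood.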
And you may be wondering, does a linear regression model even count as machine learning? Yep. It does, primarily because it involves using statistical techniques to learn a model from data that can make predictions or estimate relationships between variables; this is a pretty fundamental concept in the field of machine learning.
Outside Factors. It's important to note that many factors play a role in determining what makes a quarterback “good.” For example, the other players on the quarterback's team, the quarterback's coaching, and their specific strong suits, such as rushing vs. passing, all play a role. These factors may be challenging to account for in the training process, and the effects will likely be seen in the final predictions. That's why I focused this model on achieved touchdowns alone rather than overall quarterback rankings.
Results. The model scored a mean squared error of 7.4649 and an r-squared of 0.709. Because mean squared error is measured in squared touchdowns, its square root is the more intuitive number: a root mean squared error of about 2.7 touchdowns, which is roughly how far off a typical prediction is. The r-squared score means the features explain roughly 71% of the variance in touchdowns. I would say that the model worked decently. Wait, but that doesn't sound that great! As previously mentioned, with football, other factors come into play, such as injuries, rookies, etc. My model correctly predicted 6 of the top 10 touchdown scorers for 2022 based on 2021 data, with the misses attributable to age, injury, and other factors. I also used the model to predict quarterback success in the upcoming 23–24 NFL season, and it turned out pretty successful! The top 10 highest touchdown scorer predictions aligned with 7/10 of the Fox News QB predictions, with misalignments accounted for by injury, being good at other things (such as rushing yards instead of throwing), a poor previous season, or a new team. And who knows, maybe my predictions will turn out accurate!
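The two metrics quoted above, plus the "how many of the top 10 did we get" check, are easy to compute by hand. The sketch below uses made-up predictions and placeholder player names purely to show the formulas; none of the numbers come from the actual model.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors (units: touchdowns squared)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def top_n_overlap(predicted_names, actual_names):
    """Number of names that appear in both top-N lists, ignoring order."""
    return len(set(predicted_names) & set(actual_names))


# Made-up touchdown totals and predictions for five quarterbacks.
actual = [35, 28, 22, 30, 17]
predicted = [32, 30, 25, 27, 19]

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)  # back in plain touchdowns, so easier to interpret
r2 = r_squared(actual, predicted)
print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}, R^2 = {r2:.3f}")

hits = top_n_overlap(["QB A", "QB B", "QB C"], ["QB B", "QB D", "QB A"])
print(f"top-3 overlap: {hits}/3")
```

Applying the same square root to the mean squared error reported above, `math.sqrt(7.4649)` comes out to about 2.73 touchdowns.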
What does this mean? I created a machine learning model that could predict quarterback touchdowns with reasonably high accuracy by finding patterns in complex data. This shows how powerful machine learning is and how wide its range of applications can be. It is important to remember that in certain areas, such as football, other information is required to make the best judgment about players and statistics. Yet, who knows? Maybe using this model could help you win bets or your fantasy football leagues. Well, is there anything else you can do with this? The more prominent application is analyzing teams and the league in general. While we analyzed quarterback touchdowns with play-by-play data from past years, other information can be brought in to analyze overall quarterback success, different positions, and entire teams, too. Exploring football data (or sports data in general) can unlock game-changing insights and predictions. Approaching sports from an analytical point of view isn't very new, but being equipped with the powerful technology we have today is sure to revolutionize how we understand, analyze, and excel in the game.
Perhaps in the future we'll be able to have a computer predict a perfect bracket or make optimal sports bets for us through machine learning…
You can find my code at my GitHub repository here.
I'll list some other resources and sources below that may be interesting: