
Predicting Football Touchdowns with Machine Learning

Last Updated on November 17, 2023 by Editorial Team

Author(s): Max Charney

Originally published on Towards AI.

Football. An American pastime that unites fans across the nation. With an average of 16.7 million viewers per game and 113 million Super Bowl LVII viewers, the sport is clearly beloved by many. I created a machine-learning model to break down and analyze the game. Let’s dive into it.

Firstly, we should recognize the key player on any football team’s offense: the quarterback. This player distributes the ball to teammates in the hopes of gaining yards or, better yet, scoring a touchdown (find the game’s basic rules here). Touchdowns reward teams with the most points out of all scoring options, and although they are difficult to achieve, they’re typically prioritized on offense. What if we could analyze the factors that lead to touchdowns and predict which quarterbacks will do best in upcoming seasons?

Photo by Keith Johnston on Unsplash

Data. There’s a lot of football data. I found play-by-play CSV files from as far back as 1999 in this GitHub repository (including key players’ names, yards gained, passes completed, etc.). Such an extensive amount of data calls for… machine learning!

Features. Before making the machine learning model, I had to figure out which features correlate most strongly with touchdowns (there are 372 features in the data set!). By testing and graphing various factors, I found that the five factors that most correlate with touchdowns are yards gained, completed passes, passes in general, interceptions, and sacks. They are listed here from strongest to weakest correlation, and it is important to remember that correlation does not equal causation. For example, if a quarterback has thrown more interceptions, that could simply indicate that they played and threw a lot, which would also raise their number of touchdowns. While rushing might be a primary means of scoring for some quarterbacks, this data looks at all the league’s quarterbacks. This means that some outliers might score more touchdowns without passing as much because they score through other methods, such as rushing.
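As a rough illustration of this kind of feature ranking, the sketch below computes a Pearson correlation between each candidate stat and touchdowns, then sorts by strength. The numbers and column names are made up for the example and are not taken from the play-by-play data set; the original analysis may well have used a data library rather than a hand-written function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up season totals for six hypothetical quarterbacks (not real NFL data).
seasons = {
    "yards_gained":    [4800, 4100, 3600, 2900, 2200, 1500],
    "complete_passes": [410, 380, 330, 270, 230, 150],
    "interceptions":   [12, 14, 9, 11, 8, 7],
    "sacks":           [30, 25, 38, 28, 35, 20],
}
touchdowns = [38, 30, 27, 18, 14, 8]

# Rank features by the absolute strength of their correlation with touchdowns.
ranked = sorted(
    ((name, pearson(values, touchdowns)) for name, values in seasons.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name:16s} r = {r:+.2f}")
```

Sorting by absolute value matters because a strongly negative correlation is just as informative for prediction as a strongly positive one.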

Touchdown correlation with same season stats

Year to year. Clearly, in one year, certain quarterback stats correlate with their touchdowns. However, would their stats from the previous season correlate with their touchdowns? In other words, could a quarterback’s stats from one season indicate their performance next season? I needed to know, as this is critical to predicting which quarterbacks will succeed in future seasons. After graphing the touchdowns with previous season stats, I discovered that there still is a correlation! Additionally, this time, we can also include the correlation between previous and current season touchdowns. Unfortunately, the correlation isn’t as strong as before, but it still means that we can progress to the next step: machine learning.
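One way to set up this year-to-year comparison is to pair each season's touchdowns with the same player's stats from the season before. The sketch below assumes per-season totals have already been aggregated from the play-by-play files; the field names (`player`, `season`, `yards`, `touchdowns`) and the values are hypothetical.

```python
# Made-up per-season totals for two hypothetical quarterbacks (not real NFL data).
season_totals = [
    {"player": "QB A", "season": 2020, "yards": 4200, "touchdowns": 31},
    {"player": "QB A", "season": 2021, "yards": 4500, "touchdowns": 35},
    {"player": "QB A", "season": 2022, "yards": 3900, "touchdowns": 28},
    {"player": "QB B", "season": 2020, "yards": 2800, "touchdowns": 17},
    {"player": "QB B", "season": 2022, "yards": 3100, "touchdowns": 20},
]

def pair_with_previous_season(rows):
    """Attach each season's touchdowns to the same player's prior-season stats."""
    by_key = {(row["player"], row["season"]): row for row in rows}
    pairs = []
    for (player, season), current in sorted(by_key.items()):
        previous = by_key.get((player, season - 1))
        if previous is None:
            continue  # no stats from the year before (rookie, or a missed season)
        pairs.append({
            "player": player,
            "season": season,
            "prev_yards": previous["yards"],
            "prev_touchdowns": previous["touchdowns"],
            "touchdowns": current["touchdowns"],
        })
    return pairs

lagged = pair_with_previous_season(season_totals)
for row in lagged:
    print(row)
```

Players with no record for the preceding season drop out of the paired data, which is one reason rookies and injured quarterbacks are hard for a model like this to handle.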

Touchdown correlation with previous season stats

Machine Learning. By utilizing machine learning, we can predict which quarterbacks will do well in the upcoming football season. Linear regression, a term you might remember from math class, is a data analysis technique that predicts the value of unknown data (touchdowns) by using other related and known data values (the features we decided on earlier). I created the linear regression model using a train-test split (simple explanation): the model is fit on one portion of the data and evaluated on a held-out portion it has never seen. I also tested a random forest model, but the regression model had better results.
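To make the idea concrete, here is a minimal sketch of a train-test split followed by a least-squares fit. It is deliberately simplified: it uses a single feature (previous-season yards) rather than the full feature set, synthetic data rather than the real play-by-play files, and hand-rolled helpers (`fit_line`, `train_test_split`) instead of whatever library the original code relies on. The random forest comparison is not shown.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Synthetic (previous-season yards, next-season touchdowns) pairs, not real data.
rng = random.Random(42)
rows = []
for _ in range(40):
    prev_yards = rng.uniform(1500, 5000)
    touchdowns = 0.007 * prev_yards + rng.gauss(0, 3)
    rows.append((prev_yards, touchdowns))

# Fit on the training rows only; the test rows are kept for evaluation.
train, test = train_test_split(rows)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

for prev_yards, actual in test[:3]:
    predicted = slope * prev_yards + intercept
    print(f"prev yards {prev_yards:6.0f}: predicted {predicted:5.1f}, actual {actual:5.1f}")
```

Holding out the test rows is what makes the later error figures meaningful: a model scored on the same data it was fit to will look better than it really is.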

And you may be wondering, does a linear regression model even count as machine learning? Yep. It does, primarily because it involves using statistical techniques to learn a model from data that can make predictions or estimate relationships between variables; this is a pretty fundamental concept in the field of machine learning.

Outside Factors. It’s important to note that many factors play a role in determining what makes a quarterback “good.” For example, the other players on the quarterback’s team, the quarterback’s coaching, and their specific strong suits, such as rushing vs. passing, all play a role. These factors may be challenging to account for in the training process, and the effects will likely be seen in the final predictions. That’s why I focused this model on achieved touchdowns alone rather than overall quarterback rankings.

Results. With a mean squared error of 7.4649 and an r-squared score of 0.709, I would say that the model worked decently. MSE is measured in squared touchdowns, so if that figure is a true MSE, the typical prediction error is its square root, roughly 2.7 touchdowns (if it is actually a root mean squared error, the typical error is about 7.5 touchdowns). The r-squared score means the features explain about 71% of the variance in touchdowns. Wait, but that doesn’t sound that great! As previously mentioned, with football, other factors come into play, such as injuries, rookies, etc. My model correctly identified 6 of the top 10 touchdown scorers for 2022 based on 2021 data, with the misses attributable to age, injury, and other factors. I also used the model to predict quarterback success in the upcoming 23–24 NFL season, and the results looked promising! Seven of my top 10 predicted touchdown scorers also appeared in the Fox News QB predictions, with the mismatches explained by injury, being good at other things (such as rushing yards instead of throwing), a poor previous season, or a new team. And who knows, maybe my predictions will turn out accurate!
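For readers who want to see how these two scores are defined, the sketch below computes MSE, its square root (RMSE), and r-squared from paired actual and predicted values. The touchdown numbers are invented for illustration. The last line takes the square root of the reported 7.4649, which is the typical error in touchdowns only under the assumption that the reported figure is a true MSE.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return (MSE, RMSE, R^2) for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    mse = ss_res / n
    return mse, sqrt(mse), 1 - ss_res / ss_tot

# Made-up touchdown totals and predictions (not the article's data).
actual = [35, 28, 22, 30, 15, 25]
predicted = [32, 30, 25, 27, 18, 23]

mse, rmse, r2 = regression_metrics(actual, predicted)
print(f"MSE  = {mse:.2f} (squared touchdowns)")
print(f"RMSE = {rmse:.2f} (touchdowns)")
print(f"R^2  = {r2:.3f} (share of variance explained)")

# Square root of the article's reported figure, assuming it is a true MSE.
print(f"sqrt(7.4649) = {sqrt(7.4649):.2f} touchdowns")
```

Because MSE squares each error, it is not directly readable as "touchdowns off"; RMSE puts the number back on the original scale.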

Top 10 NFL QB Touchdown Predictions for 2023–2024. (“Preds” means total predicted touchdowns)
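Comparisons such as "6 of the top 10" can be computed by taking the top-k names from the predicted and the actual totals and counting the overlap. The sketch below uses five placeholder quarterbacks and k = 3; the names and numbers are hypothetical and do not reproduce the table above.

```python
def top_k(scores, k):
    """Names of the k highest-scoring entries, best first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered[:k]]

# Placeholder predicted and actual touchdown totals (not real results).
predicted = {"QB A": 36.1, "QB B": 33.4, "QB C": 29.8, "QB D": 27.5, "QB E": 21.0}
actual = {"QB A": 38, "QB B": 24, "QB C": 31, "QB D": 19, "QB E": 30}

k = 3
predicted_top = top_k(predicted, k)
actual_top = top_k(actual, k)

# Order within the top k is ignored; only membership counts as a hit.
hits = set(predicted_top) & set(actual_top)
print(f"{len(hits)}/{k} of the predicted top {k} were in the actual top {k}")
```

This measure ignores ranking order and the size of each miss, so it complements rather than replaces the error scores.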

What does this mean? I created a machine learning model that could predict quarterback touchdowns with reasonably high accuracy by finding patterns in complex data. This shows how powerful machine learning is and how wide its range of applications can be. It is important to remember that in certain areas, such as football, other information is required to make the best judgment about players and statistics. Yet, who knows? Maybe using this model could help you win bets or your fantasy football leagues. Well, is there anything else you can do with this? A broader application is analyzing teams and the league in general. While we analyzed quarterback touchdowns with play-by-play data from past years, other information can be brought in to analyze quarterback success as a whole, different positions, and entire teams, too. Exploring football data (or sports data in general) can unlock game-changing insights and predictions. Approaching sports from an analytical point of view isn’t very new, but being equipped with the powerful technology we have today is sure to change how we understand, analyze, and excel in the game.

Perhaps in the future we’ll be able to have a computer predict a perfect bracket or make optimal sports bets for us through machine learning…

You can find my code at my GitHub repository here.

I’ll list some other resources and sources below that may be interesting:
