Football Forecasting - Margin-of-Victory Model - EdsCave

Football Forecasting - Margin-of-Victory Model

While all of the previous models except for 'Home Team Wins' develop a rating metric for each team, they all ignore the fact that wins and points mean different things in different games. For example, if a middle-of-the-league team beats a much weaker one, the previous models count the win and the points exactly as if that team had defeated the current champion. This model is an attempt to take the mismatch in team performance into account when developing ratings.

The basis of this forecasting algorithm is a linear regression model for Margin-of-Victory (MOV), based on which teams are playing and a home advantage factor. The model for a game is

MOV = Visiting_Team_Factor - Home_Team_Factor + Home_Advantage_Factor

The setup for the equivalent regression factors is a matrix that looks like the following:

For each historical game, the dependent variable of the regression is the MOV, that is, the difference in scores between the visiting and home teams (visitor minus home). The input variables to the regression are simply a '+1' in the column for the visiting team, a '-1' in the home team's column, and a '1' in the home advantage column (for all games). A '0' goes everywhere else and indicates that a given team has no involvement in that game.
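The row layout just described can be sketched in a few lines of Python. This is only an illustration: the function name `build_design_rows`, the tuple layout of `games`, and the three placeholder teams and scores are assumptions made for the example, not part of the original model code.

```python
def build_design_rows(games, teams):
    """Build one regression row per game.

    games: list of (visitor, home, visitor_score, home_score) tuples
           (hypothetical layout chosen for this sketch).
    teams: list of team names; one column per team, plus a final
           column for the home advantage factor.
    """
    column = {team: i for i, team in enumerate(teams)}
    rows, targets = [], []
    for visitor, home, visitor_score, home_score in games:
        row = [0] * (len(teams) + 1)   # 0 = team not involved in this game
        row[column[visitor]] = 1       # +1 in the visiting team's column
        row[column[home]] = -1         # -1 in the home team's column
        row[-1] = 1                    # home advantage column is 1 for every game
        rows.append(row)
        targets.append(visitor_score - home_score)  # MOV = visitor minus home
    return rows, targets


# Made-up teams and scores, purely for illustration.
teams = ["A", "B", "C"]
games = [("A", "B", 24, 17), ("C", "A", 10, 20)]
rows, targets = build_design_rows(games, teams)
for row, mov in zip(rows, targets):
    print(row, mov)
```

Each row then lines up with one MOV value, which is the form most regression packages expect.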

When a regression is performed on the above data, it results in a 'strength' rating for each team, plus the home advantage factor. To compute the predicted MOV for a game in the coming week, you just plug the factors for the two teams into the equation above. A positive MOV predicts a visitor win, while a negative MOV predicts a home win.
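As a small sketch of that plug-in step, the following Python applies the MOV equation to a pair of teams. The ratings and the home advantage value are invented numbers, and the function names are hypothetical. Because MOV is defined as visitor minus home, a home advantage factor would be expected to come out negative whenever home teams tend to win, which is why a negative value is used here.

```python
def predict_mov(ratings, home_advantage, visitor, home):
    """MOV = visiting team factor - home team factor + home advantage factor."""
    return ratings[visitor] - ratings[home] + home_advantage


def predict_winner(ratings, home_advantage, visitor, home):
    """Return the predicted winner, or None if the predicted MOV is exactly 0."""
    mov = predict_mov(ratings, home_advantage, visitor, home)
    if mov > 0:
        return visitor   # positive MOV predicts a visitor win
    if mov < 0:
        return home      # negative MOV predicts a home win
    return None


# Invented ratings, for illustration only.
ratings = {"A": 3.0, "B": -1.5}
home_advantage = -2.5

print(predict_mov(ratings, home_advantage, "A", "B"))     # A visiting B
print(predict_winner(ratings, home_advantage, "B", "A"))  # B visiting A
```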

While the subject of how linear regression works is too large to describe adequately here, the key feature is that it fits a model to the actual data by minimizing, over all games at once, the errors between what the model predicts and what actually happened. So if you were to take the team ratings developed by the regression model and use them to retroactively 'predict' (postdict?) the MOVs used as input, you would find that they were optimal, meaning that you cannot find a set of team ratings that reproduces the input data any better, at least if optimality is measured as the smallest sum of squared errors. Note, however, that a good fit to the data used to build a model is absolutely no guarantee that the model has much predictive power. A model that fits its input data well but predicts new games poorly is said to be overfit, and overfitting is one of the long-time banes of predictive analytics.
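For readers who prefer to see the least-squares criterion written out, it can be stated as below. The symbols are introduced here only for illustration and do not appear in the original: r_t is the rating of team t, a is the home advantage factor, v_g and h_g are the visiting and home teams in game g, m_g is the observed MOV (visitor minus home) in that game, and G is the number of games.

```latex
\min_{r,\,a} \; \sum_{g=1}^{G} \Bigl( m_g - \bigl( r_{v_g} - r_{h_g} + a \bigr) \Bigr)^{2}
```

The term in the inner parentheses is the model's MOV for game g, so the sum is the total squared error over all historical games.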

One practical problem that I have found with this method is that it seems to result in ill-conditioned (or outright singular) regression matrices, and that some regression software packages will simply not find solutions, particularly early in the season when few games have been played. I believe the root of this problem is that early in the season there are not yet enough games to form a 'game-chain' linking any team to any other team. Instead, there are a number of distinct 'cliques': teams within a clique have played each other, but no team in one clique has played anyone in a different clique.
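One way to check whether the schedule so far has split into such cliques is to group teams with a union-find structure, merging the two teams of every game played. This is a sketch under assumed names (`find_cliques`, and games given as visitor/home pairs); it only diagnoses the condition and does nothing to repair it.

```python
def find_cliques(teams, games):
    """Group teams into cliques linked by chains of games played.

    games: list of (visitor, home) pairs (hypothetical layout).
    Returns a list of lists; more than one list means the schedule
    so far is not fully connected.
    """
    parent = {team: team for team in teams}

    def find(team):
        # Walk up to the clique's representative, shortening the path as we go.
        while parent[team] != team:
            parent[team] = parent[parent[team]]
            team = parent[team]
        return team

    for visitor, home in games:
        root_v, root_h = find(visitor), find(home)
        if root_v != root_h:
            parent[root_v] = root_h   # a game links the two cliques

    groups = {}
    for team in teams:
        groups.setdefault(find(team), []).append(team)
    return list(groups.values())


# Made-up early-season schedule: A has only played B, C has only played D.
teams = ["A", "B", "C", "D"]
early_games = [("A", "B"), ("C", "D")]
print(find_cliques(teams, early_games))

# One more game between the two groups links everyone.
later_games = early_games + [("B", "C")]
print(find_cliques(teams, later_games))
```

If more than one group comes back, ratings in different groups cannot be compared with each other, whichever fitting method is used.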

For this reason I used an iterative method for finding the team ratings. While this does not get around the above-mentioned problem of unconnected cliques, it does get around the issue of ill-conditioned matrices and numerical instability. The downside is that team ratings early in the season may not only be non-optimal, but may not even be very good. Pseudocode for this fitting algorithm is shown below:

Once there are enough connections between teams, however, the predictive performance of the iteratively fitted linear model (in my experience) will typically range from a little worse to occasionally a little better than that of one developed using traditional least-squares regression.
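To make the idea of an iterative fit concrete, here is one possible scheme: plain gradient descent on the mean squared MOV error. It is an assumption chosen for illustration and is not necessarily the procedure used for the results on this page. The function name, step size, sweep count, and sample data are all hypothetical.

```python
def fit_ratings_iteratively(teams, games, learning_rate=0.1, sweeps=2000):
    """Fit team ratings and a home advantage factor by gradient descent.

    games: list of (visitor, home, mov) with mov = visitor score - home score.
    Every game touches three factors with weight +1 or -1, so a step size
    well below 2/3 keeps this particular iteration stable.
    """
    ratings = {team: 0.0 for team in teams}
    home_advantage = 0.0
    for _ in range(sweeps):
        rating_grad = {team: 0.0 for team in teams}
        home_grad = 0.0
        for visitor, home, mov in games:
            # Error between the observed MOV and the current model's MOV.
            error = mov - (ratings[visitor] - ratings[home] + home_advantage)
            rating_grad[visitor] += error
            rating_grad[home] -= error
            home_grad += error
        scale = learning_rate / len(games)
        for team in teams:
            ratings[team] += scale * rating_grad[team]
        home_advantage += scale * home_grad
    return ratings, home_advantage


# Made-up games generated from ratings A=3, B=0, C=-3 and home advantage -2.
teams = ["A", "B", "C"]
games = [
    ("A", "B", 1), ("B", "A", -5),
    ("B", "C", 1), ("C", "B", -5),
    ("A", "C", 4), ("C", "A", -8),
]
ratings, home_advantage = fit_ratings_iteratively(teams, games)
print({team: round(value, 3) for team, value in ratings.items()})
print(round(home_advantage, 3))
```

Adding the same constant to every team rating leaves all predicted MOVs unchanged, so ratings are only determined up to an offset. This sketch starts from zero and every update adds to one rating what it subtracts from another, so the ratings stay centered on zero. It also runs without complaint when the schedule is split into cliques, although the ratings it returns in that case are not comparable across cliques.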

So how well does this work in practice? Below are the 2014 NFL season results:

Overall, for the 2012, 2013, and 2014 NFL seasons, this model makes correct calls 66% of the time.

DISCLAIMER!

Professional odds-makers have better predictive methods and algorithms than this one. This algorithm will NOT let you beat the odds in a consistent manner in Las Vegas-style gambling.
