I have previously written about some statistical methods for rating football teams and predicting the results of future matches. One was the least squares method and another was the Poisson regression method. Neither of these methods makes good enough predictions. One problem with them is that they don’t incorporate a time perspective. Matches played a year ago are given the same importance as the most recent ones. This could, however, be incorporated by weighting the older matches less than the newer ones. Another problem, which I mentioned in the second post about Poisson regression, is that teams are treated as categorical variables, which makes it hard to model the fact that a team’s ability changes over time.
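As a rough sketch of the time-weighting idea, each match could be given a weight that shrinks with its age. The exponential form, the decay rate `xi` and the dates below are assumptions made for illustration; they are not taken from the earlier posts.

```r
# Hypothetical match dates; in practice these would come from the data set
matchDates <- as.Date(c("2011-11-05", "2012-05-13", "2012-11-03"))
today <- as.Date("2012-11-10")

# Age of each match in days
daysAgo <- as.numeric(today - matchDates)

# Assumed exponential decay; xi controls how fast old matches lose influence
xi <- 0.005
matchWeights <- exp(-xi * daysAgo)

print(round(matchWeights, 3))
```

Weights like these could then be passed to the `weights` argument of `lm()` or `glm()`.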
A different kind of method that has been used a lot in recent years is the Elo rating system, which was originally developed for rating chess players. The method is rather simple, but I will not explain it in detail here since there are many good explanations of it elsewhere. Wikipedia has a very thorough coverage. The basic principle is that the difference in ratings between the two opposing teams provides a prediction for the result of each game. The ratings are then updated based on how the teams perform. If a team performs better than expected its rating increases; if it performs worse than expected its rating decreases. How much the rating changes depends on an update factor (often referred to as the K-factor).
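To make the principle concrete, here is a minimal sketch of the standard Elo expected score and rating update. The function names `eloExpected` and `eloUpdate` and the example ratings are made up for this illustration.

```r
# Expected score for team A against team B (standard Elo logistic curve)
eloExpected <- function(ratingA, ratingB){
  1 / (1 + 10^((ratingB - ratingA) / 400))
}

# New rating after a match; result is 1 for a win, 0.5 for a draw, 0 for a loss
eloUpdate <- function(rating, expected, result, kfactor=24){
  rating + kfactor * (result - expected)
}

# Hypothetical example: a 1600-rated team beats a 1500-rated team
expA <- eloExpected(1600, 1500)   # roughly 0.64
expB <- 1 - expA
newA <- eloUpdate(1600, expA, result=1)
newB <- eloUpdate(1500, expB, result=0)

print(c(expected=expA, newA=newA, newB=newB))
```

Since the favourite was already expected to score about 0.64, winning only earns it a modest increase, and the loser drops by the same amount.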
Chess and football are of course different in many ways, so the method for rating chess players is not directly suitable for rating football teams. The relative simplicity of the Elo system makes it easy to tweak and adjust to better fit football by incorporating things like home field advantage and goal difference. There are many sites around the Internet that provide different variants of Elo ratings, like the World Football Elo Ratings for national teams and Club Elo and Euro Club Index for club teams. FIFA even uses its own Elo system in its Women’s World Ranking.
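The two tweaks mentioned above could be sketched roughly like this. The home advantage of 100 points and the goal-difference multiplier are illustrative assumptions; the multiplier resembles the one I understand the World Football Elo Ratings to use, but it should not be read as an exact description of any particular site's method.

```r
# Expected home score with a home advantage added to the home team's rating
expectedHomeScore <- function(homeRating, awayRating, homeAdvantage=0){
  1 / (1 + 10^((awayRating - (homeRating + homeAdvantage)) / 400))
}

# Assumed goal-difference multiplier: larger wins move the ratings more
goalDiffMultiplier <- function(goalDiff){
  n <- abs(goalDiff)
  if (n <= 1) return(1)
  if (n == 2) return(1.5)
  (11 + n) / 8
}

# Hypothetical match: equally rated teams, the home side wins 3-0
kfactor <- 24
expHome <- expectedHomeScore(1500, 1500, homeAdvantage=100)
change <- kfactor * goalDiffMultiplier(3) * (1 - expHome)

print(c(expectedHome=expHome, ratingChange=change))
```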
There has even been some research into different football rating systems. A paper titled The predictive power of ranking systems in association football (pdf) by Jan Lasek and others compared different rating systems. Their conclusion was that the different Elo type systems in general were better at predicting match outcomes than other types of rating systems.
I figured I wanted to implement a simple Elo rating system for rating football teams. There is already a package in R, PlayerRatings, which implements several different rating systems based on Elo. In my simple implementation there is no adjustment for goal difference, but I have support for home field advantage. All teams start with an initial rating of 1500. Here is what I got when I calculated the ratings for the Premier League in November 2012 based on data going back to 1993. I used an update factor of 24 without any home field advantage. There is no particular reason for these choices, as I did this mostly as a proof of concept.
[Table: Elo ratings for Premier League teams, November 2012]
The table seems reasonable, I think, except for a couple of things. There is a problem related to relegation and promotion. Since I have used data back to 1993, every team that has played in the Premier League is given a rating. If a team is relegated to the Championship, its rating will no longer be updated. We can see that this creates some strange results. Take the two lowest rated teams for example. Derby has not been in the Premier League since the 2007-2008 season. Swindon, which is rated about 100 points higher than Derby, has not played in the Premier League since the 1993-1994 season! Swindon now play in the fourth level of the English league system. So the ratings for the teams not in the Premier League should be considered invalid.
Relegation and promotion also create a problem with inflated ratings. The Elo system is designed so that the total number of points in the league should be constant. When a team is promoted it starts with an initial rating of 1500, and if it later gets relegated it will probably have lost some of those points to the other teams in the league. In fact, we see that many of the teams with ratings less than 1500 no longer play in the Premier League. The points they have lost are still present in the league even though the team isn’t. This means that over time the average rating of the teams in the league will increase.
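A toy example can illustrate the inflation effect. The three teams and their results are hypothetical, and the helper `playMatch` is written only for this sketch.

```r
kfactor <- 24
expected <- function(a, b) 1 / (1 + 10^((b - a) / 400))

# Three hypothetical teams, all starting at 1500
ratings <- c(A=1500, B=1500, C=1500)

# Play one match and return the updated ratings (result is the home team's score)
playMatch <- function(ratings, home, away, result){
  change <- kfactor * (result - expected(ratings[[home]], ratings[[away]]))
  ratings[home] <- ratings[[home]] + change
  ratings[away] <- ratings[[away]] - change
  ratings
}

# Team C loses to both A and B, and is then relegated
ratings <- playMatch(ratings, "A", "C", 1)
ratings <- playMatch(ratings, "B", "C", 1)

print(sum(ratings))                # the total over all three teams is unchanged
print(mean(ratings[c("A", "B")]))  # but the remaining teams now average above 1500
```

If a new team then enters at 1500 in place of C, the league total is higher than it was at the start.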
The code I have written takes a data frame as input and works “out of the box” with data from football-data.co.uk. If you are going to use it yourself you have to make sure the data is sorted by date as the rating function just loops from top to bottom.
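Sorting by date could look something like the sketch below. The small data frame is a made-up stand-in for a real file, and the date format string is an assumption: some files may use four-digit years, in which case `"%d/%m/%Y"` would be needed instead.

```r
# Small stand-in data set in the same layout as a football-data.co.uk file
dta <- data.frame(
  Date = c("18/08/12", "25/08/12", "20/08/12"),
  HomeTeam = c("Arsenal", "Chelsea", "Everton"),
  AwayTeam = c("Sunderland", "Newcastle", "Man United"),
  FTHG = c(0, 2, 1),
  FTAG = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Parse the dates (assumed format: day/month/two-digit year)
dta$Date <- as.Date(dta$Date, format="%d/%m/%y")

# Sort so that the oldest match comes first
dta <- dta[order(dta$Date), ]
print(dta)
```

It is worth checking for `NA` values in the `Date` column after parsing, since a wrong format string fails silently.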
Here is how you can use it:
    dta <- read.csv("yourdata.csv")
    elo <- eloRating(data=dta)
    print(elo)
And here is the code:
    eloRating <- function(home="HomeTeam", away="AwayTeam", homeGoals="FTHG", awayGoals="FTAG",
                          data, kfactor=24, initialRating=1500, homeAdvantage=0){

      #Make a list to hold ratings for all teams
      all.teams <- levels(as.factor(union(levels(as.factor(data[[home]])),
                                          levels(as.factor(data[[away]])))))
      ratings <- as.list(rep(initialRating, times=length(all.teams)))
      names(ratings) <- all.teams

      #Loop through data and update ratings
      for (idx in seq_len(nrow(data))){

        #get current ratings
        #(as.character so that factor columns index the list by name, not by level code)
        homeTeamName <- as.character(data[[home]][idx])
        awayTeamName <- as.character(data[[away]][idx])
        homeTeamRating <- as.numeric(ratings[homeTeamName]) + homeAdvantage
        awayTeamRating <- as.numeric(ratings[awayTeamName])

        #calculate expected outcome
        expectedHome <- 1 / (1 + 10^((awayTeamRating - homeTeamRating)/400))
        expectedAway <- 1 - expectedHome

        #Observed outcome
        goalDiff <- data[[homeGoals]][idx] - data[[awayGoals]][idx]
        if (goalDiff == 0){
          resultHome <- 0.5
          resultAway <- 0.5
        } else if (goalDiff < 0){
          resultHome <- 0
          resultAway <- 1
        } else if (goalDiff > 0){
          resultHome <- 1
          resultAway <- 0
        }

        #update ratings
        ratings[homeTeamName] <- as.numeric(ratings[homeTeamName]) + kfactor*(resultHome - expectedHome)
        ratings[awayTeamName] <- as.numeric(ratings[awayTeamName]) + kfactor*(resultAway - expectedAway)
      }

      #prepare output
      ratingsOut <- as.numeric(ratings)
      names(ratingsOut) <- names(ratings)
      ratingsOut <- sort(ratingsOut, decreasing=TRUE)
      return(ratingsOut)
    }
Hi, I was wondering how to use your R code, but instead of giving every team an initial rating of 1500 like you do, how would you set the initial ratings from a csv file with team names and then iterate through the list the way you do it?
That is possible. In the function, you see that the ratings variable that is created in the beginning is a list with the team names as names. It is easy to modify this so that you can supply your own initial ratings, possibly loaded from a csv file.
Hi, first of all thank you for your excellent work.
I’m working on a bachelor thesis about football forecasting, and I’m quite a beginner in R. I don’t understand how I could substitute the initial ratings in your function with the ones I computed for the last 5 seasons of football.
Do I have to modify the InitialRating parameter in the first eloRating function, the ratings function in line 9 or both?
I have my initial ratings in a csv file named ‘InitialR.csv’, with header (team, elo).
Thank you for your help, your site gave me a huge hand.
In my function the initialRating argument should be a single number that’s assigned to all teams. You can modify the function where the ratings variable is set to the initial ratings like this:

    ratings <- initialRating

This assumes you have provided a named vector of ratings, with the team names as names.
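A named vector like that could be built from a csv file roughly as follows. This is only a sketch: the inline text stands in for `read.csv("InitialR.csv")`, the ratings in it are made up, and the column names `team` and `elo` are taken from the question above.

```r
# Stand-in for read.csv("InitialR.csv"); the contents here are made up
initial <- read.csv(text="team,elo
Arsenal,1650
Chelsea,1640
Everton,1580", stringsAsFactors=FALSE)

# Named numeric vector: the values are ratings, the names are team names
initialRating <- setNames(initial$elo, initial$team)

# Teams in the match data without an initial rating would end up with NA
# ratings, so it is worth checking for them first (hypothetical team list)
teamsInData <- c("Arsenal", "Chelsea", "Everton", "Reading")
missingTeams <- setdiff(teamsInData, names(initialRating))
print(missingTeams)
```

Any team listed in `missingTeams` would need a starting value, for example 1500, before the vector is passed to the modified function.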
Hi,
Firstly, I would like to thank you for this awesome post, I really appreciate it :). I’m working on rating systems for association football such as Elo ratings, pi ratings and double Poisson. I would like to hear your suggestions or recommendations, if there are any references I can use to enhance this Elo rating with the influence of home advantage and goal difference.
Thank you, your site gave me a huge hand :).
Hello NazimR,
I was wondering if you found the solution on this. I am currently doing a thesis on a predictive model for the Champions League and I am firstly ranking the top teams from Europe using Elo ratings. I am using this data frame : James P. Curley (2016). engsoccerdata: English Soccer Data 1871-2016. R package version 0.1.5. This is not a csv file and I’m having trouble implementing the R code provided above using this data. Any help would be much appreciated. Also, thank you to opisthokonta for the great work.
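The function in the post only needs a data frame, not a csv file, and the column names can be supplied through its `home`, `away`, `homeGoals` and `awayGoals` arguments. The sketch below uses a toy data frame; the column names `Date`, `home`, `visitor`, `hgoal` and `vgoal` are an assumption about how the package data is laid out and should be checked with `names()` on the real data.

```r
# Toy stand-in for a data frame loaded from a package; the column names
# (Date, home, visitor, hgoal, vgoal) are assumed, not verified
matches <- data.frame(
  Date = as.Date(c("2015-08-09", "2015-08-08")),
  home = c("Team C", "Team A"),
  visitor = c("Team A", "Team B"),
  hgoal = c(1, 2),
  vgoal = c(1, 1),
  stringsAsFactors = FALSE
)

# Make sure the matches are in chronological order
matches <- matches[order(matches$Date), ]

# Column-name arguments that would be passed on to eloRating()
eloArgs <- list(home="home", away="visitor", homeGoals="hgoal",
                awayGoals="vgoal", data=matches)

# With eloRating() from the post loaded, the call would be:
# elo <- do.call(eloRating, eloArgs)
```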
If you are looking for academic references for Elo for soccer, I would suggest this one:
Hvattum & Arntzen (2010) Using ELO ratings for match result prediction in association football
https://www.sciencedirect.com/science/article/pii/S0169207009001708
Thank you, I will look into this. I appreciate your help!