Elo ratings in football | opisthokonta.net

I have previously written about some statistical methods for rating football teams and predicting the results of future matches. One was the least squares method and another was the Poisson regression method. Neither of these methods makes good enough predictions. One problem with them is that they don’t incorporate a time perspective. Matches played a year ago are given the same importance as the most recent ones. This could, however, be addressed by weighting the older matches less than the newer ones. Another problem, which I mentioned in the second post about Poisson regression, is that teams are treated as categorical variables, which makes it hard to model the fact that a team’s ability changes over time.

A different kind of method that has been used a lot in recent years is the Elo rating system, which was originally developed for rating chess players. The method is rather simple, but I will not explain it in detail here since there are many good explanations of it elsewhere. Wikipedia has a very thorough coverage. The basic principle is that the difference in ratings between the two opposing teams provides a prediction for the result of each game. The ratings are then updated based on how the teams perform. If a team performs better than expected its rating increases, and if it performs worse than expected its rating decreases. How much the rating changes depends on an update factor (often referred to as the K-factor).
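
As a minimal sketch of the principle, here is the expected score and the rating update for a single match. The expected score is the same formula that is used in the function further down. The function names eloExpected and eloUpdate and the ratings 1600 and 1500 are made up for this illustration.

```r
# Expected score for team A against team B (a number between 0 and 1)
eloExpected <- function(ratingA, ratingB){
  1 / (1 + 10^((ratingB - ratingA)/400))
}

# New rating after a match; result is 1 for a win, 0.5 for a draw, 0 for a loss
eloUpdate <- function(rating, expected, result, kfactor=24){
  rating + kfactor*(result - expected)
}

# Hypothetical example: a 1600-rated team beats a 1500-rated team
expHome <- eloExpected(1600, 1500)          # about 0.64
newHome <- eloUpdate(1600, expHome, 1)      # gains about 8.6 points
newAway <- eloUpdate(1500, 1 - expHome, 0)  # loses the same amount
print(c(home=newHome, away=newAway))
```

Since the favourite was expected to score about 0.64, a win only earns it the remaining 0.36 times the update factor. An upset would have moved both ratings by almost twice as much.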

Chess and football are of course different in many ways, so the method for rating chess players is not directly suitable for rating football teams. The relative simplicity of the Elo system makes it easy to tweak and adjust to better fit football by incorporating things like home field advantage and goal difference. There are many sites around the Internet that provide different variants of Elo ratings, like the World Football Elo Ratings for national teams and Club Elo and Euro Club Index for club teams. FIFA even uses its own Elo system in its Women’s World Ranking.
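
One such tweak is to scale the update factor by the margin of victory. The sketch below follows the multipliers that, as far as I know, the World Football Elo Ratings use: 1.5 for a two-goal win, 1.75 for a three-goal win, and 1.75 + (N-3)/8 for a win by N goals when N is four or more. Treat these numbers and the function name goalDiffMultiplier as assumptions for illustration; they are not part of my implementation.

```r
# Multiplier for the K-factor based on the goal difference of the match.
# The values are an assumption based on my reading of the World Football Elo Ratings.
goalDiffMultiplier <- function(goalDiff){
  n <- abs(goalDiff)
  if (n <= 1){
    1
  } else if (n == 2){
    1.5
  } else {
    1.75 + (n - 3)/8
  }
}

# Multipliers for draws and wins by 1 to 5 goals
print(sapply(0:5, goalDiffMultiplier))
```

In an update step, kfactor * goalDiffMultiplier(goalDiff) would then take the place of the plain kfactor.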

There has even been some research into different football rating systems. A paper titled The predictive power of ranking systems in association football (pdf) by Jan Lasek and others compared different rating systems. Their conclusion was that the different Elo type systems in general were better at predicting match outcomes than other types of rating systems.

I figured I wanted to implement a simple Elo rating system for rating football teams. There is already a package in R, PlayerRatings, which implements several different rating systems based on Elo. In my simple implementation there is no adjustment for goal difference, but I have support for home field advantage. All teams start with an initial rating of 1500. Here is what I got when I calculated the ratings for the Premier League in November 2012 based on data going back to 1993. I used an update factor of 24 and no home field advantage. There is no particular reason for these choices, as I did this mostly as a proof of concept.

Team	Rating (November 2012)
Man United	1807
Man City	1767
Chelsea	1696
Arsenal	1658
Tottenham	1645
Everton	1640
Newcastle	1613
Fulham	1591
Liverpool	1567
West Brom	1562
Leeds	1552
Wigan	1543
Swansea	1526
Sunderland	1524
Stoke	1521
Middlesboro	1516
Norwich	1509
Aston Villa	1498
West Ham	1494
Birmingham	1493
Blackpool	1483
Ipswich	1481
Bolton	1479
Charlton	1470
Sheffield United	1458
Blackburn	1450
Reading	1448
Sheffield Weds	1447
Coventry	1447
Middlesbrough	1443
QPR	1440
Barnsley	1439
Portsmouth	1438
Southampton	1437
Oldham	1436
Crystal Palace	1433
Leicester	1433
Nott’m Forest	1430
Hull	1422
Burnley	1418
Wolves	1414
Wimbledon	1413
Watford	1411
Bradford	1406
Swindon	1404
Derby	1297

The table seems reasonable, I think, except for a couple of things. There is a problem related to relegation and promotion. Since I have used data back to 1993, every team that has played in the Premier League is given a rating. If a team is relegated to the Championship, its rating will no longer be updated. We can see that this creates some strange results. Take the two lowest rated teams for example. Derby has not been in the Premier League since the 2007-2008 season. Swindon, which is rated about 100 points higher than Derby, has not played in the Premier League since the 1993-1994 season! Swindon now play in the fourth level of the English league system. So the ratings for the teams not in the Premier League should be considered invalid.

Relegation and promotion also create a problem with inflated ratings. The Elo system is constructed so that the total number of points in the league stays constant: whatever one team gains in a match, the other team loses. When a team is promoted it starts with an initial rating of 1500, and if it later gets relegated it will probably have lost some of those points to the other teams in the league. In fact, we see that many of the teams with ratings less than 1500 no longer play in the Premier League. The points they have lost are still present in the league even though the team isn’t. This means that over time the average rating of the teams in the league will increase.
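
A small worked example of the inflation, using a made-up league of three teams (the team names and ratings are hypothetical):

```r
# Hypothetical three-team league; the ratings sum to 3 * 1500
ratings <- c(A=1560, B=1500, C=1440)
print(mean(ratings))   # 1500

# C is relegated with 1440 points and D is promoted with the initial 1500
ratings <- c(ratings[c("A", "B")], D=1500)
leagueMean <- mean(ratings)
print(leagueMean)      # 1520: the 60 points C lost stay in the league with A
```

Every relegated team that leaves with fewer than 1500 points pushes the league average up in this way.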

The code I have written takes a data frame as input and works “out of the box” with data from football-data.co.uk. If you are going to use it yourself you have to make sure the data is sorted by date as the rating function just loops from top to bottom.
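
Here is a minimal sketch of how the sorting could be done. It assumes that the date column is called Date and that the dates are written as dd/mm/yy or dd/mm/yyyy, so check this against your own files. The helper parseMatchDate and the small data frame are made up for illustration.

```r
# Parse dates written as dd/mm/yy or dd/mm/yyyy (assumed formats)
parseMatchDate <- function(x){
  x <- as.character(x)
  fourDigitYear <- grepl("/[0-9]{4}$", x)
  out <- as.Date(rep(NA, length(x)))
  out[fourDigitYear] <- as.Date(x[fourDigitYear], format="%d/%m/%Y")
  out[!fourDigitYear] <- as.Date(x[!fourDigitYear], format="%d/%m/%y")
  out
}

# Made-up data frame standing in for read.csv("yourdata.csv")
dta <- data.frame(Date=c("17/11/2012", "14/08/93", "03/11/12"),
                  HomeTeam=c("A", "B", "C"),
                  AwayTeam=c("B", "C", "A"))

# Sort the matches from oldest to newest
dta <- dta[order(parseMatchDate(dta$Date)), ]
print(dta)
```

Sorting on the raw text column would not work, since the dates start with the day of the month.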

Here is how you can use it:

dta <- read.csv("yourdata.csv")
elo <- eloRating(data=dta)
print(elo)

And here is the code:


eloRating <- function(home="HomeTeam", away="AwayTeam", homeGoals="FTHG",
                      awayGoals="FTAG", data, kfactor=24, initialRating=1500,
                      homeAdvantage=0){
  
  #Make a list to hold ratings for all teams
  all.teams <- sort(union(as.character(data[[home]]),
                          as.character(data[[away]])))
  
  ratings <- as.list(rep(initialRating, times=length(all.teams)))
  names(ratings) <- all.teams

  #Loop through data and update ratings
  for (idx in seq_len(nrow(data))){
  
    #get current ratings (names as character, so the list is indexed by name and not by factor code)
    homeTeamName <- as.character(data[[home]][idx])
    awayTeamName <- as.character(data[[away]][idx])
    homeTeamRating <- as.numeric(ratings[homeTeamName]) + homeAdvantage
    awayTeamRating <- as.numeric(ratings[awayTeamName])
    
    #calculate expected outcome 
    expectedHome <- 1 / (1 + 10^((awayTeamRating - homeTeamRating)/400))
    expectedAway <- 1 - expectedHome
    
    #Observed outcome
    goalDiff <- data[[homeGoals]][idx] - data[[awayGoals]][idx]
    if (goalDiff == 0){
      resultHome <- 0.5
      resultAway <- 0.5
    }
    else if (goalDiff < 0){
      resultHome <- 0
      resultAway <- 1
    }
    else if (goalDiff > 0){
      resultHome <- 1
      resultAway <- 0
    }
    
    #update ratings
    ratings[homeTeamName] <- as.numeric(ratings[homeTeamName]) + kfactor*(resultHome - expectedHome)
    ratings[awayTeamName] <- as.numeric(ratings[awayTeamName]) + kfactor*(resultAway - expectedAway)
  }
  
  #prepare output
  ratingsOut <- as.numeric(ratings)
  names(ratingsOut) <- names(ratings)
  ratingsOut <- sort(ratingsOut, decreasing=TRUE)

  return(ratingsOut) 
}

10 thoughts on “Elo ratings in football”

Pingback: How to determine which football team is best? Statistical power and experimental design | opisthokonta.net
Pingback: Elo ratings in football: Home field advantage | opisthokonta.net
Chris McGonagle on November 18, 2017 at 9:37 pm said:

Hi, I would like to use your R code, but instead of giving every team an initial rating of 1500 like you do, how would you set the initial ratings from a csv file with team names and then iterate through the matches the way you do?

- opisthokonta on November 23, 2017 at 1:51 pm said:
  
  That is possible. In the function, you see that the ratings variable that is created in the beginning is a list with the team names as names. It is easy to modify this so that you can supply your own initial ratings, possibly loaded from a csv file.
  
  - Andrea on July 28, 2018 at 2:45 pm said:
    
    Hi, first of all thank you for your excellent work.
    I’m working on a bachelor thesis about football forecasting, and I’m quite a beginner in R. I don’t understand how I could substitute the initial ratings in your function with the ones I computed for the last 5 seasons of football.
    Do I have to modify the initialRating parameter in the definition of the eloRating function, the ratings assignment in line 9, or both?
    I have my initial ratings in a csv file named ‘InitialR.csv’, with header (team, elo).
    Thank you for your help, your site gave me a huge hand.
    
    - opisthokonta on August 2, 2018 at 9:19 am said:
      
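      In my function the initialRating argument should be a single number that’s assigned to all teams. You can modify the function where the ratings variable is set to the initial ratings like this:
      
      ratings <- initialRating
      
      This assumes you have provided a named vector of ratings, with the team names as names. You then also have to remove the line names(ratings) <- all.teams that follows, since it would overwrite the names you supplied.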
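      
      A minimal sketch of that idea, assuming a csv file with the columns team and elo as in the question above. The data frame initial stands in for read.csv("InitialR.csv"), the team names and numbers are made up, and teams that are missing from the file fall back to 1500:
      
      ```r
      # Stand-in for: initial <- read.csv("InitialR.csv")   (columns: team, elo)
      initial <- data.frame(team=c("Arsenal", "Chelsea"), elo=c(1650, 1700))
      
      # Named numeric vector with the supplied ratings
      initialRating <- initial$elo
      names(initialRating) <- as.character(initial$team)
      
      # Hypothetical team list; inside the function this is computed from the match data
      all.teams <- c("Arsenal", "Chelsea", "Stoke")
      
      # Start everyone at the default, then overwrite the teams found in the file
      ratings <- as.list(rep(1500, times=length(all.teams)))
      names(ratings) <- all.teams
      known <- intersect(all.teams, names(initialRating))
      ratings[known] <- as.list(initialRating[known])
      print(unlist(ratings))
      ```
      
      Matching on the team names rather than on position means the order of the rows in the csv file does not matter.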
      
NazimR on October 8, 2019 at 5:04 am said:

Hi,

Firstly, I would like to thank you for this awesome post, I really appreciate it :). I’m working on rating systems for association football such as Elo ratings, pi-ratings and double Poisson. Could you suggest or recommend any references that I could use to extend this Elo rating with the influence of home advantage and goal difference?

Thank you, your site gave me a huge hand :).

- Zacharias Andreou on October 13, 2019 at 8:53 pm said:
  
  Hello NazimR,
  
  I was wondering if you found a solution to this. I am currently doing a thesis on a predictive model for the Champions League and I am first ranking the top teams from Europe using Elo ratings. I am using this data frame: James P. Curley (2016). engsoccerdata: English Soccer Data 1871-2016. R package version 0.1.5. This is not a csv file and I’m having trouble running the R code provided above with this data. Any help would be much appreciated. Also, thank you to opisthokonta for the great work.
  
  - opisthokonta on October 14, 2019 at 10:34 am said:
    
    If you are looking for academic references for Elo for soccer, I would suggest this one:
    
    Hvattum & Arntzen (2010) Using ELO ratings for match result prediction in association football
    
    https://www.sciencedirect.com/science/article/pii/S0169207009001708
    
    - Zacharias Andreou on October 17, 2019 at 10:28 pm said:
      
      Thank you, I will look into this. I appreciate your help!
      

