BBC’s More Or Less on why the men’s FIFA rankings fail

One of the podcasts I listen to regularly, ‘More Or Less’ from the BBC, had the other day an episode about the (men’s) FIFA rankings. In the episode they discuss a shortcoming in the ranking system that makes it possible for a team to loose points (and thus ranking position) despite winning a match. The reason for this is not fully explained, but looking closer at the descriptions provided at fifa.com I think I see where the problem lies. After each match, rating points are given to the winner (or split if there is a draw). The crucial thing here is that friendly matches (or other non-important matches) gives fewer points than important tournament matches. The published ratings then are basically an average over the points earned for the matches played in the last couple of years. That means that winning a friendly match sometimes will yield fewer than a team’s average points, thus decreasing the average.

Unfortunately the episode did not mention the women’s FIFA ranking system which is based on the much better Elo system, used in chess rankings (and which I have written about previously). In this sort of system a win will almost surely give more points, and not less (the worst case scenario for a win is that no points are earned).

Dataset: Football stadiums with geographic coordinates

Here is a dataset I have put together with the location and capacity for the stadiums for about 160 European teams. The teams are from England, Scotland, France, Germany and Spain. The data is taken from Wikipedia and should be correct for the last couple of seasons. The French team Lille’s stadium is the current from the 2012 season, and Nice’s stadium is not the current one, but the one they had until the end of last season.

Some of the coordinates are more accurate than others, but I think they should be accurate enough to at least give an indication of the town the team comes from. That is probably true for the teams that have recently moved to another stadium as well; they are probably within the same town. The Guardian has looked into how far English Premier League clubs have moved, and it is usually less than 10 kilometers.

The data table contains 8 columns. The FDCOUK column contains the names of the teams as they appear in data from football-data.co.uk/. Since the names in the FDCOUK column often is abbreviated, more complete names are found in the Team column. There is also a column with the name of the stadium, but I have not been consistent with the naming with regards to traditional names vs. sponsored names.

What can this data set be used for? One thing I want to look into is whether traveling distance for the visiting team in a match influences the home field advantage. I have a couple of other ideas as well, but that will be for another time.

Download data

Edit: Updated dataset after I found a trailing whitespace in the FDCOUK column.
Edit December 16 2013: Added two Spanish teams.
Edit March 2nd 2015: Added 27 English teams.

Elo ratings in football: Home field advantage

In my first post about Elo ratings in football I posted the code for a R function where you could adjust the ratings to account for home field advantage. The method is simple: Some extra points are added to the home teams rating when the match predictions (which is based on the ratings) are calculated. My implementation did only support giving a the same fixed amount of extra points to all teams in all games. In other words it is assumed that all teams have the same home field advantage, and that the home field advantage does not change over time. This is of course unrealistic if the point of the ratings is to give as accurate predictions as possible. Still the method is used in the FIFA Women’s World Ranking and in the World Football Elo Ratings.

I know of two ways to implement a more dynamic (and more realistic) home field advantage. The ClubElo ratings (which is perhaps the best football rating site out there), developed by Lars Schiefler, let the home field advantage change over time, similar to how the ratings change. This is done by updating the home field advantage after each game based on the home team’s performance. An article on the ClubElo site describes the details very well.

A rather different method is used in the pi-rating system, developed by Anthony Constantinou. Each team have two ratings, one describing performance when playing at home, the other when playing away. The cool thing about the way this is done is that the two ratings for a team are not calculated separately from each other. It is not the case that only the home matches are used to calculate the home rating and away matches to calculate the away rating; After each match both ratings are updated. The home rating for the home team is updated almost like the regular Elo ratings, and the away rating is then also updated based on how much the home rating has changed, scaled by a factor. That way the two ratings are allowed to deviate from each other, giving rise to an adaptive, team specific home field advantage. The procedure is of course also applied to the away team’s ratings.

The pi-ratings, by the way, are interesting in other ways besides the method for determining the home field advantage. Instead of the ratings being somewhat arbitrary numbers, like most Elo systems, the pi-ratings directly models goal differences. The details are described in the paper Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries. A draft of the paper is also available at the pi-ratings website. While I am at it, I can also recommend Constantinou’s other papers on football prediction.

How to determine which football team is best? Statistical power and experimental design

Luck plays a significant part in a football match. Because of this we are not absolutely sure that the winning team in a match is the best one. Some researchers has taken a closer look at this by viewing a football match as a experiment used to determine which team is the best (Soccer matches as experiments: how often does the ‘best’ team win? by G. K. Skinner & G. H. Freeman, link). They found that in matches where the goal difference where less than about 3 or 4 goals, we could in general not be more than 90% sure that the best team won. This led the scientists to call a football match for “a badly designed experiment”.

While a single match could only hope to determine which of two teams is the best, we need a lot more matches to determine the best team among several. We need to hold a competition. There are several ways in which the different teams can play against each other in a competition. The perhaps most common formats are the the all versus all format we find in most national leagues. In this format every team plays against every other team in the league. Another common type of competition is the knockout tournament. This is the format used in the last stage in many international competitions like FIFA World Cup.

If we suppose the goal of a competition is to determine the best team we can see the competition as an experimental setup. Both of the two different competition formats have pros and cons. In an all versus all leagues (hereafter just referred to as a league) the teams often play each other twice during the season, once at each team’s home field. We thus get a repetition of each pair of teams, and we also get to control for home field advantage. This may or may not be the case in knockout tournaments (hereafter just referred to as a tournament). In knockout stage at the FIFA World Cup the the teams facing each other only plays a single match. Also, all teams except for the team representing the hosting nation has a home field advantage (if they reach the knockout stage, that is). The UEFA Champions League knockout stage operates with 2-leg matches, where each team play each other twice.

The question whether different types of competitions are better or worse at correctly identifying the best team relates to the statistical concept of power. In short, the power of an experimental procedure is the probability of confirming the alternative hypothesis when the alternative is true. In terms of identifying the best team the hypotheses can be stated as

H0: Team X is not the best team
H1: Team X is the best team

So what we want to figure out is what is the probability of team X winning the competition if it truly is the best team. The power of an experiments depends on a couple of factors: The number of observations, the size of the effect and the experimental procedure itself. In a football competition the number of observations and the procedure is greatly confounded. The number of matches is central to the competition format. A lot more matches are played in a league than in a tournament. In a league with N teams N(N-1) matches has to be played. Compared to a 1-leg tournaments where log2N matches has to be played, this is much greater. The effect size in this context is how good the best team is compared to the other teams.

Power analysis can be rather difficult to do analytically except for in the simplest models. One way to do a power analysis is therefore to do simulations. For the simulations I did here I decided to use Elo-ratings (which I have written about here) to generate some ratings and then simulate a competition. By doing this we can know which team is the best. By simulating the competition many times over we can get an estimate of the probability that the best team wins. The Elo-ratings can be used directly to calculate the chances of winning and loosing a match and is therefore a simple way to do this. Elo-ratings has some drawbacks, however. The most obvious that comes to my mind is that it is impossible to calculate the probability of a draw. This may be a problem the simulations of the league competitions. Hopefully, the results does not suffer too much because of this in the long run, since the probability of winning, as calculated by the ratings, does include half of the probability of drawing.

For the simulations I generated uniformly distributed ratings for 16 teams. By changing the upper and lower bounds for the uniform distribution we can change the competitiveness of the league. I used two sets of bounds: One where the ‘win percentage’ between the two bounds where 90% and one with 75%. We can think of this as the variability in effect size. For each simulation new ratings were generated. For each of the results 100000 competitions were simulated. For the tournament simulations I also looked at two different initial seedings. One was the completely random seed. The other was better informed, where the top half of the teams initially matched up against one of the bottom half of the teams. Otherwise it was random.

Here are the results:

Rank League (90) League (75) Unseeded (90) Unseeded (75) Seeded (90) Seeded (75)
1 0.3763 0.2523 0.2121 0.1368 0.2196 0.1438
2 0.2432 0.1972 0.1732 0.1236 0.1804 0.1296
3 0.1561 0.1488 0.1397 0.1095 0.1434 0.1124
4 0.0991 0.1127 0.1114 0.0965 0.1167 0.101
5 0.0556 0.0841 0.0891 0.0854 0.0924 0.0897
6 0.0344 0.0635 0.0702 0.0746 0.0732 0.079
7 0.0178 0.0455 0.0544 0.0658 0.0578 0.07
8 0.0096 0.0316 0.041 0.0565 0.0432 0.0595
9 0.0046 0.0223 0.0315 0.05 0.0211 0.0426
10 0.002 0.0152 0.0245 0.0444 0.0165 0.0376
11 0.0009 0.0104 0.0178 0.0374 0.0114 0.0313
12 0.0003 0.0067 0.0124 0.0318 0.0093 0.0287
13 0.0001 0.0044 0.0092 0.0279 0.0062 0.0238
14 0 0.0028 0.0063 0.0235 0.0042 0.0201
15 0 0.0014 0.0045 0.0203 0.0028 0.0168
16 0 0.0011 0.0027 0.0161 0.0018 0.0141

The league format unsurprisingly is much better at determining the best team than tournaments. What I found most surprising was how little effect the seeding in a tournament has. For both the higher and the lower competitive tournaments the chance of correctly identify the best team increases by less than one percentage point.

FIFA Women’s World Ranking and goal difference in Elo ratings.

The FIFA rankings for women’s national teams use a quite different methodology than the one used in ranking men’s national teams. The Women’s World Ranking (WWR) is based on the Elo rating system I wrote about in the previous post. The details for the men’s ranking can be found here, and the details the women’s ranking can be found here.

One thing that makes the WWR interesting is how goal differences are accounted for in the ratings. This is not something found in the ordinary chess-based Elo ratings. The method used in WWR is to let the ‘winning percentage’ change depending on two things: Goal difference and the number of goals the loosing team scored. This is in contrast to the original Elo ratings where the winning team wins 100% and the loosing team wins 0% (a draw is 50%). In WWR a team can never win 100%, the most it can win is 99%. This is the case when the goal difference is more than 6 and the loosing team haven’t scored any goals. The table below is from the pdf file linked to above and shows how many percent the loosing team win.

wwrtable

One strange thing about this table is the column for goal difference 0, implying a draw. I am guessing this is an error since it means that the winning percentage for the loosing team will be greater than the winning team. In the paper I mentioned in my last post where a number of different rating methods were compared (The predictive power of ranking systems in association football by J. Laset et. al), it was assumed that draw would yield both teams 50%, as in the original Elo-ratings. That paper also showed that the Women’s World Ranking was among the rating systems with best prediction.

Here is a plot showing the win percentage when the loosing team has scored 0 and 5 goals. We can see that there is not much gained for the loosing team to score one extra goal (assuming the goal difference stays the same, which of course is dubious), and most of the gain in winning percentage is when a team scores a goal such that the stance goes from a draw to a win.

WWRwinpercantage_nontransparent

The table above only goes up to 5 goals for the loosing team, but for the sake of implementation it is easy to generalize the rule about how much the loosing team gains by scoring an additional goal (with the goal difference is the same). For example will the loosing team gain 0.9 extra winning percentage points when the goal difference is 2. Similarly the gain is 0.6 percentage points when the goal difference is 5.

Below is a R function to compute the win percentage for football matches. It takes as input two vectors with the number of goals scored by the two opponents and returns a vector of win percentages (a number between 0 and 1) for the first team.

winPercentageWWR <- function(team1Goals, team2Goals){
  #calculates the win percentage for team 1.

  stopifnot(length(team1Goals) == length(team2Goals))
  
  perc <- c(0.01, 0.02, 0.03, 0.04, 0.08, 0.15, 0.50, 0.85, 0.92, 0.96, 0.97, 0.98, 0.99)
  add <- c(0, 0.01, 0.009, 0.008, 0.007, 0.006, 0.005)
  
  goalDifferences <- (team1Goals - team2Goals)
  goalDifferences[goalDifferences < -6] <- -6
  goalDifferences[goalDifferences > 6] <- 6

  team1WinPercentage <- numeric(length=length(goalDifferences))

  for (idx in 1:length(goalDifferences)){

    team1WinPercentage[idx] <- perc[goalDifferences[idx]+7] -  
      sign(goalDifferences[idx])*min(team1Goals[idx], team2Goals[idx])*add[abs(goalDifferences[idx])+1]
  }
  return(team1WinPercentage) 
}

Elo ratings in football

I have previously written about some statistical methods for rating football teams and to predict the result of future matches. One was the last squares method and another was the Poisson regression method. None of these methods make good enough predictions. One problem with them is that they don’t incorporate a time perspective. Matches played a year ago is given equal importance as the most recent one. This could however be incorporated by weighing the the older matches less than newer matches. One other problem that I mentioned in the second post about Poisson regression is that teams are treated as categoricals which makes it hard to model the fact that a team’s ability changes over time.

One different kind of method that has been employed a lot in the recent years is the Elo rating system, which were originally developed for rating chess players. The method is rather simple, but I will not explain it in detail here since there are many good explanations of it elsewhere. Wikipedia has a very thorough coverage. The basic principle is that the difference in ratings between the two opposing teams provide a prediction for the result each game. The rating is then updated based on how the teams perform. If a team performs better than expected the rating increases, if they perform worse than expected the rating decrease. How much the rating changes depends on an update factor (often referred to as the K-factor).

Chess and football are of course different in many ways so the method for rating chess players is not directly suitable for rating football teams. The relative simplicity of the Elo system makes it easy to tweak and adjust to better fit football by incorporating things like home field advantage and goal difference. There are many sites around the Internet who provide different variants of Elo ratings, like the World Football Elo Ratings for national teams and Club Elo and Euro Club Index for club teams. FIFA even uses its own Elo system in its Womans World Ranking.

There has even been some research into different football rating systems. A paper titled The predictive power of ranking systems in association football (pdf) by Jan Lasek and others compared different rating systems. Their conclusion was that the different Elo type systems in general were better at predicting match outcomes than other types of rating systems.

I figured I wanted to implement a simple Elo rating system for rating football teams. There is already a package in R, PlayerRatings, which implements several different rating systems based on Elo. In my simple implementation there is no adjustment for goal difference, but I have support for home field advantage. All teams start with an initial rating of 1500. Here is what I got when I calculated the ratings for Premier League in November 2012 based on data going back to 1993. I used an update factor 24 without any home field advantage. There is no particular reason for this as I did this mostly as a proof of concept.

  Rating (November 2012)
Man United 1807
Man City 1767
Chelsea 1696
Arsenal 1658
Tottenham 1645
Everton 1640
Newcastle 1613
Fulham 1591
Liverpool 1567
West Brom 1562
Leeds 1552
Wigan 1543
Swansea 1526
Sunderland 1524
Stoke 1521
Middlesboro 1516
Norwich 1509
Aston Villa 1498
West Ham 1494
Birmingham 1493
Blackpool 1483
Ipswich 1481
Bolton 1479
Charlton 1470
Sheffield United 1458
Blackburn 1450
Reading 1448
Sheffield Weds 1447
Coventry 1447
Middlesbrough 1443
QPR 1440
Barnsley 1439
Portsmouth 1438
Southampton 1437
Oldham 1436
Crystal Palace 1433
Leicester 1433
Nott’m Forest 1430
Hull 1422
Burnley 1418
Wolves 1414
Wimbledon 1413
Watford 1411
Bradford 1406
Swindon 1404
Derby 1297

The table seems reasonable I think except for a couple of things. There is a problem related to relegation and promotion. Since I have used data back to 1993 every team who has played in the Premier League is given a rating. If a team is relegated to the Championship, their rating will no longer be updated. We can see that this creates some strange results. Take the two lowest rated teams for example. Derby has not been in the Premier League since the 2007-2008 season. Swindon, which is rated about 100 points higher than Derby, has not played in the Premier League since 1993-1994 season! Swindon now play in the fourth level of the English league system. So the ratings for the teams not in the Premier League should be considered invalid.

Relegation and promotion also creates a problem with inflated ratings. The Elo system is created so that the total number of points in the league should be constant. When a team is promoted they start with an initial rating of 1500, and if they later gets relegated they will probably have lost some of those points to the other teams in the league. In fact, we see that many of the teams with ratings less than 1500 no longer plays in the Premier League. The points they have lost are still in present in the league even though the team isn’t. This means that over time the average ratings of the teams in the league will increase.

The code I have written takes a data frame as input and works “out of the box” with data from football-data.co.uk. If you are going to use it yourself you have to make sure the data is sorted by date as the rating function just loops from top to bottom.

Here is how you can use it:

dta <- read.csv("yourdata.csv")
elo <- eloRating(data=dta)
print(elo)

And here is the code:


eloRating <- function(home="HomeTeam", away="AwayTeam", homeGoals="FTHG",
                      awayGoals="FTAG", data, kfactor=24, initialRating=1500,
                      homeAdvantage=0){
  
  #Make a list to hold ratings for all teams
  all.teams <- levels(as.factor(union(levels(as.factor(data[[home]])),
                                      levels(as.factor(data[[away]])))))
  
  ratings <- as.list(rep(initialRating, times=length(all.teams)))
  names(ratings) <- all.teams

  #Loop trough data and update ratings
  for (idx in 1:dim(data)[1]){
  
    #get current ratings
    homeTeamName <- data[[home]][idx]
    awayTeamName <- data[[away]][idx]
    homeTeamRating <- as.numeric(ratings[homeTeamName]) + homeAdvantage
    awayTeamRating <- as.numeric(ratings[awayTeamName])
    
    #calculate expected outcome 
    expectedHome <- 1 / (1 + 10^((awayTeamRating - homeTeamRating)/400))
    expectedAway <- 1 - expectedHome
    
    #Observed outcome
    goalDiff <- data[[homeGoals]][idx] - data[[awayGoals]][idx]
    if (goalDiff == 0){
      resultHome <- 0.5
      resultAway <- 0.5
    }
    else if (goalDiff < 0){
      resultHome <- 0
      resultAway <- 1
    }
    else if (goalDiff > 0){
      resultHome <- 1
      resultAway <- 0
    }
    
    #update ratings
    ratings[homeTeamName] <- as.numeric(ratings[homeTeamName]) + kfactor*(resultHome - expectedHome)
    ratings[awayTeamName] <- as.numeric(ratings[awayTeamName]) + kfactor*(resultAway - expectedAway)
  }
  
  #prepare output
  ratingsOut <- as.numeric(ratings)
  names(ratingsOut) <- names(ratings)
  ratingsOut <- sort(ratingsOut, decreasing=TRUE)

  return(ratingsOut) 
}

Predicting football results with Poisson regression pt. 2

In part 1 I wrote about the basics of the Poisson regression model for predicting football results, and briefly mentioned how our data should look like. In this part I will look at how we can fit the model and calculate probabilities for the different match outcomes. I will also discuss some problems with the model, and hint at a few improvements.

Fitting the model with R
When we have the data in an appropriate format we can fit the model. R has a built in function glm() that can fit Poisson regression models. The code for loading the data, fitting the model and getting the summary is simple:

#load data
yrdta <- read.csv(“yourdata.csv”)

#fit model and get a summary
model <- glm(Goals ~ Home + Team + Opponent, family=poisson(link=log), data=yrdta)
summary(model)

The summary function for fitting the model with data from Premier League 2011-2012 season gives us this (I have removed portions of it for space reasons):

(Edit September 2014: There was some errors in the estimates in the original version of this post. This was because I made some mistakes when I formated the data as described in part one. Thanks to Derek in the comments for pointing this out. )

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)          0.45900    0.19029   2.412 0.015859 *  
Home                 0.26801    0.06181   4.336 1.45e-05 ***
TeamAston Villa     -0.69103    0.20159  -3.428 0.000608 ***
TeamBlackburn       -0.40518    0.18568  -2.182 0.029094 *  
TeamBolton          -0.44891    0.18810  -2.387 0.017003 *  
TeamChelsea         -0.13312    0.17027  -0.782 0.434338    
TeamEverton         -0.40202    0.18331  -2.193 0.028294 *  
TeamFulham          -0.43216    0.18560  -2.328 0.019886 *
-----
OpponentSunderland  -0.09215    0.20558  -0.448 0.653968    
OpponentSwansea      0.01026    0.20033   0.051 0.959135    
OpponentTottenham   -0.18682    0.21199  -0.881 0.378161    
OpponentWest Brom    0.03071    0.19939   0.154 0.877607    
OpponentWigan        0.20406    0.19145   1.066 0.286476    
OpponentWolves       0.48246    0.18088   2.667 0.007646 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The Estimate column is the most interesting one. We see that the overall mean is e0.49 = 1.63 and that the home advantage is e0.26 = 1.30 (remember that we actually estimate the logarithm of the expectation, therefore we need to exponentiate the coefficients to get interpretable numbers). If we want to predict the results of a match between Aston Villa at home against Sunderland we could plug the estimates into our formula, or use the predict() function in R. We need to do this twice, one time to predict the number of goals Aston Villa is expected to score, and one time for Sunderland.

#aston villa
predict(model, data.frame(Home=1, Team="Aston Villa", Opponent="Sunderland"), type="response")
# 0.9453705 

#for sunderland. note that Home=0.
predict(model, data.frame(Home=0, Team="Sunderland", Opponent="Aston Villa"), type="response")
# 0.999 

We see that Aston Villa is expected to score on average 0.945 goals, while Sunderland is expected to score on average 0.999 goals. We can plot the probabilities for the different number of goals against each other:

AstonVillaSunderland1

We can see that Aston Villa has just a bit higher probability for scoring not goals than Sunderland. Sunderland has also just a tiny bit higher probablity for most other number of goals. Both teams have about the same probability of scoring exactly one goal. In general the pattern we see in the plot is consistent with what we would expect considering the expected number of goals.

Match result probabilities
Now that we have our expected number of goals for the two opponents in a match, we can calculate the probabilities for either home win (H), draw (D) and away win (A). But before we continue, there is an assumption in our model that needs to be discussed, namely the assumption that the goals scored by the two teams are independent. This may not be obvious since surely we have included information about who plays against who when we predict the number of goals for each team. But remember that each match is included twice in our data set, and the way the regression method works, each observation are assumed to be independent from the others. We’ll see later that this can cause some problems.

The most obvious way calculate the probabilities of the three outcomes is perhaps to look at goal differences. If we can calculate the probabilities for goal differences (home goals minus away goals) of exactly 0, less than 0, and greater than 0, we get the probabilities we are looking for. I will explain two ways of doing this, both yielding the same result (in theory at least): By using the Skellam distribution and by simulation.

Skellam distribution
The Skellam distribution is the probability distribution of the difference of two independent Poisson distributed variables, in other words, the probability distribution for the goal difference. R does not support it natively, but the VGAM package does. For our example the distribution looks like this:
AstonVillaSunderland2
If we do the calculations we get the probabilities for home win, draw, away win to be 0.329, 0.314, 0.357 respectively.

#Away
sum(dskellam(-100:-1, predictHome, predictAway)) #0.3574468
#Home
sum(dskellam(1:100, predictHome, predictAway)) #0.3289164
#Draw
sum(dskellam(0, predictHome, predictAway)) #0.3136368

Simulation
The second method we can use is simulation. We simulate a number of matches (10000 in our case) by having the computer draw random numbers from the two Poisson distributions and look at the differences. We get the probabilities for the different outcomes by calculating the proportion of different goal differences. The independence assumption makes this easy since we can simulate the number of goals for each team independently of each other.

 set.seed(915706074)
nsim <- 10000
homeGoalsSim <- rpois(nsim, predictHome) 
awayGoalsSim <- rpois(nsim, predictAway)
goalDiffSim <- homeGoalsSim - awayGoalsSim
#Home
sum(goalDiffSim > 0) / nsim #0.3275
#Draw
sum(goalDiffSim == 0) / nsim # 0.3197
#Away
sum(goalDiffSim < 0) / nsim #0.3528

The results differ a tiny bit from what we got from using the Skellam distribution. It is still accurate enough to not cause any big practical problems.

How good is the model at predicting match outcomes?
The Poisson regression model is not considered to be among the best models for predicting football results. It is especially poor at predicting draws. Even when the two teams are expected to score the same number of goals it rarely manages to assign the highest probability for a draw. In one attempt I used Premier League data from the last half of one season and the first half of the next season to predict the second half of that season (without refitting the model after each match day). It assigned highest probability to the right outcome in 50% of the matches, but never once predicted a draw.

Lets see at some other problems with the model and suggest some improvements.

One major problem I see with the model is that the predictor variables are categorical. This is a constraint that makes inefficient use of the available data since we get rather few data points per parameter (i.e per team). The model does for example not understand teams are more like each other than others and instead view each team in isolation. There has been some attempts at using Bayesian methods to incorporate information on which teams are better and which are poorer. Se for example this blog. If the teams instead could be reduced to numbers (by using some sort of rating system) we would get fewer parameters to estimate. We could then also incorporate an interaction term, something that is almost impossible with the categorical predictor variables we have. The interaction term in this case would be the effect of a team under or over estimating its opponent.

(As an aside, we could in fact interpret the coefficients in our model as a form of rating of a teams offensive and defensive strength)

Another way the model can be improved is to incorporate a time aspect. The most obvious way to do this is perhaps to weights to the matches such that more recent matches are more important than matches far back in time.

A further improvement would be to look at the level of different players, and not at a team as a whole. For example will a team with many injured players in a match most likely perform poorer than what you would expect. One could use weights to down weight the contribution of matches where this is a problem. A much more powerful idea would be to combine data on match lineup with a rating system for players. This could be used to infer a rating for the whole team in a specific match. In addition to correct for injured players it would also account for new players on a team and players leaving a team. The biggest problem with this approach is lack of available data in a format that is easy to handle.

I don’t think any of the improvements I have discussed here will solve the problem of predicting draws since it originates in the independent Poisson assumption, although I think they could improve predictions in general. To counter the problem of predicting draws I think a very different model would have to be used. I would also like to mention that the improvements I have suggested here are rather general, and could be incorporated in many other prediction models.

Predicting football results with Poisson regression pt. 1

I have been meaning to write about my take on using Poisson regression to predict football results for a while, so here we go. Poisson regression is one of the earliest statistical methods used for predicting football results. The goal here is to use available data to to say something about how many goals a team is expected to score and from that calculate the probabilities for different match outcomes.

The Poisson distribution
The Poisson distribution is a probability distribution that can be used to model data that can be counted (i.e something that can happen 0, 1, 2, 3, … times). If we know the number of times something is expected to happen, we can find the probabilities that it happens any number of times. For example if we know something is expected to happen 4 times, we can calculate the probabilities that it happens 0, 1, 2, … times.

It turns out that the number of goals a team scores in a football match are approximately Poisson distributed. This means we have a method of assigning probabilities to the number of goals in a match and from this we can find probabilities for different match results. Note that I write that goals are approximately Poisson. The Poisson distribution does not always perfectly describe the number of goals in a match. It sometimes over or under estimates the number of goals, and some football leagues seems fit the Poisson distribution better than others. Anyway, the Poisson distribution seems to be an OK approximation.

The regression model
To be able to find the probabilities for different number of goals we need to find the expected number of goals L (It is customary to denote the expectation in a Poisson distribution by the Greek letter lambda, but WordPress seem to have problems with greek letters so i call i L instead). This is where the regression method comes in. With regression we can estimate lambda conditioned on certain variables. The most obvious variable to look at is which team is playing. Manchester United obviously makes more goals than Wigan. The second thing we want to take into account is who the opponent is. Some teams are expected to concede fewer goals, while others are expected to let in more goals. The third thing we want to take into account is home field advantage.

Written in the language of regression models this becomes

log(L) = mu + home + teami + opponentj

The mu is the overall mean number of goals. The home is the effect on number of goals a team has by playing at home. Teami is the effect of team number i, opponentj is the effect of team j.

(Note: Some descriptions of the Poisson regression model on football data uses the terms offensive and defensive strength to describe what I have called team and opponent. The reason I prefer the terms I use here is because it makes it a bit easier to understand later when we look at the data set.)

The logarithm on the left hand side is called the link function. I will not dwell much on what a link function is, but the short story is that they ensure that the parameter we try to estimate don’t fall outside its domain. In this case it ensures us that we never get negative expected number of goals.

Data
In my example I will use data from football-data.co.uk. What data you would want to use is up to yourself. Typically you could choose to use data from the last year or the least season, but that is totally up to you to decide.

Each of the terms on the right hand side of the equation (except for mu) corresponds to a columns in a table, so we need to fix our data a bit before we proceed with fitting the model. Each match is essentially two observations, one for how many goals the home team scores, the second how many the away team scores. Basically, each match need two rows in our data set, not just one.

Doing the fix is an easy thing to do in excel or Libre Office Calc. We take the data rows (i.e. the matches) we want to use and duplicate them. Then we need to switch the away team and away goals columns so they become the same as the home team column. We also need a column to indicate the home team. Here is an example on how it will look like:

In the next part I will fit the actual model, calculate probabilities and describe how we can make predictions using R.

Is goal difference the best way to rank and rate football teams?

In my previous post i compared the least squares rating of football teams to the ordinary three points for a win rating. In this post I will look closer at how these two systems rank teams differently. I briefly touched upon the subject in the last post, were we saw that the two systems generally ranked the teams in the same order, with a few exceptions. We saw that Sunderland and Newcastle were the two teams in the 2011-2012 Premier League season who differed most in their ranking in the two systems. The reason for this was of course because the least squares approach is based on goal difference, while the points system is based only on match outcome. This means that teams who win a match by many goals will benefit more on the least squares ranking than on the points system. For example, a 3-0 win will count more than a 2-1 win when we use goal difference, but they will give the same number of points based on match outcome. This also holds if wee look at the loosing team; a 2-1 loss is better than a 3-0 loss.

It seems more intuitive to rank teams on a system based on goal difference (using least squares or some other method) than the tree points for a win system, especially when we remind ourself that it lacks any theoretical justification. Awarding three points for a win instead of two was not used before the 1980’s and were not used in the World Cup until 1994. The reason for introducing the three points system was to give the teams more incentive to win. Also, as far as I know, even the two points for a win lacks a theoretical basis as a way to measure teams strength. But even if the points system lack an underlying mathematical theory, it still could be a better system than a system based on goal difference for deciding the true strength of a team. A paper titled Fitness, chance, and myths: an objective view on soccer results by the two German physicists A. Hauer and O. Rubner compares the two systems using data from the German Bundesliga. They looked at each team in each season from the late 1980’s and calculated how much the teams goal difference and points correlated between the first and second half of a season. A higher correlation means that there is less chance involved in how the measure reflects a teams real strength. What they found was that goal difference was more correlated between the half-seasons than the 3- and 2 points for a win system.

However, this does not mean that goal difference is the best way to measure team strength. I would like to see if there are some other measures that correlate better between season halves. What first comes to mind is to look at ball possession or shots at target.

As a last note, even if goal difference has a better theoretical foundation as a measure of “who is the best”, I do not think that leagues and tournaments should quit the points system. It may very well be that the points system makes a football competition more interesting since it adds more chance to it.

Least squares rating of football teams

The Wikipedia article Statistical association football predictions mentions a method for least squares rating of football teams. The article does not give any source for this, but I found what I think may be the origin of this method. It appears to be from an undergrad thesis titled Statistical Models Applied to the Rating of Sports Teams by Kenneth Massey. It is not on football in particular, but on sports in general where two teams compete for points. A link to the thesis can be found here.

The basic method as described in Massey’s paper and the Wikipedia article is to use a n*k design matrix A where each of the k columns represents one team, and each of the n rows represents a match. In each match (or row) the home team is indicated by 1, and the away team by -1. Then we have a vector y indicating goal differences in each match, with respect to the home team (i.e. positive values for home wins, negative for away wins). Then the least squares solution to the system Ax = y is found, with the x vector now containing the rating values for each team.

When it comes to interpretation, the difference in least squares estimate for the rating of two teams can be seen as the expected goal difference between the teams in a game. The individual rating can be seen as how many goals a teams scores compared to the overall average.

Massey’s paper also discusses some extensions to this simple model that is not mentioned in the Wikipedia article. The most obvious is incorporation of home field advantage, but there is also a section on splitting the teams’ performances into offensive and defensive components. I am not going to go into these extensions here, you can read more about them i Massey’s paper, along with some other rating systems that are also discussed. What I will do, is to take a closer look at the simple least squares rating and compare it to the ordinary three points for a win rating used to determine the league winner.

I used the function I made earlier to compute the points for the 2011-2012 Premier League season, then I computed the least squares rating. Here you can see the result:

  PTS LSR LSRrank RankDiff
Man City 89 1.600 1 0
Man United 89 1.400 2 0
Arsenal 70 0.625 3 0
Tottenham 69 0.625 4 0
Newcastle 65 0.125 8 3
Chelsea 64 0.475 5 -1
Everton 56 0.250 6 -1
Liverpool 52 0.175 7 -1
Fulham 52 -0.075 10 1
West Brom 47 -0.175 12 2
Swansea 47 -0.175 11 0
Norwich 47 -0.350 13 1
Sunderland 45 -0.025 9 -4
Stoke 45 -0.425 15 1
Wigan 43 -0.500 16 1
Aston Villa 38 -0.400 14 -2
QPR 37 -0.575 17 0
Bolton 36 -0.775 19 1
Blackburn 31 -0.750 18 -1
Wolves 25 -1.050 20 0

It looks like the Least squares approach gives similar results as the standard points system. It differentiates between the two top teams, Manchester City and Manchester United, even if they have the same number of points. This is perhaps not so surprising since City won the league because of greater goal difference than United, and this is what the least squares rating is based on. Another, perhaps more surprising thing is how relatively low least squares rating Newcastle has, compared to the other teams with approximately same number of points. If ranked according to the least squares rating, Newcastle should have been below Liverpool, instead they are three places above. This hints at Newcastle being better at winning, but with few goals, and Liverpool winning fewer times, but when they win, they win with more goals. We can also see that Sunderland comes poor out in the least squares rating, dropping four places.

If we now plot the number of points to the least squares rating we see that the two methods generally gives similar results. This is perhaps not so surprising, and despite some disparities like the ones I pointed out, there are no obvious outliers. I also calculated the correlation coefficient, 0.978, and I was actually a bit surprised of how big it was.