The Dixon-Coles approach to time-weighted Poisson regression

In the previous blog posts about predicting football results using Poisson regression I have mostly ignored the fact that the data points (ie matches) used to fit the models are gathered (played) at different time points.

In the 1997 Dixon and Coles paper where they described the bivariate adjustment for low scores they also discussed using weighted maximum likelihood to better make the parameter estimates reflect the current abilities of the teams, rather than as an average over the whole period you have data from.

Dixon and Coles propose to weight the games using a function so that games are down weighted exponentially by how long time it is since they were played. The function to determine the weight for a match played is

\( \phi(t) = exp(-\xi t) \)

where t is the time since the match was played, and \(\xi\) is a positive parameter that determines how much down-weighting should occur.

I have implemented this function in R, but I have done a slight modification from the one from the paper. Dixon and Coles uses “half weeks” as their time unit, but they do not describe in more detail what exactly they mean. They probably used Wednesday or Thursday as the day a new half-week starts, but I won’t bother implementing something like that. Instead I am just going to use days as the unit of time.

This function takes a vector of the match dates (data type Date) and computes the weights according to the current date and a value of \(\xi\). The currentDate argument lets you set the date to count from, with all dates after this will be given weight 0.

DCweights <- function(dates, currentDate=Sys.Date(), xi=0){
  datediffs <- dates - as.Date(currentDate)
  datediffs <- as.numeric(datediffs *-1)
  w <- exp(-1*xi*datediffs)
  w[datediffs <= 0] <- 0 #Future dates should have zero weights

We can use this function to plot how the much weight the games in the past is given for different values of \(\xi\). Here we see that \(\xi = 0\) gives the same weight to all the matches. I have also set the currentDate as a day in April 2013 to illustrate how future dates are weighted 0.


To figure out the optimal value for \(\xi\) Dixon and Coles emulated a situation where they predicted the match results using only the match data prior to the match in question, and then optimizing for prediction ability. They don’t explain how exactly they did the optimizing, but I am not going to use the optim function to do this. Instead I am going to just try a lot of different values of \(\xi\), and go for the one that is best. The reason for this is that doing the prediction emulation takes some time, and using an optimizing algorithm will take an unpredictable amount of time.

I am not going to use the Dixon-Coles model here. Instead I am going for the independent Poisson model. Again, the reason is that I don’t want to use too much time on this.

To measure prediction ability, Dixon & Coles used the predictive log-likelihood (PLL). This is just the logarithm of the probabilities calculated by the model for the outcome that actually occurred, added together for all matches. This means that a greater PLL indicates that the actual outcomes was more probable according to the model, which is what we want.

I want to use an additional measure of prediction ability to complement the PLL: The ranked probability score (RPS). This is a measure of prediction error, and takes on values between 0 and 1, with 0 meaning perfect prediction. RPS measure takes into account the entire probability distribution for the three outcomes, not just the probability of the observed outcome. That means a high probability of a draw is considered less of an error that a high probability of away win, if the actual outcome was home win. This measure was popularized in football analytics by Anthony Constantinou in his paper Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models. You can find a link to a draft version of that paper on Constantinou’s website.

I used data from the 2005-06 season and onwards, and did predictions from January 2007 and up until the end of 2014. I also skipped the ten first match days at the beginning of each season to avoid problems with lack of data for the promoted teams. I did this for the top leagues in England, Germany, Netherlands and France. Here are optimal values of \(\xi\) according to the two prediction measurements:

England 0.0018 0.0018
Germany 0.0023 0.0023
Netherlands 0.0019 0.0020
France 0.0019 0.0020

The RPS and PLL mostly agree, and where they disagree, it is only by one in the last decimal place.

Dixon & Coles found an optimal value of 0.0065 in their data (which were from the 1990s and included data from the top four English leagues and the FA cup), but they used half weeks instead of days as their time unit. Incidentally, if we divide their value by the number of days in a half week (3.5 days) we get 0.00186, about the same I got. The German league has the greatest optimum value, meaning historical data is of less importance when making predictions.

An interesting thing to do is to plot the predictive ability against different values of \(\xi\). Here are the PLL (adjusted to be between 0 and 1, with the optimum at 1) for England and Germany compared, with their optima indicated by the dashed vertical lines:

Prediction ability for values of Xi Engand Germany

I am not sure I want to interpret this plot too much, but it does seems like predictions for the German league are more robust to values of \(\xi\) greater than the optimum, as indicated by the slower decline in the graph, than the English league.

So here I have presented the values of \(\xi\) for the independent Poisson regression model, but will these values be the optimal for the Dixon & Coles model? Probably not, but I suspect there will be less variability between the two models than between the same model fitted for different leagues.

53 thoughts on “The Dixon-Coles approach to time-weighted Poisson regression

  1. Could you show how you could include this into the existing Dixon and Coles model you have produced please? I’m not sure how you add it in and I’m interested to see the results.

    • It is really simple actually, just multiply the weights with the log likelihood for each match, before you sum them together. You can replace the original log likelihood function with this:

      DClogLikWeighted <- function(y1, y2, lambda, mu, rho=0, weights=NULL){
        #rho=0, independence
        #y1 home goals
        #y2 away goals
        loglik <- log(tau(y1, y2, lambda, mu, rho)) + log(dpois(y1, lambda)) + log(dpois(y2, mu))
        if (is.null(weights)){
        } else {

      It should be obvious, but anyway: Give your weights (calculated by the DCweights function, for example) to the weights argument. If you don’t give it any weights it will give equal weight to all matches, i.e. the same results as before.

      • Hi, there. I have read many of your article. But I don’t understand this one. So, do you mean I should type “DCweights” instead of “NULL” ?

        I did it. I put the function DCweights right after the function tau, and changed the DCloglik to the function DCloglikWeighted and also the NULL to DCweights.

        It gave me this:

        > HomeWinProbability
        [1] NA
        > DrawProbability
        [1] NA
        > AwayWinProbability
        [1] NA

        please help.

          • You should compute the weights first, for example like this:
            myweights <- DCweights(dates, xi=0.0018) This creates a vector with the weights that you can pass to the DClogLikWeighted function: DClogLikWeighted(...., weights=myweights)

      • However, how to input dates from csv file? Should we make a matrix like attack.p in DCoptimFn?

        I did it that way but it doesn’t work.

        DCmodelData <- function(df){

        hm <- model.matrix(~ HomeTeam – 1, data=df, contrasts.arg=list(HomeTeam='contr.treatment'))
        am <- model.matrix(~ AwayTeam -1, data=df)
        team.names <- unique(c(levels(df$HomeTeam), levels(df$AwayTeam)))
        ax = model.matrix(~ Date -1,data=df)


        DCoptimFn <- function(params, DCm){

        home.p <- params[1]
        rho.p <- params[2]

        nteams <- length(DCm$teams)
        ndates <- length(DCm$dates)

        attack.p <- matrix(params[3:(nteams+2)], ncol=1)
        defence.p <- matrix(params[(nteams+3):(nteams+2+nteams)], ncol=1)
        da <- matrix(params[(nteams+nteams+3):length(DCm$dates)],ncol=1)

        lambda <- exp(DCm$homeTeamDM %*% attack.p + DCm$awayTeamDM %*% defence.p + home.p)
        mu <- exp(DCm$awayTeamDM %*% attack.p + DCm$homeTeamDM %*% defence.p)
        dates <- (DCm$dates %*% da)

        xi <-0.0018
        datediffs <- as.Date(dates,"%d%b%Y") – as.Date("2013-12-30")
        datediffs <- as.numeric(datediffs *-1)
        w <- exp(-1*xi*datediffs)
        w[datediffs <= 0] <- 0 #Future dates should have zero weights

        DClogLikWeighted(y1=DCm$homeGoals, y2=DCm$awayGoals, lambda, mu, rho.p, w) * -1

        • You don’t really have to do anything special with the dates, just pass the column to the as.Date function, with the appropriate format argument. This is what is already done in the command that creates the datediff variable. You don’t have to use the model.matrix argument or do any matrix multiplication. So your da and (DCm$dates %*% da) stuff is not really needed. In the way you have set it up you can just do like this in the DCmodelData function:

          ax <- df$Date

          • I have problems to compute THIS:
            myweights <- DCweights(dates, xi=0.0018)

            It gives me this error:
            Error in DCweights(dates, xi = 0.0018) : 'dates' object not found

          • It says the ‘dates’ object not found, which means you have not defined a variable called ‘dates’.

  2. Would a Pareto decay model not be more representative of the recent value of the team strengths? Exponential decay places a lot of emphasis on form, which doesn’t take into account the varying strengths of opposition?

    Have you thought about modelling adjustments for strength of schedule ?

    • I have been thinking a bit about other possible weighting schemes, and I will definitely look into the Pareto function. Thanks for the tip!

      One thing I tried was incorporating seasons into the weights. The idea was that the current season should count more than the previous ones, since I thought data from 5 months back would less informative if that was in the previous season than in the current one. What I did was just multiplying the weights in the previous seasons by a number less than 1 (and adjusting the xi parameter a bit), but at least for the Premier League it didn’t improve the predictions.

      I think these regression models incorporate strength of schedule pretty good already, since the parameter estimates relies on all the data. I am not even sure the model could not incorporate strength of schedule, since if there were two groups of teams where no team in one group had played any team in the other group, I think the model would be unidentifiable. I have been thinking about looking into the “connectedness” between groups of teams (i.e. from different leagues) some time in the future.

  3. Did you find that adding the time weightings yielded more accurate predictions vs the independent Poisson regression model with the Dixon and Coles low scoring adjustment?

    • I haven’t done any extensive testing of the different models, so I am not quite sure. I hope to get around to it sometime in the near future.

  4. Just out of curiosity (I know you didn’t intend to maximise xi within the function because it takes time), but how would you go about it?

    • I just tried every value of xi between 0 and 0.0030, incremented by 0.0001 (R command: seq(0,0.0030, 0.0001)). Then I calculated the RPS for all 31 values of xi and found which value of xi gave the smallest one.

      • I also wondered how you implemented that. If I understood correctly, you have to refit the model each matchday and compute the probability of each outcome. Then you have to do the whole process for different values of xi and compare which xi yields the highest PPL. But there is no chance for me to see an easy implementation of that <.<

        • Yes that’s correct. You have to do this to get a realistic prediction. You should fit the model including the most recent data, and not having any future data. The implementation of that requires two loop constructs. The inner loop iterates over all match dates and does the model fitting and predictions, the outer loop goes over all values of xi.

  5. Hi,

    Great posts! Would you be able to show code for calculating the ranked probability score please? I am currently using the Brier scores but it means calculating for win, loss and draw.

  6. Hi,
    first of all: great articles, very clear and straight to the point!

    I tried to implement this last step of the model but I think I’m missing something. I calculate the weights outside the optimization function, and pass them as parameters inside par.inits. I think that by doing so I implicitly force the weights to be optimized too…

    DCoptimFnWeighted <- function(params, DCm){
      home.p <- params[1]
      rho.p <- params[2]
      nteams <- length(DCm$teams)
      attack.p <- matrix(params[3:(nteams+2)], ncol=1)
      defence.p <- matrix(params[(nteams+3):(2*nteams+2)], ncol=1)
    weights <- matrix(params[(2*nteams+3):length(params)], ncol=1)
      lambda <- exp(DCm$homeTeamDM %*% attack.p + DCm$awayTeamDM %*% defence.p + home.p)
      mu <- exp(DCm$awayTeamDM %*% attack.p + DCm$homeTeamDM %*% defence.p)
        DClogLikWeighted(y1=DCm$homeGoals, y2=DCm$awayGoals, lambda, mu, rho.p, weights) * -1

    the function DClogLikWeighted is the one you suggested.

    Can you please tell me where I am wrong? How should I pass the weights to the LogLikWeighted function?

    thank you

    • Thank you!

      From the code you posted it seems like you try to optimize all the weights. In the model estimation the weights are treated as given, ie they are not to be estimated. If you try to minimize the negative weighted log-likelihood functions the weights will all be estimated to -Inf or something. That is why you should pass the weights you want to use (those calculated by the DCweights function, for example) directly to the likelihood function, and not treat them as parameters to be estimated. To find the optimal weights from the DCweights function I did what Dixon and Coles did in their paper: Emulate a real prediction situation where you try to predict the next match using the data available up to that point in time, and evaluate the prediction accuracy for different values of the dumping factor xi.

    • You should pass a vector of the Date class. You would typically have a character vector with dates, and you can convert this to a proper date vector with the as.Date() function.

          • Hi, I tried that but it is giving me the following error: Error in `-.Date`(dates, as.Date(currentDate)) :
            can only subtract from “Date” objects
            > Below is my current code.

            my_dates <- c('2016-08-20', '2016-06-11')
            myweights <- DCweights(my_dates, xi = 0.0018)
            DClogLikWeighted(y1 = DCm$homeGoals, y2 = DCm$awayGoals, lambda, mu, rho.p, myweights) * -1

          • You need to pass the output from the as.Date function to a variable, or inside the function call argument. It could be like this:

            my_dates <- c('2016-08-20', '2016-06-11') myweights <- DCweights(as.Date(my_dates), xi = 0.0018)

  7. Hello,

    First of all, I apologise for the late comment. Secondly, your posts have helped me a lot and I have made a similar model in Excel (I know Excel isn’t the best tool for this kind of stuff … however, I’m doing my A-Levels and it is what was easiest at the time since I have no coding experience/knowledge).

    Could you clarify how exactly you calculate ? please? My spreadsheet is set up with a few seasons of results with corresponding dates but I am not too sure how I can find the value for ?.

    Sorry if I sound stupid here, I only have basic knowledge compared to you guys as I start university later this year.



    • Im sorry, but it looks like some of your comment is missing. What is it that you don’t know how to calculate? The xi? The ranked probability score?

      • Hello,

        I realised I added in a random word at the bottom somehow. Sorry for the confusion there!
        The part I am struggling with is determining a value for ? in the following formula:


        For my t I’m using says since last match. But as I’m working in Excel I can’t figure out how to get a value for ?.

        Hope this clears it up, and thank you for your response.

        • The way I do it is for each match day, I fit the model for all available previous data, and predict the results on the games on that day. I do this for a couple of seasons, say. I do thins for several values of xi, and select the one that gives best predictions. This can take a few hours when I do it in R.

  8. Hi, sorry but i don’t understand how you calculate the best value for xi. I try with doing a cicle where for every value of xi from 0.0001 to 0.02 (increasing by 0.0001), i get the estimates from the model, then i calculate the value of the PLL, but the PLL is always decreasing. Any suggestion? there is the code:

    #function to create the thetea Home, draw, away

    #function for estimate the probabilties
    #p_match result return a vector with the pWin, pDraw, pLoss
    for(i in 1:k){



    It will be very helpfull for me, i’m a student and i’m writing about this topic on my final review. Hope for reply, thank you.

    • Sorry there were an error in the code i posted before…
      this is the main part: i maximezed the loglik function (it is sum(phi* logLikDixonColes) ) and the i calculate the S value (the PPL)



  9. Hi, first of all thanks for your work!
    Can you show how you calculate the xi optim value?
    I tried wit doing 2 loop: One that just change the value of xi and in the inner loop I optimezd the model for each match day and calculate the probability of outcome, then with all the probability I calculate the value of the PPL.
    But the series of PPL that I get is not regular, it is first increasing then decresing and again increasing and decreasing and so for all the range of value of xi I tried (about 0.0001 to 0.02).
    Am I missing something?

    • They way you describe your procedure seems right, so my guess is that you have a introduced a bug somewhere, where you perhaps overwrite some results, or you index something wrong or something.

      • I checked my code but i’m still blocked. Is there any possibility you can show your code, please? it would be very helpfull

        • I don’t think that would be very helpful. There are so many possible way of doing this, the data could be coded in different ways, and so on, so code for doing this would not be very informative, I think.

  10. Hey, was just wondering if the inclusion of the time weighting would influence the parameter calculations of if it is just in use for the predictions?

    • The time weights are only used in the parameter estimation. They are not used when calculating the predictions.

      • When I change the value of xi it doesn’t change the value of my parameter estimations? I’ve used your code

        tau <- Vectorize(function(x, y, lambda, mu, rho){
        if (x == 0 & y == 0){return(1 – (lambda*mu*rho))
        } else if (x == 0 & y == 1){return(1 + (lambda*rho))
        } else if (x == 1 & y == 0){return(1 + (mu*rho))
        } else if (x == 1 & y == 1){return(1 – rho)
        } else {return(1)}

        DCweights <- function(prem15, currentDate, xi){
        datediffs<-difftime(strptime(prem15$Date, format ="%d/%m/%y"), strptime(currentDate, format="%d/%m/%y"), units="days")
        w <- exp(-1*xi*datediffs)
        w[datediffs <= 0] <- 0 #Future dates should have zero weights

        DClogLikWeighted <- function(x, y, lambda, mu, rho=0.06, weights=weights){
        loglik<-sum(log(tau(x, y, lambda, mu, rho)) + log(dpois(x, lambda)) + log(dpois(y, mu)))

        DCoptimFn <- function(params, DCm){
        home.p <- params[1]
        rho.p <- params[2]
        nteams <- length(DCm$teams)
        attack.p <- matrix(params[3:(nteams+2)], ncol=1)
        defence.p <- matrix(params[(nteams+3):length(params)], ncol=1)

        lambda <- exp(DCm$homeTeamDM %*% attack.p + DCm$awayTeamDM %*% defence.p + home.p)
        mu <- exp(DCm$awayTeamDM %*% attack.p + DCm$homeTeamDM %*% defence.p)

        DClogLikWeighted(x=DCm$homeGoals, y=DCm$awayGoals, lambda, mu, rho.p, weights=weights) * -1

        DCattackConstr <- function(params, DCm, …){
        nteams <- length(DCm$teams)
        attack.p <- matrix(params[3:(nteams+2)], ncol=1)
        return((sum(attack.p) / nteams) – 1)

        weights<-DCweights(prem15, date, xi=0.5)

        #initial parameter estimates – make sure the length is the same prem thing
        attack.p <- rep(0.01, length(unique(prem15DC$teams)))
        defence.p <- rep(-0.08, length(unique(prem15DC$teams)))
        home.p <- 0.06
        rho.init <- 0.06
        par.inits <- c(home.param, rho.init, attack.p, defence.p)
        names(par.inits) <- c('HOME', 'RHO', paste('Attack', prem15DC$teams, sep='.'), paste('Defence', prem15DC$teams, sep='.'))

        test<-auglag(par=par.inits, fn=DCoptimFn, heq=DCattackConstr, DCm=prem15DC)

        Where prem15 is the 2014/15 season data, if I change the value of xi the parameters are the same?

        • What I mean is that when you multiply by the weights in the DCLogLikWeighted function, yes the resulting value will be lower in the Optim Function, but surely the same values for the paremeters will minimise this too?

          • The parameters should change when you change xi. In your code you have a large value of xi (0.5), so small changes is that might not change the parameter estimates much when you have data from only one season.

          • Apologies, the xi in there is incorrect to what I used, I have tried a loop from 0.01 to 0.06 using the code
            for(i in seq(0.001,0.04, 0.001))
            weights15<-DCweights(prem15, date, xi=i)
            test<-auglag(par=par.inits, fn=DCoptimFn, heq=DCattackConstr, DCm=prem15DC)
            ActualResultsProbs(prem15) #This produces the value for the pll
            and there are no changes to the parameters, they all produce for example 1.40 for Arsenal using the full 2014/15 season, simply outputting smaller fvals of the output of the auglag.
            Any idea why it doesn't change?

          • It seems like you don’t actually pass weights15 to DCoptimfun via auglag.

          • I don’t understand why the output of the optim function would change as the weights change though? For example, when I use xi=0.001 I get an output once maximised of 216100, but when I use xi=0.005 I get 241200. However these use the same values for the parameters.

          • Im not sure i follow you. Do you think it is strange that a weighted sum changes as the weights change?

Leave a Reply

Your email address will not be published. Required fields are marked *