I have written a lot about different models for forecasting football results, and provided a lot of R code along the way. Especially popular are my posts about the Dixon-Coles model, where people still post comments four years after I first wrote them. Because of the interest in them, and in some of the other models I have written about, I decided to tidy up my code and functions a bit and make an R package out of it. The result is the goalmodel R package. The package lets you fit the ordinary Poisson model, which was one of the first models I wrote about, the Dixon-Coles model and the negative binomial model, and you can also use the adjustment I wrote about in my previous post.

The package contains a function to fit the different models, and you can even combine aspects of the different models in the same model. You can, for instance, use the Dixon-Coles adjustment together with a negative binomial model. There is also a range of methods for making different kinds of predictions, such as expected goals and over/under probabilities.
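To make the Dixon-Coles adjustment and the over/under prediction a bit more concrete, here is a minimal base-R sketch of the calculation for a single match. It is not goalmodel's code: the function names dc_tau, score_matrix and prob_over are hypothetical, and the expected goals (1.5 and 1.1) and rho = -0.1 are made-up inputs. A negative binomial version would replace dpois with dnbinom, which is not shown here.

```r
# Dixon-Coles correction factor for low-scoring outcomes.
# x, y: home and away goals; lambda, mu: expected goals; rho: dependence parameter.
dc_tau <- function(x, y, lambda, mu, rho) {
  if (x == 0 && y == 0) return(1 - lambda * mu * rho)
  if (x == 0 && y == 1) return(1 + lambda * rho)
  if (x == 1 && y == 0) return(1 + mu * rho)
  if (x == 1 && y == 1) return(1 - rho)
  1
}

# Scoreline probabilities: rows = home goals 0..max_goals,
# columns = away goals 0..max_goals.
score_matrix <- function(lambda, mu, rho = 0, max_goals = 10) {
  goals <- 0:max_goals
  probs <- outer(dpois(goals, lambda), dpois(goals, mu))
  # Only the 0-0, 0-1, 1-0 and 1-1 cells are adjusted.
  for (i in 1:2) for (j in 1:2) {
    probs[i, j] <- probs[i, j] * dc_tau(goals[i], goals[j], lambda, mu, rho)
  }
  probs
}

# Probability of more than `line` goals in total.
prob_over <- function(probs, line = 2.5) {
  goals <- 0:(nrow(probs) - 1)
  total <- outer(goals, goals, "+")
  sum(probs[total > line])
}

m <- score_matrix(lambda = 1.5, mu = 1.1, rho = -0.1)
prob_over(m, 2.5)
```

The adjustment only redistributes probability among the four low-scoring cells, so the matrix still sums to (almost exactly) one; with rho = 0 it reduces to two independent Poisson distributions.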

The package can be downloaded from GitHub. It is still just the initial version, so there are probably some bugs to be sorted out, but go and try it out and let me know what you think!

Thanks for making this work public. I’m new to both sports betting and R, but I did science at uni back in the day, so I will knock the rust off and try my hand at modeling. 🙂

Would the package be useful for predicting other sports, like hockey and basketball?

I’m playing around with the goalmodel package for the English Championship. Can we make a weighted Rue-Salvesen model with weights from the weights_dc function, like this?

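```r
library(goalmodel)

#
# Weighted Rue-Salvesen model (rsw)
#
# Assumes `championship` is a data frame with columns Date, home, visitor,
# hgoal and vgoal.

# Time-decay weights for each match.
my_weights <- weights_dc(championship$Date, xi = 0.0019)
length(my_weights)
plot(championship$Date, my_weights)

# Fit the model with the Rue-Salvesen adjustment and the weights.
gm_res_rsw <- goalmodel(goals1 = championship$hgoal, goals2 = championship$vgoal,
                        team1 = championship$home, team2 = championship$visitor,
                        rs = TRUE, weights = my_weights)
summary(gm_res_rsw)
```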

Yes, you can use weights for all types of models. However, read my blog post about the Rue-Salvesen adjustment, and why it might not be a good idea to actually fit the model with the adjustment. Also take a look at the GitHub page about how you can set the RS adjustment parameter after you have fit the default (or Dixon-Coles) model.
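The weights_dc function is based on the time-decay weighting from the Dixon-Coles paper, where a match played t days before the reference date gets weight exp(-xi * t). The sketch below is a hypothetical stand-in (decay_weights is not part of goalmodel) that assumes matches after the reference date get zero weight; check the package documentation for the exact behaviour of weights_dc. The three dates are made up.

```r
# Exponential time-decay weights: w = exp(-xi * t), where t is the number of
# days between each match and the reference date (default: the latest date).
# Matches played after the reference date get weight 0.
decay_weights <- function(dates, xi = 0, current_date = max(dates)) {
  days_ago <- as.numeric(current_date - dates)
  w <- exp(-xi * days_ago)
  w[days_ago < 0] <- 0
  w
}

dates <- as.Date(c("2017-08-05", "2018-01-01", "2018-05-06"))
w <- decay_weights(dates, xi = 0.0019)
round(w, 3)
```

With xi = 0.0019 a match's weight halves roughly every log(2) / 0.0019, or about 365 days.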

Thanks! I have now read up on parameter setting and tested it.

Forgive me for being an R newb, but how would you filter for matches between two dates?

I take it that we load up multiple seasons like this (let’s do 2011-2015):

```r
library(dplyr)
library(engsoccerdata)

# Load data from the English Premier League, 2011/2012 to
# 2015/2016 season.
england %>%
  filter(Season %in% 2011:2015,
         tier == 1) %>%
  mutate(Date = as.Date(Date)) -> england_2011_2015
```

Let’s say we want to predict a match within this dataset, using all matches played up to that point. How do we specify a sub-dataset between two dates? Specifically, between the first game of the 2011/2012 season and some date before the end of the 2015/2016 season?
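Once Date is a proper Date column, as after the mutate() step, filtering between two dates is just a comparison. With dplyr it would look like `england_2011_2015 %>% filter(Date >= as.Date("2011-08-13"), Date < as.Date("2016-01-16"))`. The base-R sketch below does the same on a small made-up data frame; the helper matches_between and the dates are illustrative assumptions, not taken from the data.

```r
# Keep matches played on or after `start` and strictly before `cutoff`.
matches_between <- function(df, start, cutoff) {
  df[df$Date >= as.Date(start) & df$Date < as.Date(cutoff), , drop = FALSE]
}

# Toy data standing in for england_2011_2015 (made-up matches).
toy <- data.frame(
  Date = as.Date(c("2011-08-13", "2013-02-02", "2016-01-16", "2016-05-15")),
  home = c("A", "B", "C", "D"),
  visitor = c("B", "C", "D", "A"),
  hgoal = c(1, 0, 2, 3),
  vgoal = c(1, 2, 0, 1)
)

# Training data: everything before the cutoff date.
train <- matches_between(toy, "2011-08-13", "2016-01-16")
nrow(train)
```

Using a strict `<` for the cutoff keeps the matches on the prediction date itself out of the training data.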

Thanks in advance and sorry for the nag. 😉

Amazing package! Why do you give it away for free? And how much money have you made already?

One question: Why don’t you use analytical derivatives in the optimization?

The optim function uses finite differences to approximate the derivatives, unless you provide a function that computes them (the gr argument). Working out the derivatives for all the models in the package is boring and difficult, so I haven’t done it.
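For a plain Poisson regression the gradient is simple enough to write down, which shows what supplying derivatives to optim looks like. This is a minimal sketch on simulated data with only an intercept and a home-advantage term, not one of goalmodel's models; pois_nll and pois_nll_grad are hypothetical names. With gr = NULL, optim falls back on finite differences; with gr supplied it uses the analytic gradient X'(exp(X beta) - y) of the negative log-likelihood.

```r
# Negative log-likelihood of a Poisson regression with log link.
pois_nll <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  -sum(y * eta - exp(eta) - lgamma(y + 1))
}

# Analytic gradient of the negative log-likelihood: t(X) %*% (exp(eta) - y).
pois_nll_grad <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  -drop(crossprod(X, y - exp(eta)))
}

set.seed(1)
home <- rep(c(1, 0), each = 50)            # 1 = home team, 0 = away team
goals <- rpois(100, exp(0.1 + 0.3 * home))  # simulated goal counts
X <- cbind(intercept = 1, home = home)

# Finite-difference gradient vs analytic gradient.
fit_fd <- optim(c(0, 0), pois_nll, X = X, y = goals, method = "BFGS")
fit_gr <- optim(c(0, 0), pois_nll, gr = pois_nll_grad, X = X, y = goals,
                method = "BFGS")

rbind(finite_diff = fit_fd$par, analytic = fit_gr$par,
      glm = coef(glm(goals ~ home, family = poisson)))
```

Both fits should agree with glm to a few decimal places. For the Dixon-Coles and negative binomial likelihoods the corresponding derivatives are considerably messier, which is the trade-off described above.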

This looks amazing and exactly what I’ve been looking for, thanks!

Sorry for being a pain, but do you know of any step-by-step guides for using R? I’m a complete novice and have figured out the odd thing through trial and error and Google, but am struggling to get this to work. Apologies again, and keep up the good work.

Great post, thanks. Have you covered how to apply time weighting to a simple Poisson model?

Do you apply it to the log-likelihood function, as Dixon and Coles do?

If so, you don’t use the built-in glm function in R, but instead write out the log-likelihood and maximize it?

Thanks,

I have blogged about the time-weighting, using ordinary Poisson regression, here: http://opisthokonta.net/?p=1013

If you use the built-in glm function, you just use the weights argument. The goalmodel package doesn’t use the glm function, but has its own implementation of the Poisson log-likelihood, which can be weighted in the same way as the DC likelihood.
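The sketch below checks that the two routes agree: passing weights to glm, and maximizing the weighted log-likelihood directly, as for the Dixon-Coles model. It uses simulated data, an assumed decay rate xi = 0.0019, and a single home-advantage covariate instead of full team parameters, so it is an illustration of the principle rather than goalmodel's implementation.

```r
set.seed(2)
n <- 200
days_ago <- sample(0:730, n, replace = TRUE)   # days since each (simulated) match
home <- rbinom(n, 1, 0.5)                      # made-up home indicator
goals <- rpois(n, exp(0.2 + 0.25 * home))
w <- exp(-0.0019 * days_ago)                   # time-decay weights

# 1) Built-in glm: pass the weights through the `weights` argument.
fit_glm <- glm(goals ~ home, family = poisson, weights = w)

# 2) Hand-written weighted log-likelihood: each match's log-likelihood
#    contribution is multiplied by its weight.
weighted_nll <- function(beta) {
  lambda <- exp(beta[1] + beta[2] * home)
  -sum(w * dpois(goals, lambda, log = TRUE))
}
fit_ml <- optim(c(0, 0), weighted_nll, method = "BFGS",
                control = list(reltol = 1e-10))

rbind(glm = coef(fit_glm), optim = fit_ml$par)
```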

Hi, I’ve never used R before, so I wouldn’t even know where to start. I’ve been modelling football games from many leagues using expected goals that I scrape, and then using Poisson to get the probabilities for the upcoming games. Would this tool help me make it quicker or better? Is it possible to use expected goals data as input? Sorry for all these questions, but I’m new to the programming language.

I am not sure if it will be quicker or better using this package, but you can use expected goals as input. I actually did that in this post: http://opisthokonta.net/?p=1760
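If you already have expected goals for both teams, turning them into home/draw/away probabilities with the Poisson distribution takes only a few lines of R. This sketch assumes the two teams' goals are independent Poisson variables (no Dixon-Coles adjustment); outcome_probs is a hypothetical helper and the inputs 1.6 and 0.9 are made up.

```r
# Home win / draw / away win probabilities, assuming independent Poisson
# goals with the given expected goals.
outcome_probs <- function(xg_home, xg_away, max_goals = 10) {
  goals <- 0:max_goals
  # Rows = home goals, columns = away goals.
  probs <- outer(dpois(goals, xg_home), dpois(goals, xg_away))
  c(home = sum(probs[lower.tri(probs)]),   # home goals > away goals
    draw = sum(diag(probs)),
    away = sum(probs[upper.tri(probs)]))   # away goals > home goals
}

p <- outcome_probs(xg_home = 1.6, xg_away = 0.9)
round(p, 3)
```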

Great content, keep up the great work! Why don’t you make a YouTube video about your GitHub package and how to use it?

Hi!

I’m having some problems with a Gaussian model. It gives very big numbers for the attack/defence parameters. This usually happens for the first team in the list, so if I rename “Arsenal” to “Barsenal”, for example, the problem moves to Aston Villa here.

```
Model                     Gaussian
Log Likelihood            -1699.37
AIC                       3588.74
R-squared                 NA
Parameters (estimated)    95
Parameters (fixed)        0

Team          Attack   Defense
Arsenal       -13.21    0.10
Aston Villa     0.40   -0.11
Barnsley        0.03   -0.10
Birmingham     -0.00   -0.08
...
```

Maybe a bug in goalmodel? I tried to locate it, but I’m more into C# than R.

But anyway, thanks for a great package! It’s really helpful!

You seem to have a lot of parameters. Could it be that some teams have very few games in your data? Also, have you tried the new version 0.2? It should be more stable.
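A quick way to check whether some teams have very few games is to count how often each team appears, home or away. This is a minimal base-R sketch with made-up team vectors; for a real fit you would pass the same vectors you give goalmodel as team1 and team2. Few games per team is only one possible cause of unstable estimates, not a confirmed diagnosis of the problem above.

```r
# Number of matches each team appears in, home or away, sorted ascending.
games_per_team <- function(home, away) {
  sort(table(c(as.character(home), as.character(away))))
}

# Made-up example: "Barnsley" has played only one match.
home <- c("Arsenal", "Aston Villa", "Arsenal", "Barnsley")
away <- c("Aston Villa", "Arsenal", "Birmingham", "Birmingham")
n_games <- games_per_team(home, away)
n_games

# Teams with fewer than two matches.
names(n_games)[n_games < 2]
```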

I updated to 0.2 and cannot get the Gaussian model to work at all anymore, even with simple data. It always gives a very high intercept.