I recently read an interesting paper that was published last year, by Igor Barbosa da Costa, Leandro Balby Marinho and Carlos Eduardo Santos Pires: Forecasting football results and exploiting betting markets: The case of “both teams to score”. In the paper they try out different approaches for predicting the probability that both teams score at least one goal (“both teams to score”, or BTTS for short). A really cool thing about the paper is that they actually used my goalmodel R package. This is the first time I have seen my package used in a paper, so I’m really excited about that! In addition to using *goalmodel*, they also tried a few machine learning approaches where, instead of modelling the number of goals with the Poisson distribution, they train the models directly on the both-teams-to-score outcome. They found that both approaches had similar performance.

I have to say that I personally prefer to model the scorelines directly using Poisson-like models, rather than trying to fit a classifier to the outcome you want to predict, whether it is over/under, win/draw/lose, or both-teams-to-score. Although I can’t say I have any particularly rational arguments in favour of the goal models, I like the fact that you can use the goal model to compute the probabilities of any outcome you want from the same model, without having to fit separate models for every outcome you are interested in. But then again, goal models are of course limited by the particular model you choose (Poisson, Dixon-Coles etc.), which could give less precise predictions on some of these secondary outcomes.

Okay, so how can you compute the probability of both teams to score using the Poisson based models from *goalmodel*? Here’s what the paper says:

This is a straightforward approach where they have the matrix with the probabilities of all possible scorelines, and then just add together the probabilities that correspond to at least one team scoring no goals. And since this is the exact opposite outcome (the complement) of both teams to score, you take one minus this probability to get the BTTS probability. I assume they have used the predict_goals() function to get the matrix in question from a fitted goal model. In theory, the Poisson model allows for an infinite number of goals, but in practice it is sufficient to just compute the matrix up to 10 or 15 goals.
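To get a feel for why 10 or 15 goals is enough, here is a rough sketch that computes how much probability falls outside the matrix for a few upper limits. It assumes the independent Poisson model, and the expected goals (1.1 and 1.9) and the helper name `captured_mass` are just illustrative choices.

```r
# Illustrative expected goals for the two teams (assumed values).
expg1 <- 1.1
expg2 <- 1.9

# Probability mass captured by a (maxgoal + 1) x (maxgoal + 1) score matrix
# under the independent Poisson model. captured_mass is a hypothetical helper.
captured_mass <- function(maxgoal) {
  ppois(maxgoal, expg1) * ppois(maxgoal, expg2)
}

# Probability mass left outside the matrix for a few choices of maxgoal.
limits <- c(5, 10, 15)
missing_mass <- 1 - sapply(limits, captured_mass)
names(missing_mass) <- limits
print(missing_mass)
```

With expected goals in this range, the probability left out shrinks very quickly as the upper limit grows, so the truncation has practically no effect on the BTTS probability.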

Here’s a small self-contained example of how the matrix with the score line probabilities is computed by the predict_goals() function, and how you can compute the BTTS probability from that.

# Expected goals by the two opposing teams.
expg1 <- 1.1
expg2 <- 1.9

# The upper limit of how many goals to compute probabilities for.
maxgoal <- 15

# The "S" matrix, which can also be computed by predict_goals().
# Assuming the independent Poisson model.
probmat <- dpois(0:maxgoal, expg1) %*% t(dpois(0:maxgoal, expg2))

# Compute the BTTS probability using the formula from the paper.
prob_btts <- 1 - (sum(probmat[2:nrow(probmat),1]) + sum(probmat[1,2:ncol(probmat)]) + probmat[1,1])

In this example the probability of both teams to score is 56.7%.
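The same kind of matrix also gives the other outcomes mentioned earlier, such as win/draw/lose and over/under. The following is a minimal sketch under the independent Poisson assumption. The helper matrices `goals1` and `goals2` and the variable names for the outcomes are my own illustrative choices, not something from the package.

```r
# Expected goals and upper limit, same values as in the example above.
expg1 <- 1.1
expg2 <- 1.9
maxgoal <- 15

# Score line matrix: rows are goals by team 1, columns are goals by team 2.
probmat <- dpois(0:maxgoal, expg1) %*% t(dpois(0:maxgoal, expg2))

# Matrices holding the number of goals that each cell corresponds to.
goals1 <- row(probmat) - 1
goals2 <- col(probmat) - 1

# Several outcomes computed from the same matrix.
p_win1   <- sum(probmat[goals1 > goals2])            # team 1 wins
p_draw   <- sum(probmat[goals1 == goals2])           # draw
p_win2   <- sum(probmat[goals1 < goals2])            # team 2 wins
p_over25 <- sum(probmat[goals1 + goals2 > 2.5])      # over 2.5 goals
p_btts   <- sum(probmat[goals1 > 0 & goals2 > 0])    # both teams to score

print(round(c(win1 = p_win1, draw = p_draw, win2 = p_win2,
              over25 = p_over25, btts = p_btts), 3))
```

Summing the cells where both teams have at least one goal is just the direct version of the complement formula, so `p_btts` should match the BTTS probability computed above.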

In many models, like the one used above, it is assumed that the goal scoring probabilities for the two opposing teams are *statistically independent* (given the expected goals). For this type of model there is a much simpler formula for computing the BTTS probability. Again, you compute the probability of each team not scoring, and then take 1 minus each of these probabilities to get the probability of each team scoring at least one goal. The probability of both teams scoring is then just the product of these two probabilities. In mathematical notation this is

$$P(\text{BTTS}) = P(X > 0) \, P(Y > 0) = (1 - P(X = 0)) \, (1 - P(Y = 0))$$

where X is the random variable for the number of goals scored by the first team, and Y is the corresponding random variable for the second team. The R code for this formula for the independent Poisson model is

prob_btts_2 <- (1 - dpois(0, lambda = expg1)) * (1 - dpois(0, lambda = expg2))

which you can verify gives the same result as the matrix approach. This formula only works under the statistical independence assumption. The Dixon-Coles and bivariate Poisson (not in the *goalmodel* package) models are notable models that do not make this assumption, but instead have some dependence (or correlation) between the goal scoring probabilities for the two sides.
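Here is a small sketch of that verification, assuming the independent Poisson model and the same expected goals as in the example. The variable names `btts_matrix`, `btts_product` and `agree` are just for illustration.

```r
# Expected goals and upper limit, same values as in the example above.
expg1 <- 1.1
expg2 <- 1.9
maxgoal <- 15

# Matrix approach: one minus the probability that at least one team
# scores no goals (first column, first row, and the 0-0 cell counted once).
probmat <- dpois(0:maxgoal, expg1) %*% t(dpois(0:maxgoal, expg2))
btts_matrix <- 1 - (sum(probmat[-1, 1]) + sum(probmat[1, -1]) + probmat[1, 1])

# Product formula, valid under the independence assumption.
btts_product <- (1 - dpois(0, lambda = expg1)) * (1 - dpois(0, lambda = expg2))

# Compare the two results up to numerical tolerance.
agree <- isTRUE(all.equal(btts_matrix, btts_product))
print(c(matrix = btts_matrix, product = btts_product))
print(agree)
```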

I have also found a relatively simple formula for the BTTS probability for the Dixon-Coles model. Recall that the Dixon-Coles model applies an adjustment to the low scoring outcomes (where neither team scores more than one goal), shifting probability between the 0-0 and 1-1 outcomes on one side and the 1-0 and 0-1 outcomes on the other. The amount of probability that is shifted depends on the parameter called rho. For the BTTS probability, it is only the adjustment to the probability of the 1-1 outcome that is of interest, since this is the only adjusted outcome where both teams score. The trick is basically to subtract the probability of the 1-1 outcome for the underlying independent model without the Dixon-Coles adjustment, and then add back the Dixon-Coles adjusted 1-1 probability. The Dixon-Coles adjustment factor for the 1-1 outcome is simply 1 - rho, which does not depend on the expected goals of the two sides.

Here is some R code that shows how to apply the adjustment:

# Dixon-Coles adjustment parameter.
rho <- 0.13

# 1-1 probability for the independent Poisson model.
p11 <- dpois(1, lambda = expg1) * dpois(1, lambda = expg2)

# Add DC adjusted 1-1 probability, subtract unadjusted 1-1 probability.
dc_correction <- (p11 * (1-rho)) - p11

# Apply the corrections
prob_btts_dc <- prob_btts_2 + dc_correction

If you run this, you will see that the BTTS probability decreases to 55.4% when rho = 0.13.
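As a sanity check, the shortcut can be compared with the full matrix approach. The sketch below builds the Dixon-Coles adjusted score matrix by hand, assuming the adjustment factors as they are defined in the original Dixon-Coles paper (with the 1-1 factor equal to 1 - rho, as described above). The variable names are illustrative.

```r
# Expected goals, rho and upper limit, same values as in the examples above.
expg1 <- 1.1
expg2 <- 1.9
rho <- 0.13
maxgoal <- 15

# Independent Poisson score matrix (rows: team 1 goals, columns: team 2 goals).
probmat_dc <- dpois(0:maxgoal, expg1) %*% t(dpois(0:maxgoal, expg2))

# Dixon-Coles adjustment of the four low scoring cells.
# Assumption: factors follow the parameterization in the original paper.
probmat_dc[1, 1] <- probmat_dc[1, 1] * (1 - expg1 * expg2 * rho)  # 0-0
probmat_dc[1, 2] <- probmat_dc[1, 2] * (1 + expg1 * rho)          # 0-1
probmat_dc[2, 1] <- probmat_dc[2, 1] * (1 + expg2 * rho)          # 1-0
probmat_dc[2, 2] <- probmat_dc[2, 2] * (1 - rho)                  # 1-1

# BTTS from the adjusted matrix: all cells where both teams score.
btts_dc_matrix <- sum(probmat_dc[-1, -1])

# Shortcut: independent BTTS probability minus rho times the unadjusted 1-1 probability.
p11 <- dpois(1, lambda = expg1) * dpois(1, lambda = expg2)
btts_dc_shortcut <- (1 - dpois(0, expg1)) * (1 - dpois(0, expg2)) - rho * p11

print(c(matrix = btts_dc_matrix, shortcut = btts_dc_shortcut))
```

The adjusted matrix still sums to one, since the probability removed from 0-0 and 1-1 is exactly what is added to 1-0 and 0-1, and the two BTTS values should agree.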

I have added two functions for computing BTTS probabilities in the new version 0.6 of the goalmodel package, so be sure to check that out. The predict_btts() function works just like the other predict_* functions in the package, where you give the function a fitted goalmodel, together with the fixtures you want to predict, and it gives you the BTTS probability. The other function is pbtts(), which works independently of a fitted goalmodel. Instead you just give it the expected goals, and other parameters like the Dixon-Coles rho parameter, directly.

Hi,

I like this article but I think there is a typo on line 10 of the R code.

Should you not be using expg2 rather than using expg1 twice? i.e.

probmat <- dpois(0:maxgoal, expg1) %*% t(dpois(0:maxgoal, expg2))

Keep up the good work.

Regards,

John

You are absolutely right! Thanks for pointing it out, I have fixed it now.

Hello!

Is it possible to predict the position of the team in the standings?

The idea is very simple. Let’s say there are 38 games in a soccer tournament. We already know how many wins / losses / draws there were for each specific team (goal difference and so on). How to use it to predict the position of the team in the standings?

I am not sure I understand your question. Could you elaborate?

We know that there are 38 rounds in the tournament. Imagine that we are at the 5th round. We know the number of points, goals, etc. for each specific team. We also have historical data, and by comparing with it we can estimate the real strength of the team this season.

In other words, the number of rounds is limited (38). This means that with each round the strength of the team becomes clearer and clearer to us. The question is how do we predict the final position of the team in the table. This would solve all problems.

If you have a model fitted after the 5th round, you can use that to make predictions for the rest of the games of the tournament. If you compute the probabilities for 1x2, you can use these to calculate the expected points for each game and sum them together for each team, and then add that to the points they already have gained, to get the predicted final number of points.
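A minimal sketch of that calculation for a single team could look like this. The fixtures, the 1x2 probabilities and the current points are all made-up numbers for illustration. In practice the probabilities would come from a fitted model.

```r
# Hypothetical remaining fixtures for one team, with made-up 1x2 probabilities.
remaining <- data.frame(
  opponent = c("Team B", "Team C", "Team D"),
  p_win  = c(0.50, 0.35, 0.60),
  p_draw = c(0.25, 0.30, 0.22),
  p_loss = c(0.25, 0.35, 0.18)
)

# Points already gained in the rounds played so far (made-up number).
current_points <- 8

# Expected points per game: 3 for a win, 1 for a draw, 0 for a loss.
remaining$exp_points <- 3 * remaining$p_win + 1 * remaining$p_draw

# Predicted final points: points so far plus the sum of expected points.
predicted_points <- current_points + sum(remaining$exp_points)
print(predicted_points)
```

Doing this for every team and sorting by the predicted points gives a rough predicted final table.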

What model would you recommend fitting?

A Poisson model would be a good start I think.

Not quite relevant, but do you plan on exploring the following models?

1. Bivariate Poisson with diagonal inflation (Karlis, Ntzoufras)

2. Birth process (Dixon, Robinson)

Maybe!

2. Anybody has data about score line in games? It should be easy to set up using a cox poisson model, standard estimation in glmnet in R.

Do you mean the timing of the goals? I don’t know of any easy-to-use datasets, but the StatsBomb open data repository should have the timing of goals. https://github.com/statsbomb

Hi!

Are you planning to work more on this package? Your package has been very handy to build a simple model very quickly. I was wondering whether there are going to be possibilities to measure probabilities for handicaps? This would be helpful e.g. when big teams are playing rather small teams. 🙂

I do occasionally work a bit on the package, but I haven’t looked into handicaps that much, so I don’t think anything will come of it in the near future.

Hey man! appreciate this package! Great work, although it seems like the package “engsoccerdata” is not available anymore, which makes me unable to install the “goalmodel” package, do you have any solution for this?

Thanks

David

It seems it is not available from CRAN anymore, but you can get it from here:

https://github.com/jalapic/engsoccerdata#installation

I will see if I can remove the dependency soon, but for now, you need to install it from GitHub.