My package for converting bookmaker odds into probabilities is now available on CRAN. The package contains several different conversion algorithms, which are all accessible via the implied_probabilities() function. I have written an introduction on how you can use the package here, together with a description of all the methods and references to the relevant papers. But I also want to give some background on some of the methods here on the blog.

In statistics, odds usually refer to the ratio p/(1-p), but in the betting world several different odds formats exist. As usual, Wikipedia has a nice overview of the different formats. In the *implied* package, only odds defined as the inverse of a probability, that is 1/p, are allowed as inputs. In betting these are called decimal odds.

Now you might think that converting decimal odds to probabilities should be easy: you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice this simple formula will give you improper probabilities. They will not sum to 1, as they should, but to something slightly larger. This excess is what gives the bookmakers their edge. The resulting probabilities (which aren’t real probabilities) cannot be considered fair, and so different methods for correcting this exist.
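To make this concrete, here is a minimal Python sketch (not code from the implied package, which is written in R). The odds are made up for illustration, and the function name `raw_inverse_probabilities` is my own.

```python
def raw_inverse_probabilities(decimal_odds):
    """Naive conversion: take 1/odds for each outcome."""
    return [1.0 / o for o in decimal_odds]


# Made-up decimal odds for a home win, draw and away win.
example_odds = [2.0, 3.5, 3.8]

raw = raw_inverse_probabilities(example_odds)
total = sum(raw)

# The amount by which the sum exceeds 1 is the bookmaker's margin (overround).
margin = total - 1.0

print("raw inverse odds:", [round(r, 4) for r in raw])
print("sum:", round(total, 4))
print("margin:", round(margin, 4))
```

The sum is larger than 1, so the raw inverse odds cannot be used directly as probabilities.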

Some methods use different types of regression modelling combined with historical data to estimate the biases in the different outcomes. This is for example the case in the paper *On determining probability forecasts from betting odds* by Erik Štrumbelj. The implied package does not include these kinds of methods. The reason I wanted to mention this paper is that it was where I first read about Shin’s method.

All the methods in the package are what I call one-shot methods. The conversion of a set of odds for a game relies only on the odds themselves, and not on any other data. This is a deliberate choice: I didn’t want to make a modelling package, since that would be much more complicated.

Many of the methods in the package are described in the Wisdom of the Crowd document by Joseph Buchdahl, and in a review paper by Clarke et al. (*Adjusting Bookmaker’s Odds to Allow for Overround*).

Many of the methods in the package can be described as *ad hoc* methods. They basically use a simple mathematical formula that relates the true underlying probabilities to the improper probabilities given by the bookmaker’s odds. This formula is then used to find the true probabilities, such that they are proper (sum to 1) and, when plugged back into the formula, reproduce the improper bookmaker probabilities.
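As an illustration of the ad hoc idea, here is a small Python sketch of two such formulas: basic proportional normalisation, and a power method where the improper probabilities are raised to a common exponent k chosen so that the result sums to 1. This is my own simplified sketch, not the code used in the implied package. The function names, the bisection search and the example odds are assumptions made for the example, and it assumes all odds are larger than 1 and that the inverse odds sum to more than 1.

```python
def basic_normalisation(decimal_odds):
    """Divide each inverse odds by their sum so the result sums to 1."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


def power_method(decimal_odds, tol=1e-12):
    """Find k such that sum((1/odds)^k) == 1, and return the probabilities and k.

    Assumes every odds value is > 1 and that the inverse odds sum to more than 1.
    """
    raw = [1.0 / o for o in decimal_odds]

    # Bracket the solution: at k = 1 the sum is above 1, and it shrinks as k grows.
    lo, hi = 1.0, 2.0
    while sum(r ** hi for r in raw) > 1.0:
        hi *= 2.0

    # Plain bisection on k.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(r ** mid for r in raw) > 1.0:
            lo = mid
        else:
            hi = mid

    k = (lo + hi) / 2.0
    return [r ** k for r in raw], k


# Made-up decimal odds for a home win, draw and away win.
example_odds = [2.0, 3.5, 3.8]

basic_probs = basic_normalisation(example_odds)
power_probs, k = power_method(example_odds)

print("basic:", [round(p, 4) for p in basic_probs])
print("power:", [round(p, 4) for p in power_probs], "k =", round(k, 4))
```

Both sets of probabilities sum to 1, but they distribute the bookmaker margin differently across the outcomes.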

A few other methods in the package are more theory based, like Shin’s method, and I find these methods really interesting. Shin’s method imagines that there are two types of bettors. The first type is the typical bettor, and the sum of bets by this type follows the “wisdom of the crowd” pattern, which should reflect the true uncertainty of the outcome given the publicly available information. Then there is a second type of bettor, who has inside information and always bets on the winning outcome. However, the bookmaker doesn’t know which type an individual bettor is, and only observes the mixture of the two types. Here is the interesting part: by assuming that the bookmakers know that there are two types of bettors, and that the bookmakers seek to maximize their profits, Shin was able to derive some complicated formulas that relate the true underlying “wisdom of the crowd” probabilities to the bookmaker’s odds. These formulas can be used in the same way as the ad hoc methods to find the underlying probabilities.
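To give a feel for what this looks like in practice, here is a Python sketch based on the closed-form expression that is usually quoted for Shin’s method in the literature, where z is the proportion of insider bettors and is chosen so that the probabilities sum to 1. This is an illustration under my own assumptions, not the implementation in the implied package: the function name, the bisection search, the search interval for z and the example odds are all made up for the example.

```python
import math


def shin_probabilities(decimal_odds, tol=1e-12):
    """Sketch of Shin's method: returns (probabilities, z).

    Uses p_i = (sqrt(z^2 + 4 (1 - z) q_i^2 / B) - z) / (2 (1 - z)),
    where q_i = 1/odds_i and B = sum(q_i), and finds z by bisection.
    """
    raw = [1.0 / o for o in decimal_odds]
    booksum = sum(raw)

    def probs(z):
        return [
            (math.sqrt(z * z + 4.0 * (1.0 - z) * r * r / booksum) - z)
            / (2.0 * (1.0 - z))
            for r in raw
        ]

    def excess(z):
        return sum(probs(z)) - 1.0

    # Assumed search interval for z; bisection needs a sign change over it.
    lo, hi = 0.0, 0.999
    if not (excess(lo) > 0.0 > excess(hi)):
        raise ValueError("could not bracket z for these odds")

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid

    z = (lo + hi) / 2.0
    return probs(z), z


# Made-up decimal odds for a home win, draw and away win.
example_odds = [2.0, 3.5, 3.8]

shin_probs, z = shin_probabilities(example_odds)
print("Shin:", [round(p, 4) for p in shin_probs], "z =", round(z, 4))
```

Compared with simple proportional normalisation, this formula removes relatively more of the margin from the longshots than from the favourite.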

A natural question is which method gives the most realistic probabilities. There is no definite answer to this, and different methods will be best in different markets and settings. You need to figure this out for yourself.

I am currently working on some new methods inspired by Shin’s framework, which I hope to write about later. Shin’s work was mostly done in the context of horse racing, where it is realistic that some bettors have inside information. I hope to develop a method that is more relevant for football.

Thank you very much for this.

Really nice, this is what I have been looking for lately. I am building a logistic regression model in R, and was wondering why I sometimes get a value >1 after adding up the probabilities. So I will use it for sure.

Thank you ! Really good stuff

Let me ask you a question, please. Do you have an idea (or maybe a URL source) of how we can predict the goals in the next match in football using decision trees or random forests?

I have seen a few cases of it, but the only one I could find right now was this https://arxiv.org/abs/1806.03208

Oh, and I almost forgot, I wrote a post about it some time ago https://opisthokonta.net/?p=809

Why not do a regression and fit with the reality?

Hi, really nice job! I’m trying to perform some analysis in R using the implied package. It seems it returns an error for some combinations of odds (maybe it does not converge to the solution!).

Particularly, it returns:

Error in if (any(problematic)) { :

missing value where TRUE/FALSE needed

It happens probably because some model condition about the odds is not fulfilled and the code returns missing data.

Do you know which conditions about the odds have to be satisfied to run the implied_probabilities function?

Sorry, I’ll try to be more precise. The method used is “shin”, and besides the error message it returns a warning message too.

The message is:

In implied_probabilities(odds, method="shin"):

Could not find z: Did not converge in 12 instances. Some results may be unreliable. See the “problematic” vector in the output.

The problem is that it does not return the output data frame, so I cannot check anything!

Anyway, thanks for the package! It is really really useful!