The probabilities implied by bookmaker odds: Introducing the ‘implied’ package

My package for converting bookmaker odds into probabilities is now available from CRAN. The package contains several different conversion algorithms, all accessible via the implied_probabilities() function. I have written an introduction on how you can use the package here, together with a description of all the methods and references to the relevant papers. But I also want to give some background on some of the methods here on the blog.

Here I take odds to mean the inverse of a probability, that is 1/p (in statistics the term more often refers to the ratio p/(1-p)), but in the betting world several different odds formats exist. As usual, Wikipedia has a nice overview of the different formats. In the implied package, only inverse probability odds are allowed as inputs, which in betting are called decimal odds.
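If your odds come in another format, they have to be converted to decimal odds first. The sketch below shows the standard conversions from fractional and American (moneyline) odds. It is written in Python purely to show the arithmetic; the function names are my own and are not part of the implied package.

```python
def fractional_to_decimal(numerator, denominator):
    """Fractional odds such as 5/2 pay 5 units of profit per 2 units staked."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("fractional odds must be positive")
    # Decimal odds include the returned stake, hence the +1.
    return 1 + numerator / denominator


def american_to_decimal(moneyline):
    """Moneyline odds: +150 means 150 profit per 100 staked,
    -200 means 200 must be staked to win 100."""
    if moneyline == 0:
        raise ValueError("moneyline odds cannot be zero")
    if moneyline > 0:
        return 1 + moneyline / 100
    return 1 + 100 / abs(moneyline)


print(fractional_to_decimal(5, 2))   # 3.5
print(american_to_decimal(150))      # 2.5
print(american_to_decimal(-200))     # 1.5
```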

Now you might think that converting decimal odds to probabilities should be easy: you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice this simple formula gives you improper probabilities. They will not sum to 1, as they should, but to something slightly larger. This excess gives the bookmakers an edge, so the probabilities (which aren’t real probabilities) cannot be considered fair, and different methods for correcting this exist.
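A small numerical illustration of the problem, sketched in Python. The odds are made-up numbers for a hypothetical home/draw/away market, not taken from any real bookmaker.

```python
# Hypothetical decimal odds for home win, draw and away win.
odds = [2.10, 3.40, 3.60]

# Naive conversion: just take the inverse of each odds value.
raw = [1 / o for o in odds]

# The inverse odds sum to more than 1; the excess is the bookmaker's margin.
total = sum(raw)
overround = total - 1

print([round(p, 4) for p in raw])  # [0.4762, 0.2941, 0.2778]
print(round(total, 4))             # 1.0481
```

Here the improper probabilities sum to about 1.048, so the bookmaker has built in a margin of roughly 4.8 percent.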

Some methods use different types of regression modelling combined with historical data to estimate the biases in the different outcomes. This is for example the case in the paper On determining probability forecasts from betting odds by Erik Štrumbelj. The implied package does not include these kinds of methods, however. The reason I wanted to mention this paper is that it is where I first read about Shin’s method.

All the methods in the package are what I call one-shot methods. The conversion of a set of odds for a game relies only on the odds themselves, and not on any other data. This is a deliberate choice: I didn’t want to make a modelling package, since that would be much more complicated.

Many of the methods in the package are described in the Wisdom of the Crowd document by Joseph Buchdahl, and in a review paper by Clarke et al. (Adjusting Bookmaker’s Odds to Allow for Overround).

Many of the methods in the package can be described as ad hoc methods. They basically use a simple mathematical formula that relates the true underlying probabilities to the improper probabilities given by the bookmaker’s odds. This formula is then used to find true probabilities that are proper (sum to 1) while also reproducing the improper bookmaker probabilities.
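To make the idea concrete, here is a minimal Python sketch of two formulas of this kind: plain normalisation, where the improper probabilities are assumed to be the true ones multiplied by a constant, and a power relationship, where the true probabilities are assumed to be the improper ones raised to a power k. The function names, the example odds and the bisection search are my own choices for illustration, and this is not the package’s implementation.

```python
def basic_normalisation(odds):
    """Assume raw_i = p_i * total, so p_i = raw_i / total."""
    raw = [1 / o for o in odds]
    total = sum(raw)
    return [r / total for r in raw]


def power_method(odds, iterations=100):
    """Assume p_i = raw_i ** k and search for the k that makes the p_i sum to 1."""
    if any(o <= 1 for o in odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1 / o for o in odds]
    if sum(raw) <= 1:
        raise ValueError("inverse odds must sum to more than 1")
    # sum(raw_i ** k) decreases as k grows, so bracket the root and bisect.
    lo, hi = 1.0, 2.0
    while sum(r ** hi for r in raw) > 1:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(r ** mid for r in raw) > 1:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    return [r ** k for r in raw], k


example_odds = [2.10, 3.40, 3.60]  # hypothetical odds
basic_probs = basic_normalisation(example_odds)
power_probs, k = power_method(example_odds)
print([round(p, 4) for p in basic_probs])
print([round(p, 4) for p in power_probs], round(k, 4))
```

Both sets of probabilities sum to 1, but they differ in how the margin is removed: normalisation shrinks every outcome by the same factor, while the power relationship shrinks the longshots proportionally more than the favourite.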

A few other methods in the package are more theory based, like Shin’s method, and I find these methods really interesting. Shin’s method imagines that there are two types of bettors. The first type is the typical bettor, and the sum of bets by this type follows the “wisdom of the crowd” pattern, which should reflect the true uncertainty of the outcome given the publicly available information. Then there is a second type of bettor, who has inside information and always bets on the winning outcome. However, the bookmaker doesn’t know which type an individual bettor is, and only observes the mixture of the two types. Here is the interesting part: By assuming the bookmakers know that there are two types of bettors, and that the bookmakers seek to maximize their profits, Shin was able to derive some complicated formulas that relate the true underlying “wisdom of the crowd” probabilities to the bookmaker’s odds. These formulas can be used in the same way as the ad hoc methods to find the underlying probabilities.
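As a rough illustration, the Python sketch below uses a closed form commonly cited in the literature on Shin’s model, where z is interpreted as the proportion of bets placed by insiders and is chosen so that the resulting probabilities sum to 1. The function name, the example odds and the bisection search are my own assumptions for the sake of the example; this is not the code used in the implied package.

```python
import math


def shin_probabilities(odds, iterations=200):
    """Sketch of Shin's conversion: find z so that the probabilities sum to 1."""
    raw = [1 / o for o in odds]
    booksum = sum(raw)
    if booksum <= 1:
        raise ValueError("inverse odds must sum to more than 1")

    def probabilities(z):
        # p_i = (sqrt(z^2 + 4 (1 - z) raw_i^2 / booksum) - z) / (2 (1 - z))
        return [
            (math.sqrt(z * z + 4 * (1 - z) * r * r / booksum) - z) / (2 * (1 - z))
            for r in raw
        ]

    # At z = 0 the probabilities sum to sqrt(booksum) > 1; the sum falls as z grows.
    lo, hi = 0.0, 0.999
    if sum(probabilities(hi)) > 1:
        raise ValueError("could not bracket z")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(probabilities(mid)) > 1:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2
    return probabilities(z), z


shin_odds = [2.10, 3.40, 3.60]  # hypothetical odds
shin_probs, z = shin_probabilities(shin_odds)
print([round(p, 4) for p in shin_probs], round(z, 4))
```

Compared with simply normalising the inverse odds, this conversion gives the favourite a somewhat higher probability and the longshots somewhat lower ones.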

A natural question is which method gives the most realistic probabilities. There is no definite answer to this, and different methods will be best in different markets and settings. You need to figure this out for yourself.

I am currently working on some new methods inspired by Shin’s framework which I hope to write about later. Shin’s work was mostly done in the context of horse racing, where it is realistic that some bettors have inside information. I hope to develop a method that is more relevant for football.

16 thoughts on “The probabilities implied by bookmaker odds: Introducing the ‘implied’ package”

  1. Really nice, it’s the one I have been looking for lately. I am building a logistic regression model in R, and was wondering why I sometimes get a value >1 after adding up the probabilities. So I will use it for sure.

  2. Thank you ! Really good stuff
    Let me ask you a question please. Do you have an idea (or maybe a URL source) of how we can predict next match goals in football using decision trees or random forests?

    • The point of these methods is to NOT use regression modelling, but to only use the information inherent in the odds for a single match. The paper I linked to also seems to support the idea that these methods might be better than regression models.

  3. Hi, really nice job! I’m trying to perform some analysis in R using the implied package. It seems it returns an error for some combinations of odds (maybe it does not converge to the solution!).

    Particularly, it returns:

    Error in if (any(problematic)) { :
    missing value where TRUE/FALSE needed

    It happens probably because some model condition about the odds is not fulfilled and the code returns missing data.

    Do you know which conditions about the odds have to be satisfied to run the implied_probabilities function?

    • Sorry, I’ll try to be more precise. The method used is “shin”, and besides the error message it returns a warning message too.

      The message is:
      In implied_probabilities(odds, method=”shin”):
      Could not find z: Did not converge in 12 instances. Some results may be unreliable. See the “problematic” vector in the output.

      The problem is that it does not return the output data frame, so I cannot check anything!

      Anyway, thanks for the package! It is really really useful!

      • A possible workaround to get the non-problematic probabilities and identify the problematic ones is to write a loop over all sets of odds, and feed them one by one to the implied_probabilities() function. I am also working on fixing the bug so that it at least raises a warning and returns the OK results.

    • If you have never used R before it might be a bit tricky to access the functionality of this package. I suggest you find a tutorial on R, and learn to read in and manipulate data, install packages and so on. There are countless tutorials out there, just google it.

      • Ok. Thanks. Do I need to type in all teams and odds manually? What I want to say is… so much work. How are the results? Some profit?

        • You don’t have to provide the team names, only the odds. But check out the link to the Wisdom of the Crowd webpage. Odds from numerous providers are posted regularly (except for now, when sports are cancelled) as easy-to-use CSV and Excel files, and I think there’s even an Excel sheet with some similar calculators.

  4. Thank you for the package.

    “Shin’s method imagine that there are two types of bettors”.

    Perhaps this was true several years ago, when there was extremely little information… Today, I think, this is the wrong direction. Now bookmakers already know more than individual players. Or at least, bookmakers have a lot of statistical and other information that allows them to confuse us by setting the wrong odds.

    Even logically: if bookmakers ALWAYS set fair odds, they would quickly go bankrupt. For me, odds are misinformation.

    Why do you focus so much on odds?

    • Shin’s model was developed in the early 90s for horse race betting, so yes, it might not be the best method for football odds.
