The probabilities implied by bookmaker odds: Introducing the ‘implied’ package

Posted on March 1, 2020 by opisthokonta

My package for converting bookmaker odds into probabilities is now available from CRAN. The package contains several different conversion algorithms, all accessible via the implied_probabilities() function. I have written an introduction on how to use the package here, together with a description of all the methods and references to papers. But I also want to give some background on a few of the methods here on the blog.

In statistics, the odds of an event are usually defined as p/(1-p), but in the betting world different odds formats exist. As usual, Wikipedia has a nice overview of the different formats. In the implied package, only odds equal to the inverse of a probability, that is 1/p, are allowed as inputs. In betting these are called decimal odds.
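Since only decimal odds are accepted, odds quoted in another format have to be converted first. Below is a small Python sketch of the standard conversions for fractional (UK) and American (moneyline) odds; the function names are made up for illustration and are not part of the implied package.

```python
def fractional_to_decimal(numerator: int, denominator: int) -> float:
    """Fractional (UK) odds a/b: profit of a per b staked, plus the stake back."""
    return 1 + numerator / denominator


def american_to_decimal(moneyline: float) -> float:
    """American (moneyline) odds: +x wins x per 100 staked, -x stakes x to win 100."""
    if moneyline > 0:
        return 1 + moneyline / 100
    if moneyline < 0:
        return 1 + 100 / -moneyline
    raise ValueError("American odds cannot be 0")


print(fractional_to_decimal(5, 2))  # 3.5
print(american_to_decimal(150))     # 2.5
print(american_to_decimal(-200))    # 1.5
```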

Now you might think that converting decimal odds to probabilities should be easy: you can just use the definition above and take the inverse of the odds to recover the probability. But it is not that simple, since in practice this simple formula gives you improper probabilities. They will not sum to 1, as they should, but to slightly more than 1. The excess is the bookmaker’s margin (the overround), which gives the bookmakers an edge, and the resulting probabilities (which aren’t real probabilities) cannot be considered fair. Different methods for correcting this therefore exist.
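A quick Python illustration of the problem, using hypothetical odds for a home/draw/away market. The helper names are made up for this sketch, and the proportional rescaling at the end is only the simplest possible correction, not necessarily what any particular method in the package does.

```python
def naive_probabilities(decimal_odds):
    """Invert each decimal odd: the improper 'bookmaker probabilities' 1/odds."""
    return [1 / o for o in decimal_odds]


def overround(decimal_odds):
    """How much the inverted odds sum to above 1: the bookmaker's margin."""
    return sum(naive_probabilities(decimal_odds)) - 1


def proportional_rescale(decimal_odds):
    """Simplest correction: divide each inverted odd by their sum."""
    inverted = naive_probabilities(decimal_odds)
    total = sum(inverted)
    return [p / total for p in inverted]


odds = [1.8, 3.6, 4.8]  # hypothetical home/draw/away odds
print(round(sum(naive_probabilities(odds)), 4))           # 1.0417
print([round(p, 4) for p in proportional_rescale(odds)])  # [0.5333, 0.2667, 0.2]
```

Here the inverted odds sum to about 1.042, so the margin is roughly 4%.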

Some methods use different types of regression modelling combined with historical data to estimate the biases in the different outcomes. This is, for example, the case in the paper On determining probability forecasts from betting odds by Erik Štrumbelj. The implied package does not include these kinds of methods, but I wanted to mention this paper because it was where I first read about Shin’s method.

All the methods in the package are what I call one-shot methods: the conversion of a set of odds for a game relies only on the odds themselves, and not on any other data. This is a deliberate choice. I didn’t want to make a modelling package, since that would be much more complicated.

Many of the methods in the package are described in the Wisdom of the Crowd document by Joseph Buchdahl, and in a review paper by Clarke et al. (Adjusting Bookmaker’s Odds to Allow for Overround).

Many of the methods in the package can be described as ad hoc methods. They basically use a simple mathematical formula that relates the true underlying probabilities to the improper probabilities given by the bookmaker’s odds. The formula is then solved for the true probabilities, under the constraint that they are proper (sum to 1) and that plugging them back into the formula recovers the improper bookmaker probabilities.
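As an illustration of the ad hoc approach, here is one such formula in the spirit of the power method discussed by Clarke et al.: assume each improper probability is pi_i = p_i^(1/k), so that p_i = pi_i^k, and choose the k that makes the p_i sum to 1. This is a minimal Python sketch under that assumption; the function name and the bisection solver are my own choices, and the package’s exact parameterization may differ.

```python
def power_method(decimal_odds, tol=1e-12, max_iter=200):
    """Sketch of a power-type adjustment: assume the improper probabilities are
    pi_i = p_i ** (1 / k), i.e. p_i = pi_i ** k, and find k so the p_i sum to 1."""
    if any(o <= 1 for o in decimal_odds):
        raise ValueError("decimal odds must all be greater than 1")
    pis = [1 / o for o in decimal_odds]

    def excess(k):
        # sum(pi_i ** k) - 1 is strictly decreasing in k because 0 < pi_i < 1
        return sum(p ** k for p in pis) - 1

    lo, hi = 1.0, 2.0
    if excess(lo) <= 0:
        raise ValueError("inverted odds must sum to more than 1")
    while excess(hi) > 0:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(max_iter):  # plain bisection on k
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    k = (lo + hi) / 2
    return [p ** k for p in pis], k


probs, k = power_method([1.8, 3.6, 4.8])  # hypothetical odds
print(round(k, 3), [round(p, 4) for p in probs])
```

Because k ends up greater than 1, small probabilities are shrunk proportionally more than large ones, so the longshots absorb more of the margin than the favourite. Plain proportional rescaling treats all outcomes alike.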

A few other methods in the package are more theory-based, like Shin’s method, and I find these methods really interesting. Shin’s method imagines that there are two types of bettors. The first type is the typical bettor, and the sum of bets by this type follows the “wisdom of the crowd” pattern, which should reflect the true uncertainty of the outcome given the publicly available information. The second type of bettor has inside information and always bets on the winning outcome. However, the bookmakers don’t know which type an individual bettor is; they only observe the mixture of the two types. Here is the interesting part: by assuming that the bookmakers know there are two types of bettors, and that they seek to maximize their profits, Shin was able to derive some complicated formulas that relate the true underlying “wisdom of the crowd” probabilities to the bookmakers’ odds. These formulas can be used in the same way as the ad hoc methods to find the underlying probabilities.
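In the form usually quoted in the betting literature, Shin’s model gives each true probability in terms of the improper probabilities pi_i = 1/odds_i, their sum S, and the proportion z of insider trading: p_i = (sqrt(z^2 + 4(1 - z) pi_i^2 / S) - z) / (2(1 - z)). The Python sketch below finds the z that makes the p_i sum to 1 by bisection. It is my own illustration, not the package’s code, and the package may solve for z differently.

```python
import math


def shin_probabilities(decimal_odds, tol=1e-12, max_iter=200):
    """Find the insider proportion z in (0, 1) for which Shin's implied
    probabilities sum to 1; return (probabilities, z)."""
    pis = [1 / o for o in decimal_odds]
    booksum = sum(pis)
    if booksum <= 1:
        raise ValueError("inverted odds must sum to more than 1")

    def probs(z):
        return [
            (math.sqrt(z * z + 4 * (1 - z) * p * p / booksum) - z) / (2 * (1 - z))
            for p in pis
        ]

    # At z = 0 the probabilities sum to sqrt(booksum) > 1, and the sum drops
    # below 1 as z approaches 1, so bisection on (0, 1) brackets the root.
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        z = (lo + hi) / 2  # never evaluates exactly z = 1
        if sum(probs(z)) > 1:
            lo = z
        else:
            hi = z
        if hi - lo < tol:
            break
    z = (lo + hi) / 2
    return probs(z), z


probs, z = shin_probabilities([1.8, 3.6, 4.8])  # hypothetical odds
print(round(z, 4), [round(p, 4) for p in probs])
```

The returned z is the estimated share of insider money in Shin’s model. Like the power method, the correction takes proportionally more probability away from the longshots than from the favourite.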

A natural question is which method gives the most realistic probabilities. There is no definitive answer to this; different methods will be best in different markets and settings, and you need to figure this out for yourself.

I am currently working on some new methods inspired by Shin’s framework, which I hope to write about later. Shin’s work was mostly done in the context of horse racing, where it is realistic that some bettors have inside information. I hope to develop a method that is more relevant for football.

23 thoughts on “The probabilities implied by bookmaker odds: Introducing the ‘implied’ package”

TS on March 2, 2020 at 8:57 pm said:

Thank you very much for this.

Karol on March 7, 2020 at 12:40 am said:

Really nice, it’s the one I have been looking for lately. I’m building a logistic regression model in R, and was wondering why I sometimes get a value >1 after adding up the probabilities. So I will use it for sure.

amin89 on March 20, 2020 at 9:34 pm said:

Thank you ! Really good stuff
Let me ask you a question, please. Do you have an idea (or maybe a URL source) of how we can predict the number of goals in the next football match using decision trees or random forests?

- opisthokonta on March 22, 2020 at 9:23 am said:
  
  I have seen a few cases of it, but the only one I could find right now was this https://arxiv.org/abs/1806.03208
  
  Oh, and I almost forgot, I wrote a post about it some time ago https://opisthokonta.net/?p=809
  
Niels on April 4, 2020 at 8:17 pm said:

Why not do a regression and fit with the reality?

- opisthokonta on April 9, 2020 at 12:57 pm said:
  
  The point of these methods is to NOT use regression modelling, but to only use the information inherent in the odds for a single match. The paper I linked to also seems to support that these methods might be better than regression models.
  
Quantopic on April 5, 2020 at 11:59 am said:

Hi, really nice job! I’m trying to perform some analysis in R using the implied package... it seems it returns an error for some combinations of odds (maybe it does not converge to the solution!).

Particularly, it returns:

Error in if (any(problematic)) { :
missing value where TRUE/FALSE needed

It probably happens because some model condition on the odds is not fulfilled and the code returns missing data.

Do you know which conditions the odds have to satisfy to run the implied_probabilities function?

- Quantopic on April 5, 2020 at 12:40 pm said:
  
  Sorry, I’ll try to be more precise... the method used is “shin”, and besides the error message it returns a warning message too.
  
  The message is:
  In implied_probabilities(odds, method="shin"):
  Could not find z: Did not converge in 12 instances. Some results may be unreliable. See the “problematic” vector in the output.
  
  The problem is that it does not return the output data frame, so I cannot check anything!
  
  Anyway, thanks for the package! It is really really useful!
  
  - opisthokonta on April 9, 2020 at 12:53 pm said:
    
    Thank you for reporting this. I sent you an email asking for some more details. Thank you.
    
  - opisthokonta on April 17, 2020 at 11:59 am said:
    
    A possible workaround to get the non-problematic probabilities and identify the problematic ones is to write a loop over all sets of odds, and then feed them one by one to the implied_probabilities() function. I am also working on fixing the bug so that it at least raises a warning and returns the OK results.
    
    - Alessandro Betori on June 3, 2022 at 3:46 pm said:
      
      Hello, I get the error “f() values at end points not of opposite sign” with odds 1.19, 7.0, 14.0 and method='jsd'.
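The per-row workaround suggested earlier in this thread (loop over the sets of odds and convert them one at a time) would also isolate a failing case like the one above, so the other rows are not lost. In R this would typically be a loop with tryCatch(). Below is a language-neutral sketch of the same idea in Python; convert_each and toy_basic are hypothetical stand-ins, not part of the implied package.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def convert_each(
    odds_rows: Sequence[Sequence[float]],
    convert: Callable[[Sequence[float]], List[float]],
) -> Tuple[List[Optional[List[float]]], List[int]]:
    """Convert each set of odds separately; return the results (None where the
    conversion failed) and the indices of the problematic rows."""
    results: List[Optional[List[float]]] = []
    problematic: List[int] = []
    for i, row in enumerate(odds_rows):
        try:
            results.append(convert(row))
        except (ValueError, ArithmeticError):
            # keep going: mark this row instead of losing the whole batch
            results.append(None)
            problematic.append(i)
    return results, problematic


def toy_basic(row: Sequence[float]) -> List[float]:
    """Hypothetical converter: proportional rescaling, refusing odds <= 1."""
    if any(o <= 1 for o in row):
        raise ValueError(f"invalid decimal odds: {row}")
    inverted = [1 / o for o in row]
    total = sum(inverted)
    return [p / total for p in inverted]


rows = [[1.8, 3.6, 4.8], [1.0, 7.0, 14.0], [2.1, 3.3, 3.6]]  # hypothetical odds
results, problematic = convert_each(rows, toy_basic)
print(problematic)  # [1]
```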
      
Adryan87 on April 13, 2020 at 4:55 pm said:

OK, I installed that program R… Then what? I thought this would run automatically.

- opisthokonta on April 14, 2020 at 9:35 am said:
  
  If you have never used R before it might be a bit tricky to access the functionality of this package. I suggest you find a tutorial on R, and learn to read in and manipulate data, install packages and so on. There are countless tutorials out there, just google it.
  
  - Adryan87 on April 15, 2020 at 4:46 pm said:
    
    OK, thanks. Do I need to type in all the teams and odds manually? What I want to say is… that’s so much work. How are the results? Any profit?
    
    - opisthokonta on April 16, 2020 at 8:39 am said:
      
      You don’t have to provide the team names, only the odds. But check out the link to the Wisdom of the Crowd webpage. Odds from numerous providers are posted there regularly (except for now, when sports are cancelled) as easy-to-use CSV and Excel files, and I think there’s even an Excel sheet with some similar calculators.
      
Kamancho on May 1, 2020 at 2:51 pm said:

Thank you for the package.

“Shin’s method imagine that there are two types of bettors”.

Perhaps this was true several years ago, when there was extremely little information… Today, I think, this is the wrong direction. Now bookmakers already know more than individual players. Or at least, bookmakers have a lot of statistical and other information that allows them to confuse us by setting the wrong odds.

Even logically: if bookmakers ALWAYS set fair odds, they would quickly go bankrupt. For me, odds are misinformation.

Why do you focus so much on odds?

- opisthokonta on May 4, 2020 at 5:01 pm said:
  
  Shin’s model was developed in the early 90s for horse race betting, so yes, it might not be the best method for football odds.
  
Evan Dens on June 7, 2020 at 10:01 pm said:

Thank you for your time and effort, an addition to my quantitative betting toolset.

Henrique on June 30, 2020 at 8:16 am said:

You probably won’t see this since it’s an old post, but I have a question. I’m still a newbie in R, but I’m trying to apply the things I learn to a hobby project of mine: resimulating the history of a particular league, season by season. Using the Premier League as an example, I would go back to 1889 and use the real data of matches and goals scored to calculate expected goals, then randomly generate a Poisson-based number of goals for each team to get match results, and then build a table.
I used to do this in Excel, but your Goalmodel package is doing wonders for me right now. My question is about international football, like World Cups. Is there a way to do something similar? Goalmodel works just fine, but it requires matches and scores to calculate the attack/defense strengths and then the scoreline probabilities. Is there a way to apply this to international football? Teams don’t play each other that much, and all I have back in 1930 are Elo ratings. There’s a package called “Elo” but it only offers the outcome probability, not the result.

Thanks in advance and sorry for the book I just wrote.
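A minimal Python sketch of the resimulation idea described in this comment, assuming you already have an expected number of goals for each side in each fixture (for example from a fitted goalmodel; the teams and numbers below are made up). Goals are drawn from a Poisson distribution with Knuth’s simple sampler, and the table awards win_points for a win and 1 for a draw. English football used 2 points for a win until 1981, hence the parameter.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method
    (simple and fine for small rates like expected goals)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_season(fixtures, rng, win_points=3):
    """fixtures: list of (home, away, home_expected_goals, away_expected_goals).
    Returns (team, points) pairs sorted from most to fewest points."""
    points = {}
    for home, away, lam_home, lam_away in fixtures:
        home_goals = sample_poisson(lam_home, rng)
        away_goals = sample_poisson(lam_away, rng)
        points.setdefault(home, 0)
        points.setdefault(away, 0)
        if home_goals > away_goals:
            points[home] += win_points
        elif home_goals < away_goals:
            points[away] += win_points
        else:  # draw: one point each
            points[home] += 1
            points[away] += 1
    return sorted(points.items(), key=lambda item: item[1], reverse=True)


fixtures = [  # hypothetical teams and expected goals
    ("Team A", "Team B", 1.6, 1.1),
    ("Team B", "Team C", 1.3, 1.2),
    ("Team C", "Team A", 1.2, 1.4),
]
print(simulate_season(fixtures, random.Random(2020)))
```

Running simulate_season many times with different random seeds gives a distribution of final tables rather than a single one.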

- opisthokonta on July 1, 2020 at 9:45 am said:
  
  Yes, you would need the actual scores for the goalmodel package to be useful. I don’t know any easy-to-use sources for that kind of data, so you would probably need to scrape the data yourself.
  
  It is a problem that international games are “sparse”: most countries don’t play each other, so most comparisons between teams rely on indirect data. I explored this a bit in this old post. As long as the graph is connected, goalmodel should work. I think it gives a warning or error if the graph contains two or more separate components.
  
  - Henrique on July 1, 2020 at 3:11 pm said:
    
    Thanks for the reply! I already have all the scores needed; I managed to scrape them from the international-football website. I’m using matches between World Cups as the data (1930 uses data from 1927 to 1930). I don’t know if that will be enough, but I’ll test it. There’s one problem though: yes, it’s giving me this warning about non-comparable clusters. Probably some group of islands kept playing each other but not anyone else. Is there a way to identify these clusters and remove them?
    
    - opisthokonta on July 2, 2020 at 5:06 pm said:
      
      You can use the igraph package. I don’t quite remember the details, but you create a graph (i.e. a network) of all teams using the graph_from_data_frame function, using just the two vectors of teams, and then you use some function to find the “connected components”; I don’t remember the name.
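For reference, in R’s igraph the connected components of a graph built with graph_from_data_frame() should be available through components() (worth double-checking against the igraph documentation). The idea is also simple enough to sketch without any library: treat each team as a node and each match as an edge, then group the teams with union-find. The function name and the match list below are hypothetical.

```python
def connected_components(matches):
    """Group teams into connected components: teams are nodes, each match an edge.
    Returns a list of sets of teams, largest component first."""
    parent = {}

    def find(team):
        parent.setdefault(team, team)
        while parent[team] != team:
            parent[team] = parent[parent[team]]  # path halving
            team = parent[team]
        return team

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for home, away in matches:
        union(home, away)

    groups = {}
    for team in parent:
        groups.setdefault(find(team), set()).add(team)
    return sorted(groups.values(), key=len, reverse=True)


matches = [  # hypothetical (home, away) pairs
    ("Brazil", "Argentina"),
    ("Argentina", "Uruguay"),
    ("Fiji", "Tahiti"),
]
components = connected_components(matches)
largest = components[0]
# both teams in a match always share a component, so checking one side is enough
kept = [m for m in matches if m[0] in largest]
print(components)
print(kept)
```

Keeping only the matches inside the largest component removes the isolated clusters before fitting the model.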
      
niels on July 4, 2020 at 3:33 pm said:

How about plotting the quantiles of the Betfair probabilities against the bookmakers’?
