I was asked in the comments for the R code for the ranked probability score, so instead of posting it deep down in the comments I thought I’d post it as a proper blog instead. The ranked probability score (RPS) is a measure of how similar two probability distributions are and is used as a way to evaluate the quality of a probabilistic prediction. It is an example of a proper scoring rule.

The RPS was brought to my attention in the paper *Solving the problem of inadequate scoring rules for assessing probabilistic football forecasting models* by Constantinou and Fenton. In that paper they argue that the RPS is the best measure of the quality of football predictions when the predictions are of the type where you have probabilities for the outcome (home win, draw or away win). The thing about the RPS is that it also reflects that an away win is in a sense *closer* to a draw than a home win. That means that a higher probability predicted for a draw is considered better than a higher probability for home win if the actual result is an away win.

You can also find some more details at the pena.lt/y blog.

The following R function takes two arguments. The first argument (*predictions*) is a matrix with the predictions. It should be laid out so that each row is one prediction, laid out in the proper order, where each element is a probability and each row sum to 1. The second argument (*observed*) is a numeric vector that indicates which outcome that was actually observed.

For assessing football predictions the predictions matrix would have three columns, with the probabilities for the match ordered as home, draw and away (or in the opposite order).

rankProbScore <- function(predictions, observed){ ncat <- ncol(predictions) npred <- nrow(predictions) rps <- numeric(npred) for (rr in 1:npred){ obsvec <- rep(0, ncat) obsvec[observed[rr]] <- 1 cumulative <- 0 for (i in 1:ncat){ cumulative <- cumulative + (sum(predictions[rr,1:i]) - sum(obsvec[1:i]))^2 } rps[rr] <- (1/(ncat-1))*cumulative } return(rps) }

Hi,

I’m using excel to calculate RPS. Could you help me to understand how it works?

Thank you

It works by comparing the cumulative probability distributions of the predictions and the outcome. Since the outcomes are ordered, it makes sense to use the cumulative distributions, since for the observed outcome this changes from 0 to 1 abruptly at point of the outcome.

Consider the following hypothetical scoring for a categorical (non-ordinal) outcome: For each possible outcome, take the squared difference between the observed probability (either 0 or 1) and your predicted probability. The higher probability you managed to give the actual outcome, you would get a lower score, indicating a better prediction. Now you can think of the RPS as an extension of this, where instead of taking the squared probability differences of each particular outcome instead, for a particular outcome, also take the sum of all probabilities “below” or “before” that outcome in addition. That way high probabilities far away from the observed outcome will have a greater influence on the score.

Thank you for your answer.

Could you help me with a practical example?

There are some nice examples in the paper I linked to in the second paragraph.

I found this document:

http://trap.ncirl.ie/1875/1/lucaairoldi.pdf

I’m trying to understand how to have the RPS results at page 28.

Could you help me?

Thank you

Hello, your R function don’t works right. I read the paper “Solving the problem of inadequate scoring rules for assessing probabilistic football forecasting models”.

I applied your function in match 3 of this paper (for model alpha and beta) and I got different results in the paper (see Table 3).

It works for me. I think perhaps you use it wrong. The observed argument should be a number in {1,2,3}, not a matrix or length-3 vector of 0’s and 1’s. The number points to the column in the prediction matrix the observed result correspond to.

Here is how I manage to reproduce the RPS for match 3:

I see that I didn’t explain this very well in the post. When I tried to use it myself I also got wrong results and it took me some time to figure out what was wrong. Do you get this to work?

Ok, you is right. I was using

rankProbScore(predictions=alpha, observed=c(0,1,0))

instead of

rankProbScore(predictions=alpha, observed=2)

Thank you

Hello,

When I use this function, I provide 2 inputs: a predictions outcome matrix with H, D, A probabilities for all matches and a vector with values 1, 2 or 3 depending on the actual outcome of the match.

However, when the rps is returned, I get a value for each match. Is that what it should be? Or should it be one numerical value? Or is it a problem with my inputs to the function?

I reedited the code as follows: Firstly, I returned a list just to see what the function is doing. One thing I notice is that returning “obsvec” only returns one line: [1] 0 0 1.

Then I added one line:

rps[rr] <- (1/(ncat-1))*cumulative

rankscore <- (1/(ncat-1))*cumulative

the rankscore returns one numerical value. In the case of xi = 0.0018 I get rankscore = 0.5849709, using data from Premiership & Championship 2015-16 and 2016-17.

Also, I got an interesting return value for "cumulative" of 1.169942. Not quite sure what that means, but it doesn't sound right.

Below the code:

rankProbScore <- function(predictions, observed){

ncat <- ncol(predictions)

npred <- nrow(predictions)

rps <- numeric(npred)

for (rr in 1:npred){

obsvec <- rep(0, ncat)

obsvec[observed[rr]] <- 1

cumulative <- 0

for (i in 1:ncat){

cumulative <- cumulative + (sum(predictions[rr,1:i]) – sum(obsvec[1:i]))^2

}

rps[rr] <- (1/(ncat-1))*cumulative

rankscore <- (1/(ncat-1))*cumulative

}

return(list(

ncat,

npred,

obsvec,

cumulative,

rps,

rankscore

))

}

Yes thr RPS is calculated for each game. To get one value for all the games you can take the average.

I havent tested your code, but you seem to make an error where you return the values for the last game. This is because you in each iteration overwrite the previous values of rankscore. The cumulative variable should also not be returned, since this too is overwritten in each for each game.

So I should get a mean of the rps[rr] output and ignore rankscore and cumulative?

Something like this?

return(list(

rps = rps,

meanrps = mean(rps),

sumrps = sum(rps)

))

Yes, like that.

Hi all,

First of all, I really like your blog. Unfortunately, I am not very familiar with Python. So I am trying to write the function in Python. I got the following sofar:

import numpy as np

import pandas as pd

def RPS(predictions, observed):

ncat = 3

npred = len(predictions)

RPS = np.zeros(npred)

for x in range(0, npred):

obsvec = np.zeros(ncat)

obsvec[observed.iloc[x]-1] = 1

cumulative = 0

for i in range(1, ncat):

cumulative = cumulative + (sum(predictions.iloc[x, 1:i]) – sum(obsvec[1:i]))

RPS[x] = (1/(ncat-1)) * cumulative

return RPS

df = pd.read_csv(‘test.csv’, header=0)

predictions = df[[‘H’, ‘D’, ‘L’]]

observed = df[[‘Outcome’]]

RPS = RPS(predictions, observed)

Please point out any errors! Appreciate it

what sort of error message do you get?

The script technically works, but I get the wrong RPS-values if I use the values of Table 1 in the paper of Constantinou

I also forgot to square the cumulative.

cumulative = cumulative + (sum(predictions.iloc[x, 1:i]) – sum(obsvec[1:i])) ** 2

Still, I get different values

The following script works and gives me the same output as in Table 3 of the paper.

import numpy as np

import pandas as pd

def RPS(predictions, observed):

ncat = 3

npred = len(predictions)

RPS = np.zeros(npred)

for x in range(0, npred):

obsvec = np.zeros(ncat)

obsvec[observed.iloc[x]-1] = 1

cumulative = 0

for i in range(0, ncat):

cumulative = cumulative + (sum(predictions.iloc[x, 0:i]) – sum(obsvec[0:i])) ** 2

RPS[x] = (1 / (ncat – 1)) * cumulative

return RPS

df = pd.read_csv(‘test.csv’, header=0)

predictions = df[[‘H’, ‘D’, ‘L’]]

observed = df[[‘Outcome’]]

RPS = RPS(predictions, observed)

print(RPS)

Good, and thank you for sharing the code with the rest of us!

What is the difference of ranked probability score and Brier Score?

The Brier score is basically squared error, while the RPS is the difference in the cumulative distributions. I suggest reading the paper I linked to for a thorough discussion.