If you want to do your own number crunching on football results you need data. There are a lot of websites out there with results and statistics, but getting the information into a convenient format can be a hassle. Luckily there are some places where good datasets are available for download. Here I have collected some of them. I will update this list when I find more good sites. Also, if you know of some, feel free to mention them in the comments.
This list was last updated January 8th 2017.
football-data.co.uk
Match results and statistics from many European leagues and tournaments, including England, Germany, Italy and Spain, with data going all the way back to the 1990s plus odds data from different bookmakers. The datasets are updated each week.
Format: csv, excel, zip
Link: http://www.football-data.co.uk/data.php
James Curley’s engsoccerdata
All English league matches from the very beginning in 1888 to the present, in the convenient form of an R package.
Format: csv, rdata
Link: engsoccerdata @ github
Stryktipsetisistastund.se
Match results from Swedish, Norwegian and Finnish leagues in a format similar to football-data.co.uk.
Format: csv
Link: stryktipsetisistastund.se/data
Ouseful.info: Sports data and R
A list of R packages for sports and football analytics, including some packages that consist mostly of data sets.
Format: R packages
Link
Football stadium coordinates
Small data set compiled by me, with GPS coordinates for the home stadiums for about 130 European teams.
Format: csv
Link
Kaggle
A data set with details on 25k European matches and 11k players. Registration required.
Format: csv
Link: European Soccer Database
Link: Open Data Spotlight: The Ultimate European Soccer Database | Hugo Mathien
WoSo Stats
WoSo Stats collects data for women’s soccer, including detailed match event data.
Format: csv
Link
The Guardian
The Guardian sometimes posts football-related datasets, especially in connection with large football events. These include player statistics from Opta, historical match results and more.
Format: Google Docs (can be downloaded as csv or excel)
Link: World Cup 2010
Link: EURO 2012
The Guardian – Top 100 players
Each year The Guardian asks 100+ experts to rank the best players in the world. The complete details of the voting are available as a spreadsheet.
Format: Google Docs (can be downloaded as csv or excel)
Link: 2012 (The full voting data is not available, but the 2012 ranking is available in the 2013 spreadsheet)
Link: 2013
Link: 2014
Link: 2015
Link: 2016
openfootball football.db
A large database of football results.
Format: database
Link
Openfooty API
An API which makes available results, player stats, lineups, tables et cetera. Registration required.
Format: XML, php, json
Link: Openfooty
MCFC Analytics
Manchester City and Opta released a ton of data on every player from the 2011-12 Premier League season. Registration required.
Format: Excel, xml
Link: MCFC Analytics
Kaggle
Some historical World Cup data. Registration required.
Format: csv
Link: World Cup 2010 – Take on the Quants
Link: World Cup 2010 – Confidence Challenge
I am looking for football data that contains the usual teams involved, the match result, plus the stadium where the match was played.
You can use the football stadium coordinates dataset I compiled. It contains stadium names and coordinates and should be easy to merge with a table of results. It might be a bit outdated, but it’s a start.
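As a rough sketch of what such a merge could look like in Python, using only the standard library: the column names (HomeTeam, Team, Stadium and so on) and the sample rows below are assumptions made for illustration, so check them against the actual files before using anything like this. Team names often differ between sources, so some manual matching may also be needed.

```python
import csv
import io

# Hypothetical miniature versions of the two files. The column names and
# the rows are illustrative assumptions, not the real data sets.
RESULTS_CSV = """HomeTeam,AwayTeam,FTHG,FTAG
Arsenal,Chelsea,2,1
Chelsea,Everton,0,0
"""

STADIUMS_CSV = """Team,Stadium,Latitude,Longitude
Arsenal,Emirates Stadium,51.555,-0.108
Chelsea,Stamford Bridge,51.482,-0.191
"""


def add_stadiums(results_file, stadiums_file):
    """Attach the home team's stadium and coordinates to each match row."""
    # Index the stadium rows by team name for quick lookup.
    stadiums = {row["Team"]: row for row in csv.DictReader(stadiums_file)}
    merged = []
    for match in csv.DictReader(results_file):
        # Teams missing from the stadium file get None in the new columns.
        venue = stadiums.get(match["HomeTeam"], {})
        merged.append({
            **match,
            "Stadium": venue.get("Stadium"),
            "Latitude": venue.get("Latitude"),
            "Longitude": venue.get("Longitude"),
        })
    return merged


matches = add_stadiums(io.StringIO(RESULTS_CSV), io.StringIO(STADIUMS_CSV))
for match in matches:
    print(match["HomeTeam"], "v", match["AwayTeam"], "at", match["Stadium"])
```

With real files you would pass open file handles instead of the io.StringIO objects.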
Dear Jonas,
Thank you for your excellent blog that helps me a lot.
I am interested in the effect of a new trainer on a team’s performance. So, I would like to ask if there exists a historical dataset (as far back as possible) containing all trainers of each team that played in the Premier League at some point in this period. I would just need 4 columns: Team, Name of the Coach, Date of hire, End of contract. However, if you have something similar but for another country I would be interested too.
Thank you very much in advance
Best Regards,
Stefan
Sorry for the late answer. I am not aware of any such data set that is readily available, but this Wikipedia article seems like a good place to start:
https://en.wikipedia.org/wiki/List_of_Premier_League_managers
I’m sorry this is vague, but I remember reading a study on this topic a few years ago. It was produced by a German university and, if I remember correctly, concluded that the team’s performances after 15 games were the best predictor of performance under the new coach.
Every time I go looking for something to do with football/sports modelling, I end up back here.
Amazing work – thanks for all your efforts!
Hi. Love your blog! I’m looking for a dataset of English premier league results, with the time of all goals scored shown. I’m a bit of a novice so ideally in csv so I can stick it into excel for cleaning and analysis (I’m basically looking at which teams have lost most points from late goals). Thanks if you can help!
I’m sorry, I don’t know of any.
Hi, I am looking for a data set which has a collection of transfers, including details about the player (position, nationality, age, basic performance data) and details about the transfer (the fee in particular).
I’m also interested in data about player wages, along with the player details again (wages, position, club, age, etc.).
Thanks in advance, great job on the site!
Hey Jonas,
First of all, thanks for the information offered in your blog!
I’m currently writing my bachelor thesis about Elo ratings, which involves statistical analysis of whether specific hypotheses about Elo ratings are consistent with real-world data. For my research, I will need the end results (tables) of the top five European leagues for the last 25-50 years. So far I haven’t found any useful datasets to work with. Do you maybe have a clue?
Best Regards, Marlon
I would suggest you look at the engsoccerdata package. It contains a LOT of historical data. You would have to compute the league points yourself, but that is not hard.
https://github.com/jalapic/engsoccerdata
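Computing the points from a list of results could look roughly like the following Python sketch. The results here are made up, and the tie-breaking order (points, then goal difference, then goals scored) is an assumption, since the actual rules vary between leagues and eras. The points_for_win parameter is there because older seasons awarded two points for a win (the English league switched to three in 1981).

```python
from collections import defaultdict

# Made-up results as (home team, away team, home goals, away goals).
results = [
    ("A", "B", 2, 0),
    ("B", "C", 1, 1),
    ("C", "A", 0, 3),
]


def league_table(results, points_for_win=3):
    """Build a league table from match results, best team first."""
    table = defaultdict(lambda: {"played": 0, "points": 0, "gf": 0, "ga": 0})
    for home, away, home_goals, away_goals in results:
        # Update both teams, each from its own point of view.
        for team, scored, conceded in (
            (home, home_goals, away_goals),
            (away, away_goals, home_goals),
        ):
            row = table[team]
            row["played"] += 1
            row["gf"] += scored
            row["ga"] += conceded
            if scored > conceded:
                row["points"] += points_for_win
            elif scored == conceded:
                row["points"] += 1
    # Assumed ranking: points, then goal difference, then goals scored.
    return sorted(
        table.items(),
        key=lambda item: (
            item[1]["points"],
            item[1]["gf"] - item[1]["ga"],
            item[1]["gf"],
        ),
        reverse=True,
    )


for team, row in league_table(results):
    print(team, row["played"], row["points"], row["gf"] - row["ga"])
```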
I would like to obtain team data from about 40 leagues across the world. The applications I have looked at so far require me to identify the league number for each of the 40 leagues and carry out 40 API calls. Are you aware of anywhere that would provide this data and allow me to download it in a simpler fashion?
Not that I am aware of. Identifying 40 league numbers from an API and manually entering them into your script does not sound like much work, actually.
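A minimal Python sketch of that approach is shown below. Everything in it is hypothetical: the league numbers are placeholders, and fake_fetch stands in for whatever request the API you use actually requires. The idea is only that the identifiers are typed in once and the calls happen in a loop.

```python
# Placeholder league numbers; look up the real ones in your API's documentation.
LEAGUE_IDS = {
    "League one": 101,
    "League two": 102,
    "League three": 103,
}


def fetch_all(league_ids, fetch_league):
    """Call fetch_league once per league number and collect the results by name."""
    return {name: fetch_league(number) for name, number in league_ids.items()}


def fake_fetch(league_number):
    # Stand-in for a real API request, so the sketch runs without a network.
    return {"league": league_number, "teams": []}


data = fetch_all(LEAGUE_IDS, fake_fetch)
print(sorted(data))
```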
Do you know anywhere where I can obtain data for player’s wages?
https://www.kaggle.com/karangadiya/fifa19
Do you know of any dataset that provides stats for players from the last 30-40 years or even more? Top 5 European leagues and Champions League. Stats that include goals, assists, GK stats and all.
No, I am not aware of any. I doubt the more detailed types of stats (assists and so on) were recorded at the time.
Hi there,
Is there a dataset with the positions of the players and the ball for a particular match?
I haven’t looked much at that type of data myself, but take a look at some of the things I have shared on Twitter, and this YouTube channel: https://www.youtube.com/channel/UCUBFJYcag8j2rm_9HkrrA7w
Hello! I am looking for a database like the one at football-data.co.uk but with Champions League and Europa League information. Is there one that has this information?
I don’t know of any, sorry.
Hello,
I am a student in Paris and I want to write a thesis about predicting the results of soccer matches (1X2, exact scores, number of goals, …).
I would be very grateful if you could provide me with the Dixon and Coles model code.
Moreover, does this code assess the accuracy of the forecasts, in particular with the log loss function?
How are the relegated and promoted teams managed from one season to another in terms of predictions?
Their attack strength and defense potential are not similar to what they had in their league last season.
What assumption is made in this regard?
I really thank you for the help you can give me.
Thank you very much in advance,
Have a good day.
Sebastien
I have written several blog posts about the Dixon-Coles model which include lots of R code. I have also made an R package with the Dixon-Coles model plus much more: https://github.com/opisthokonta/goalmodel
Thanks a lot for your answer.
In your code, have you considered the following questions:
How are the relegated and promoted teams managed from one season to another in terms of predictions?
Over how many seasons do you do the calculations for the predictions?
Thanks a lot!
Yes, I have considered those questions, and I think I have written about them a few times on this blog.
Hello,
About your R package with the Dixon-Coles model, can the code easily be converted to Python instead of R?
Thanks for your reply,
Best regards.
I think there are ways to call R code from Python, but I am not familiar with them. You can take a look at this blog that has some Python code for the DC model: https://dashee87.github.io/football/python/predicting-football-results-with-statistical-modelling-dixon-coles-and-time-weighting/
Hello, I need a csv file with players’ real-time stats like goals scored, matches played, clean sheets etc. for the last 5-7 years. Could you please tell me where I can find that?
I’m sorry, I’m not aware of any such stats being easily available.
Hello,
I am looking for financial data of football clubs as displayed in financial statements (revenues, P&L, players’ trading, etc.) in order to develop a predictive financial model for football clubs.
Do you know where I can find such data?
Thank you in advance for your answer.
Best regards,
Tarik
I’m sorry, I don’t know of any such data. I guess annual reports from the clubs or leagues would be a good place to start.
No worries.
Thank you for your reply!