How to Easily Get Football Data with a Python Package (Without Web Scraping) | by Frank Andrade | Nov, 2022
Get data about the World Cup, Champions League, La Liga, and more in a couple of minutes
Every minute of a football match generates data that can yield high-quality insights to power player recruitment and match analysis, and to help coaches make better decisions.
Unfortunately, most free datasets available online only contain basic stats such as goals, team names, and the day of the match.
We can’t get many valuable insights with only basic stats.
To get advanced stats, you could try web scraping, but before spending time on that, you could explore the data that some sports data providers share for free.
In this guide, we’ll explore the free football data that Statsbomb shares through its Python package, statsbombpy.
Installing the Library
To get access to all the football data Statsbomb shares, we need to install statsbombpy.
pip install statsbombpy
Once we have the library installed, we have to import it.
from statsbombpy import sb
Competitions Available
To see all the competitions that Statsbomb shares for free, we only need to run sb.competitions().
I’m going to drop the duplicates in the country_name and competition_name columns to show the unique competitions that Statsbomb has.
# show all competitions
sb.competitions()
# show unique competitions
sb.competitions().drop_duplicates(['country_name', 'competition_name'])
As we can see, the free version of Statsbomb has competitions such as the FIFA World Cup, Champions League, La Liga, and Premier League.
That said, Statsbomb has more competitions that are available only to paying customers through API access.
Let’s explore one of the competitions available in the dataset: the FIFA World Cup 2018.
FIFA World Cup 2018: Exploring the Matches
To explore the data for the FIFA World Cup 2018, we need the competition_id and season_id. In the output of sb.competitions(), we can see that the values are 43 and 3, respectively.
df_2018 = sb.matches(competition_id=43, season_id=3)
df_2018.head(5)
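Rather than reading the two ids off the table by eye, they can be looked up by name. The sketch below is a minimal illustration, not part of statsbombpy: find_season_ids is a hypothetical helper, and it assumes the competitions dataframe has competition_name and season_name columns alongside the two id columns. The small sample dataframe is a hand-made stand-in so the block runs without a network call; only the World Cup row uses the ids quoted above.

```python
import pandas as pd


def find_season_ids(competitions, competition_name, season_name):
    """Return (competition_id, season_id) for one competition and season.

    Hypothetical helper: assumes the dataframe has the columns
    competition_name, season_name, competition_id and season_id.
    """
    mask = (
        (competitions["competition_name"] == competition_name)
        & (competitions["season_name"] == season_name)
    )
    found = competitions.loc[mask, ["competition_id", "season_id"]]
    if len(found) != 1:
        raise ValueError(f"expected exactly one row, found {len(found)}")
    row = found.iloc[0]
    return int(row["competition_id"]), int(row["season_id"])


# Hand-made stand-in for sb.competitions(); the second row is invented filler.
sample = pd.DataFrame({
    "competition_id": [43, 999],
    "season_id": [3, 1],
    "competition_name": ["FIFA World Cup", "Example League"],
    "season_name": ["2018", "2000/2001"],
})

print(find_season_ids(sample, "FIFA World Cup", "2018"))  # (43, 3)
```

With the real data, the same call would take sb.competitions() as its first argument, provided the column names match the assumption above.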
Although it’s not visible in the output above, the dataframe df_2018 has 22 columns. That’s a good start for a football analytics project.
To get more information about a specific match, we need the match_id. For example, the 2018 World Cup final, France vs Croatia, has the id 8658.
id_final_2018 = 8658
df_2018[df_2018['match_id']==id_final_2018]
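If the match_id is not known in advance, it can be found from the team names instead. This is a rough sketch under one assumption: that the matches dataframe exposes the team names as plain strings in home_team and away_team columns. find_match_ids is a hypothetical helper, and the sample dataframe is a hand-made stand-in in which only the first row reflects the final discussed above.

```python
import pandas as pd


def find_match_ids(matches, team_a, team_b):
    """Return the match_id of every match between two teams, in either order.

    Hypothetical helper: assumes home_team and away_team hold team names.
    """
    forward = (matches["home_team"] == team_a) & (matches["away_team"] == team_b)
    reverse = (matches["home_team"] == team_b) & (matches["away_team"] == team_a)
    return matches.loc[forward | reverse, "match_id"].tolist()


# Hand-made stand-in for sb.matches(...); the second row is invented filler.
sample = pd.DataFrame({
    "match_id": [8658, 1],
    "home_team": ["France", "Team A"],
    "away_team": ["Croatia", "Team B"],
})

print(find_match_ids(sample, "Croatia", "France"))  # [8658]
```

Returning a list rather than a single value keeps the helper usable for competitions where two teams meet more than once.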
Now that we’ve verified that 8658 is the correct id, let’s explore the lineups and all the events that happened during the 90 minutes.
Lineups
Let’s see the lineups of the match France vs Croatia.
lineups = sb.lineups(match_id=id_final_2018)
If we print lineups, we’d see that it’s a dictionary. Let’s see its keys.
>>> lineups.keys()
dict_keys(['France', 'Croatia'])
We can get access to the lineups of France and Croatia through the keys.
Here is the lineup for Croatia.
lineups['Croatia']
Here is the lineup for France.
lineups['France']
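Since lineups is a dictionary with one dataframe per team, it can be handy to stack both into a single dataframe with a team column. The following is a minimal sketch: combine_lineups is a hypothetical helper, and the player names and jersey numbers in the sample are invented placeholders, not real lineup data.

```python
import pandas as pd


def combine_lineups(lineups):
    """Stack a {team_name: dataframe} dictionary into one dataframe.

    Hypothetical helper: adds a 'team' column holding the dictionary key.
    """
    frames = [frame.assign(team=team) for team, frame in lineups.items()]
    return pd.concat(frames, ignore_index=True)


# Hand-made stand-in for sb.lineups(...); all player rows are placeholders.
sample = {
    "France": pd.DataFrame({
        "player_name": ["Player A", "Player B"],
        "jersey_number": [1, 2],
    }),
    "Croatia": pd.DataFrame({
        "player_name": ["Player C"],
        "jersey_number": [3],
    }),
}

combined = combine_lineups(sample)
print(combined)
```

The combined dataframe can then be filtered or grouped by team like any other dataframe.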
Match events
To get the events in the match France vs Croatia, we need to use the id of the match again.
df_events = sb.events(match_id=id_final_2018)
Let’s see all the columns inside df_events:
df_events.columns
We can see that there’s a lot of information in this dataset. Let’s select only a few columns and sort the dataframe by the minute and timestamp columns.
df_events = df_events[['timestamp', 'team', 'type', 'minute', 'location', 'pass_end_location', 'player']]
df_events = df_events.sort_values(['minute', 'timestamp'])
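The location column is easier to plot or filter once it is split into separate coordinates. The sketch below assumes each location value is either a list such as [x, y] or missing for events that have no position. That is an assumption about the data layout, and split_location is a hypothetical helper rather than part of the package. The sample rows are invented.

```python
import pandas as pd


def split_location(events, column="location"):
    """Return a copy of events with numeric 'x' and 'y' columns.

    Hypothetical helper: assumes each value is a list like [x, y]
    or a missing value for events without a position.
    """
    def coordinate(value, position):
        if isinstance(value, (list, tuple)) and len(value) > position:
            return float(value[position])
        return float("nan")

    result = events.copy()
    result["x"] = result[column].apply(lambda value: coordinate(value, 0))
    result["y"] = result[column].apply(lambda value: coordinate(value, 1))
    return result


# Hand-made stand-in for a slice of df_events; the values are invented.
sample = pd.DataFrame({
    "type": ["Pass", "Half Start"],
    "location": [[60.0, 40.0], None],
})

with_xy = split_location(sample)
print(with_xy[["type", "x", "y"]])
```

The same helper could be pointed at pass_end_location by passing column="pass_end_location", under the same assumption about its contents.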
Let’s see the last 30 events of the match France vs Croatia.
df_events.tail(30)
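With the team and type columns kept in df_events, one simple first analysis is to count how many events of each type each team produced. Here is a minimal sketch using a standard pandas groupby. count_events is a hypothetical helper, and the sample rows are invented for illustration, not taken from the final.

```python
import pandas as pd


def count_events(events):
    """Return a table of event counts with teams as rows and types as columns."""
    return events.groupby(["team", "type"]).size().unstack(fill_value=0)


# Hand-made stand-in for df_events; the rows are invented.
sample = pd.DataFrame({
    "team": ["France", "France", "Croatia", "Croatia", "Croatia"],
    "type": ["Pass", "Shot", "Pass", "Pass", "Shot"],
})

counts = count_events(sample)
print(counts)
```

Calling count_events(df_events) would give the same kind of table for the real match.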
That’s it! Now you can explore this package on your own to start your football analytics project.