# This is a sample dataset call. Later in the project, we could change a few parameters in the call (e.g., the season or the league) for comparative analysis.
library(worldfootballR)
prem_team_2023_shooting <- fb_season_team_stats(country = "ENG", gender = "M", season_end_year = "2023", tier = "1st", stat_type = "shooting")
Homework 2
1 Introduction
Until about fifteen years ago, recording the key events of a soccer match was a manual process, which limited both the range of possible analyses and the variety of data attributes. Today, thanks to advanced technology, hundreds of attributes describing a game are recorded in real time. This has enabled a wide range of research in soccer analytics and led to the emergence of soccer data aggregators such as StatsBomb, FBref, Understat and Fotmob. In this project, we will obtain our data through the worldfootballR package in R. The package, created by Jason Zivkovic, lets users conveniently pull data from several of these aggregators (FBref among them) directly into R.
The goal of the project is to find the performance attributes that best predict success. In particular, we want to answer the following questions:
- What are the best teams/players good at? We will compare aggregate statistics such as goals, assists, xG, xA and goal difference.
- Is xG a good indicator of success? Are there any trends over the years?
- Which team has been the most unlucky? Here, we will analyse the difference between xG and actual goals to see which teams failed to earn points despite performing well.
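The "unluckiness" comparison in the last question can be sketched in a few lines of base R. This is a minimal illustration, not the project's final method: it assumes a data frame with one row per squad holding season totals, with columns named after Gls_Standard and xG_Expected from the attribute list in Section 2. The squads and numbers are hypothetical placeholders, not real data.

```r
# Hypothetical season totals for three made-up squads (NOT real data)
season <- data.frame(
  Squad        = c("Team A", "Team B", "Team C"),
  Gls_Standard = c(70, 45, 52),
  xG_Expected  = c(64.5, 55.2, 50.1)
)

# Shortfall = expected goals minus actual goals.
# A large positive value means the squad scored fewer goals than its chances suggested.
season$xG_minus_G <- season$xG_Expected - season$Gls_Standard

# Order squads from the largest shortfall (most "unlucky") to the smallest
unlucky_ranking <- season[order(season$xG_minus_G, decreasing = TRUE),
                          c("Squad", "xG_minus_G")]
print(unlucky_ranking)
```

This only looks at the attacking side. A fuller version could also compare goals conceded with xG conceded, using the 'opponent' rows of the same dataset.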
2 Data
Through worldfootballR, we can obtain soccer data at the game, team and player level, with both categorical and numerical attributes. The attributes describing a game include goals scored by the home and away teams, assists, the result, corners taken, shots attempted, etc.; similar attributes are recorded at the team and player level. The data also contains advanced metrics such as xG (expected goals) and xA (expected assists), which are widely used in the soccer analytics industry to quantify the quality of the chances a player or team creates. We will use xG and related statistics to determine whether xG is a good indicator of success (more points earned in the league, more goals scored, etc.). For simplicity, our analysis uses statistics from the 1st tier of English football, the Premier League.
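One simple way to test whether xG is a good indicator of success is to correlate each squad's xG difference (xG for minus xG against) with its league points. The sketch below is only an illustration of that calculation under stated assumptions: the four squads and all their numbers are hypothetical, and league points are assumed to come from a separate source (e.g. the league table), since they are not part of the shooting data.

```r
# Hypothetical per-squad season figures (NOT real data).
# Points are assumed to come from a separate source such as the league table.
teams <- data.frame(
  Squad      = c("Team A", "Team B", "Team C", "Team D"),
  xG_For     = c(70, 55, 48, 40),
  xG_Against = c(35, 45, 50, 62),
  Points     = c(82, 60, 51, 33)
)

# xG difference: chance quality created minus chance quality conceded
teams$xGD <- teams$xG_For - teams$xG_Against

# Pearson correlation between xG difference and points;
# values close to 1 would suggest xG difference tracks league success closely
r <- cor(teams$xGD, teams$Points)
print(round(r, 3))
```

With real data, repeating this calculation for several seasons would also address the question about trends over the years.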
nrow(prem_team_2023_shooting)
[1] 40
table(prem_team_2023_shooting$Team_or_Opponent)
opponent team
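20 20
The 40 rows therefore consist of one 'team' row and one 'opponent' row for each of the 20 Premier League squads.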
Attribute Information
- Competition_Name (str, categorical): Name of the Competition
- Gender (str, categorical): Gender category of the competition
- Country (str, categorical): Country of the competition
- Season_End_Year (int, numerical): Year in which the competition season ends
- Squad (str, categorical): Name of the team
- Team_or_Opponent (str, categorical): Takes only two values, ‘team’ and ‘opponent’. In a ‘team’ record, the stats are those produced by the Squad; in an ‘opponent’ record, they are the stats produced by the Squad's opponents, i.e. against the Squad.
- Num_Players (int, numerical): Number of players used in matches during the season
- Mins_Per_90 (num, numerical): Minutes played divided by 90, i.e. the number of full matches' worth of playing time in the season
- Gls_Standard (int, numerical): Goals scored in the season
- Sh_Standard (int, numerical): Shots taken in the season
- SoT_Standard (int, numerical): Shots on target in the season
- SoT_percent_Standard (num, numerical): Percentage of shots on target, 100 × SoT_Standard/Sh_Standard
- Sh_per_90_Standard (num, numerical): Sh_Standard per 90 minutes
- SoT_per_90_Standard (num, numerical): SoT_Standard per 90 minutes
- G_per_Sh_Standard (num, numerical): Goals per shot, Gls_Standard/Sh_Standard
- G_per_SoT_Standard (num, numerical): Goals per shot on target, Gls_Standard/SoT_Standard
- Dist_Standard (num, numerical): Average distance (in yards) from goal of the shots taken, excluding penalty kicks
- FK_Standard (int, numerical): Shots taken from free kicks in the season
- PK_Standard (int, numerical): Penalty kicks scored in the season
- PKatt_Standard (int, numerical): Penalty kicks attempted in the season
- xG_Expected (num, numerical): Expected goals in the season
- npxG_Expected (num, numerical): Non-penalty expected goals in the season
- npxG_per_Sh_Expected (num, numerical): Non-penalty expected goals per shot in the season
- G_minus_xG_Expected (num, numerical): Actual goals scored minus expected goals
- np:G_minus_xG_Expected (num, numerical): Actual non-penalty goals scored minus non-penalty expected goals
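Several of the columns above are derived from the raw counts. As a sanity check on the definitions, the hypothetical helper `derive_shooting()` below recomputes a few of them from a single made-up row. It follows the formulas as written in the attribute list; FBref's published figures may treat penalty kicks differently in some ratios, so small mismatches against the downloaded columns are possible.

```r
# Hypothetical helper: recompute some derived shooting columns from raw counts,
# following the definitions given in the attribute list above.
derive_shooting <- function(df) {
  df$SoT_percent_Standard <- 100 * df$SoT_Standard / df$Sh_Standard
  df$Sh_per_90_Standard   <- df$Sh_Standard / df$Mins_Per_90
  df$G_per_Sh_Standard    <- df$Gls_Standard / df$Sh_Standard
  df$G_minus_xG_Expected  <- df$Gls_Standard - df$xG_Expected
  # Non-penalty goals = all goals minus penalty goals scored
  df[["np:G_minus_xG_Expected"]] <- (df$Gls_Standard - df$PK_Standard) - df$npxG_Expected
  df
}

# One made-up season row (NOT real data)
example <- data.frame(
  Mins_Per_90 = 38, Gls_Standard = 80, Sh_Standard = 600, SoT_Standard = 210,
  PK_Standard = 5, xG_Expected = 72.4, npxG_Expected = 68.5
)

out <- derive_shooting(example)
print(out)
```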
# Sample analysis
# Pre-processing
library(dplyr)
library(stringr)
library(tidyr)
library(ggplot2)

# selecting the required columns from the data frame created above
prem_2023_shooting_1 <- select(prem_team_2023_shooting, Squad, Team_or_Opponent, npxG_Expected)
# opponent rows store the squad name as "vs <Squad>"; strip the prefix so both rows of a squad share one name
prem_2023_shooting_1 <- mutate(prem_2023_shooting_1, Squad = str_remove(Squad, "^vs "))
# Using pivot_wider to turn the 'team' and 'opponent' rows into columns (one row per squad)
prem_2023_shooting_pivoted <- pivot_wider(prem_2023_shooting_1, names_from = "Team_or_Opponent", values_from = "npxG_Expected")
# the values are non-penalty xG (npxG_Expected) created by and against each squad
prem_2023_shooting_pivoted <- rename(prem_2023_shooting_pivoted, xG_For = team, xG_Against = opponent)
prem_2023_shooting_pivoted
# Building a scatter plot of xG_For and xG_Against
ggplot(prem_2023_shooting_pivoted,
       aes(x = xG_For,
           y = xG_Against,
           col = Squad)) +
  # squad labels placed slightly below each point
  geom_text(aes(y = xG_Against - 1, label = Squad), size = 3) +
  geom_point(size = 2.5, alpha = 0.8) +
  labs(x = "Expected Goals Scored (non-penalty)",
       y = "Expected Goals Conceded (non-penalty)",
       title = "xG_For vs. xG_Against",
       subtitle = "xG For vs. Against comparison of all Premier League teams for the 2022-2023 season") +
  theme(legend.position = "none")