Exploring a network of football teams and the transactions they made from 2018-2021. An Edgelist maybe?
I begin by converting the data into an edgelist format. I select the club from and club to columns which are our nodes in this case. The relationship that I’m interested in is where from and where to has the majority of transfers taken place across various places. After selecting the from and to columns, I select the players, the transfer fee and the season in which the transfer took place. I also remove loan and free transfer as they don’t seem relevant at the moment.
#load data from CSV
transfers <- read_csv("/Users/isha/Desktop/GitHub/transfernetworks.csv")
#select relevant data for edgelist format
data <- transfers %>%
select(club_from, club_to, name,fee) %>%
filter(fee != "free transfer") %>%
filter(fee != "loan transfer") %>%
filter (fee != 0)
data$fee <- as.numeric(data$fee)
data <- data %>%
filter(fee != is.na(fee))
any(is.na(data$fee))
[1] FALSE
head(data)
# A tibble: 6 × 4
club_from club_to name fee
<chr> <chr> <chr> <dbl>
1 Aston Villa Manchester City Jack Grealish 117000000
2 Inter Milan Chelsea FC Romelu Lukaku 113000000
3 Borussia Dortmund Manchester United Jadon Sancho 85000000
4 ACF Fiorentina Juventus FC Dušan Vlahović 81000000
5 Real Madrid Manchester United Raphaël Varane 40000000
6 Inter Milan Paris Saint-Germain Achraf Hakimi 66000000
club_from | club_to | name | fee |
---|---|---|---|
Aston Villa | Manchester City | Jack Grealish | 1.17e+08 |
Inter Milan | Chelsea FC | Romelu Lukaku | 1.13e+08 |
Borussia Dortmund | Manchester United | Jadon Sancho | 8.50e+07 |
ACF Fiorentina | Juventus FC | Dušan Vlahović | 8.10e+07 |
Real Madrid | Manchester United | Raphaël Varane | 4.00e+07 |
Inter Milan | Paris Saint-Germain | Achraf Hakimi | 6.60e+07 |
The edgelist I created consists of 867 nodes and 4386 ties. The edge attributes are the players who are being transferred. The weight of the edges is the fee for which they were transferred. The edgelist is directed because the players are transferring from one team to another.
#convert data into matrix format
data.mat <- as.matrix(data)
#create igraph object from data
ig <- graph_from_data_frame(data.mat, directed = TRUE)
# add edge attribute weight i.e transfer fee
ig <- set_edge_attr(ig, "weight", value = na.omit(data$fee))
# add edge attribute season of transfer
#ig<- set_edge_attr(ig, "season", value = na.omit(data$season))
# add node attribute i.e league the club belongs to
#set_vertex_attr(ig,"league" ,value = node_attr)
#delete edge attribute that was automatically being created in addition to weight
ig <- delete_edge_attr(ig, "fee")
#check summary of the igraph object
summary(ig)
IGRAPH eb768a3 DNW- 867 4386 --
+ attr: name (v/c), name (e/c), weight (e/n)
#convert ig network into intergraph object to coerce with statnet
network <- intergraph::asNetwork(ig)
network
Network attributes:
vertices = 867
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = TRUE
bipartite = FALSE
total edges= 4386
missing edges= 0
non-missing edges= 4386
Vertex attribute names:
vertex.names
Edge attribute names not shown
#plot intergraph object
plot(network)
#plot igraph object
plot(ig)
The network consists of 867 nodes and 4386 edges. This means that there are 867 football clubs in our network and we are going to be explore the transfer of players that has taken place in the last four years i.e 8 transfer windows.
# look at the dyad census
dyad_census(ig)
$mut
[1] 201
$asym
[1] 3468
$null
[1] 371742
There are 201 mutual, 3468 asymmetrical and 371742 dyads
triad_census(ig)
[1] 105140687 2584373 450247 12854 17067 27123
[7] 4389 3716 1486 339 380 182
[13] 194 306 134 28
triangles(ig)
+ 8007/867 vertices, named, from eb768a3:
[1] SL Benfica
[2] Borussia Dortmund
[3] Eintracht Frankfurt
[4] SL Benfica
[5] Borussia Dortmund
[6] PSV Eindhoven
[7] SL Benfica
[8] Borussia Dortmund
[9] CA Boca Juniors
[10] SL Benfica
+ ... omitted several vertices
The results suggest that the network is directed, i.e players are transferring from one club to another.
The network is weighted. The weight of the transfer is the fee for which the transfer was made.
The network is not bipartite which mean that they transfers are not in sets- there is a flow of transfers of players in the network.
#get global clustering cofficient: igraph
transitivity(ig, type="global")
[1] 0.1088854
#get average local clustering coefficient: igraph
transitivity(ig, type="average")
[1] 0.1058294
The global transitivity of the graph is 0.1088854 which is the ratio of triangles connected to triangles
the average transitivity 0.1058294 is the transitivity of the local triad clusters, i.e. the ratio of local triangles to all connected triangles.
The average path length in the weighted network is 5689620.
[1] 5689620
The shortest_paths function enables us to look at the shortest parts between two nodes. Let’s explore some shortest between football clubs of various leagues.
shortest_paths(ig,"Chelsea FC", "Liverpool FC")$vpath[[1]]
+ 6/867 vertices, named, from eb768a3:
[1] Chelsea FC Spartak Moscow SC Freiburg
[4] 1.FC Union Berlin FC Schalke 04 Liverpool FC
shortest_paths(ig, "FC Porto", "Juventus FC")$vpath[[1]]
+ 6/867 vertices, named, from eb768a3:
[1] FC Porto AS Roma ACF Fiorentina Hellas Verona
[5] SS Lazio Juventus FC
shortest_paths(ig, "Bayern Munich", "Aston Villa")$vpath[[1]]
+ 9/867 vertices, named, from eb768a3:
[1] Bayern Munich TSG 1899 Hoffenheim VfB Stuttgart
[4] SC Braga Olympiacos Piraeus Red Star Belgrade
[7] UD Las Palmas LOSC Lille Aston Villa
distances(ig,"Chelsea FC", "Real Madrid")
Real Madrid
Chelsea FC 8e+05
distances(ig, "Bayern Munich", "Chelsea FC")
Chelsea FC
Bayern Munich 1050000
[1] 5689620
The distance between Chelsea and FC Barcelona is 1.6 nodes.
names(igraph::components(ig))
[1] "membership" "csize" "no"
igraph::components(ig)$no
[1] 20
igraph::components(ig)$csize
[1] 829 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[17] 2 2 2 2
#igraph::components(ig)$membership
There are 20 components in this network. The largest component consists of 829 nodes and the rest nineteen components comprise of two nodes each.
plot(ig, vertex.label.color = "black")
m1 <- layout_nicely(ig)
w1 <- E(ig)$names
plot(ig,
vertex.label.color = "black",
edge.color = 'black',
edge.width = w1,
layout = m1)
library(ggraph)
ggraph(ig, layout = "with_kk" )+
geom_edge_link(aes(alpha = weight))+
geom_node_point(aes(size = strength(ig)))
library(visNetwork)
data_2 <- toVisNetworkData(ig)
visNetwork(nodes = data_2$nodes, edges = data_2$edges, width = 300, height = 300) %>%
visIgraphLayout(layout = "layout_with_kk") %>%
visOptions(highlightNearest = TRUE) %>%
visOptions(nodesIdSelection = TRUE) %>%
visOptions(selectedBy = "group")
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kumar (2022, May 11). Data Analytics and Computational Social Science: Exploring European Football Transfer Networks. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsisha-mahajan12githubiofootballtransfersocialnetworkposts2022-05-08-exploring-european-football-transfer-networks/
BibTeX citation
@misc{kumar2022exploring, author = {Kumar, Isha Akshita Mahajan, Ankit}, title = {Data Analytics and Computational Social Science: Exploring European Football Transfer Networks}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsisha-mahajan12githubiofootballtransfersocialnetworkposts2022-05-08-exploring-european-football-transfer-networks/}, year = {2022} }