Exploring European Football Transfer Networks

Exploring a network of football teams and the transactions they made from 2018-2021. An Edgelist maybe?

Isha Akshita Mahajan, Ankit Kumar (UMass Amherst)
2022-05-08

Load Required Packages

Load the data

I begin by converting the data into an edgelist format. I select the club from and club to columns which are our nodes in this case. The relationship that I’m interested in is where from and where to has the majority of transfers taken place across various places. After selecting the from and to columns, I select the players, the transfer fee and the season in which the transfer took place. I also remove loan and free transfer as they don’t seem relevant at the moment.

#load data from CSV
transfers <- read_csv("/Users/isha/Desktop/GitHub/transfernetworks.csv")
#select relevant data for edgelist format
data <- transfers %>% 
  select(club_from, club_to, name,fee) %>% 
  filter(fee != "free transfer") %>% 
  filter(fee != "loan transfer") %>% 
  filter (fee != 0)
data$fee <- as.numeric(data$fee) 
data <- data %>% 
filter(fee != is.na(fee))
any(is.na(data$fee))
[1] FALSE
head(data)
# A tibble: 6 × 4
  club_from         club_to             name                 fee
  <chr>             <chr>               <chr>              <dbl>
1 Aston Villa       Manchester City     Jack Grealish  117000000
2 Inter Milan       Chelsea FC          Romelu Lukaku  113000000
3 Borussia Dortmund Manchester United   Jadon Sancho    85000000
4 ACF Fiorentina    Juventus FC         Dušan Vlahović  81000000
5 Real Madrid       Manchester United   Raphaël Varane  40000000
6 Inter Milan       Paris Saint-Germain Achraf Hakimi   66000000
kable(head(data))
club_from club_to name fee
Aston Villa Manchester City Jack Grealish 1.17e+08
Inter Milan Chelsea FC Romelu Lukaku 1.13e+08
Borussia Dortmund Manchester United Jadon Sancho 8.50e+07
ACF Fiorentina Juventus FC Dušan Vlahović 8.10e+07
Real Madrid Manchester United Raphaël Varane 4.00e+07
Inter Milan Paris Saint-Germain Achraf Hakimi 6.60e+07

Create Edgelist (Part 1)

The edgelist I created consists of 867 nodes and 4386 ties. The edge attributes are the players who are being transferred. The weight of the edges is the fee for which they were transferred. The edgelist is directed because the players are transferring from one team to another.

#convert data into matrix format
data.mat <- as.matrix(data)
#create igraph object from data
ig <- graph_from_data_frame(data.mat, directed = TRUE)
# add edge attribute weight i.e transfer fee
ig <- set_edge_attr(ig, "weight", value = na.omit(data$fee))
# add edge attribute season of transfer
#ig<-  set_edge_attr(ig, "season", value = na.omit(data$season))
# add node attribute i.e league the club belongs to
#set_vertex_attr(ig,"league" ,value = node_attr)
#delete edge attribute that was automatically being created in addition to weight
ig <- delete_edge_attr(ig, "fee")
#check summary of the igraph object
summary(ig)
IGRAPH eb768a3 DNW- 867 4386 -- 
+ attr: name (v/c), name (e/c), weight (e/n)
#convert ig network into intergraph object to coerce with statnet
network <- intergraph::asNetwork(ig) 
network
 Network attributes:
  vertices = 867 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 4386 
    missing edges= 0 
    non-missing edges= 4386 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names not shown 
#plot intergraph object
plot(network)

#plot igraph object 
plot(ig)

Exploring Network Structures

#count the number of nodes
vcount(ig)
[1] 867
# count the number of edges
ecount(ig)
[1] 4386

The network consists of 867 nodes and 4386 edges. This means that there are 867 football clubs in our network and we are going to be explore the transfer of players that has taken place in the last four years i.e 8 transfer windows.

# look at the dyad census
dyad_census(ig)
$mut
[1] 201

$asym
[1] 3468

$null
[1] 371742

There are 201 mutual, 3468 asymmetrical and 371742 dyads

 [1] 105140687   2584373    450247     12854     17067     27123
 [7]      4389      3716      1486       339       380       182
[13]       194       306       134        28
triangles(ig)
+ 8007/867 vertices, named, from eb768a3:
   [1] SL Benfica                      
   [2] Borussia Dortmund               
   [3] Eintracht Frankfurt             
   [4] SL Benfica                      
   [5] Borussia Dortmund               
   [6] PSV Eindhoven                   
   [7] SL Benfica                      
   [8] Borussia Dortmund               
   [9] CA Boca Juniors                 
  [10] SL Benfica                      
+ ... omitted several vertices
[1] TRUE
[1] TRUE
[1] FALSE

The results suggest that the network is directed, i.e players are transferring from one club to another.

The network is weighted. The weight of the transfer is the fee for which the transfer was made.

The network is not bipartite which mean that they transfers are not in sets- there is a flow of transfers of players in the network.

Transitivity

#get global clustering cofficient: igraph
transitivity(ig, type="global")
[1] 0.1088854
#get average local clustering coefficient: igraph
transitivity(ig, type="average")
[1] 0.1058294

The global transitivity of the graph is 0.1088854 which is the ratio of triangles connected to triangles

the average transitivity 0.1058294 is the transitivity of the local triad clusters, i.e. the ratio of local triangles to all connected triangles.

Path Lengths

The average path length in the weighted network is 5689620.

[1] 5689620

The shortest_paths function enables us to look at the shortest parts between two nodes. Let’s explore some shortest between football clubs of various leagues.

shortest_paths(ig,"Chelsea FC", "Liverpool FC")$vpath[[1]]
+ 6/867 vertices, named, from eb768a3:
[1] Chelsea FC        Spartak Moscow    SC Freiburg      
[4] 1.FC Union Berlin FC Schalke 04     Liverpool FC     
shortest_paths(ig, "FC Porto", "Juventus FC")$vpath[[1]]
+ 6/867 vertices, named, from eb768a3:
[1] FC Porto       AS Roma        ACF Fiorentina Hellas Verona 
[5] SS Lazio       Juventus FC   
shortest_paths(ig, "Bayern Munich", "Aston Villa")$vpath[[1]]
+ 9/867 vertices, named, from eb768a3:
[1] Bayern Munich       TSG 1899 Hoffenheim VfB Stuttgart      
[4] SC Braga            Olympiacos Piraeus  Red Star Belgrade  
[7] UD Las Palmas       LOSC Lille          Aston Villa        
distances(ig,"Chelsea FC", "Real Madrid")
           Real Madrid
Chelsea FC       8e+05
distances(ig, "Bayern Munich", "Chelsea FC")
              Chelsea FC
Bayern Munich    1050000
[1] 5689620

The distance between Chelsea and FC Barcelona is 1.6 nodes.

Component Structure

names(igraph::components(ig))
[1] "membership" "csize"      "no"        
igraph::components(ig)$no
[1] 20
igraph::components(ig)$csize
 [1] 829   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[17]   2   2   2   2
#igraph::components(ig)$membership

There are 20 components in this network. The largest component consists of 829 nodes and the rest nineteen components comprise of two nodes each.

Network Visualizations - an attempt

plot(ig, vertex.label.color = "black")

m1 <- layout_nicely(ig)
w1 <- E(ig)$names
plot(ig, 
        vertex.label.color = "black", 
        edge.color = 'black',
        edge.width = w1,
        layout = m1)

library(ggraph)
ggraph(ig, layout = "with_kk" )+
  geom_edge_link(aes(alpha = weight))+
  geom_node_point(aes(size = strength(ig)))

library(visNetwork)
data_2 <- toVisNetworkData(ig)

visNetwork(nodes = data_2$nodes, edges = data_2$edges, width = 300, height = 300) %>% 
visIgraphLayout(layout = "layout_with_kk") %>% 
visOptions(highlightNearest = TRUE) %>% 
visOptions(nodesIdSelection = TRUE) %>% 
visOptions(selectedBy = "group")

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kumar (2022, May 11). Data Analytics and Computational Social Science: Exploring European Football Transfer Networks. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsisha-mahajan12githubiofootballtransfersocialnetworkposts2022-05-08-exploring-european-football-transfer-networks/

BibTeX citation

@misc{kumar2022exploring,
  author = {Kumar, Isha Akshita Mahajan, Ankit},
  title = {Data Analytics and Computational Social Science: Exploring European Football Transfer Networks},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsisha-mahajan12githubiofootballtransfersocialnetworkposts2022-05-08-exploring-european-football-transfer-networks/},
  year = {2022}
}