Degree and density of a network

Categories: challenge3, degree, density

Author: Claire Battaglia

Published: March 22, 2023

Describe the network data

This week I’ll be creating a network of the murders in the movie The Godfather.

Code
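library(tidyverse)  # read_csv(), %>%, mutate(), ggplot()
library(igraph); library(tidygraph); library(ggraph)  # network tools
# read in data
murders <- read_csv("Godfather_murders.csv", show_col_types = FALSE)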
New names:
• `` -> `...4`
Code
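# create from, to objects (for reference only: graph_from_data_frame()
# below reads the first two columns as the edge list on its own)
from <- murders[, 1]
to <- murders[, 2]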

# create network object - igraph
murders.ig <- graph_from_data_frame(murders, directed = TRUE)

print(murders.ig)
IGRAPH 3263af1 DN-- 30 21 -- 
+ attr: name (v/c), family from (e/c), ...4 (e/l)
+ edges from 3263af1 (vertex names):
 [1] Rocco Lampone    ->Khartoum           
 [2] Tattaglia unknown->Luca Brasi         
 [3] Rocco Lampone    ->Paulie Gatto       
 [4] Corleone unknown ->Bruno Tattaglia    
 [5] Michael Corleone ->Virgil Sollozzo    
 [6] Michael Corleone ->Marc McCluskey     
 [7] Barzini unknown  ->Sonny Corleone     
 [8] Barzini unknown  ->guard 1            
+ ... omitted several edges
Code
# create network object - tidygraph
murders_tidy <- as_tbl_graph(murders.ig)

print(murders_tidy)
# A tbl_graph: 30 nodes and 21 edges
#
# A rooted forest with 9 trees
#
# Node Data: 30 × 1 (active)
  name             
  <chr>            
1 Rocco Lampone    
2 Tattaglia unknown
3 Corleone unknown 
4 Michael Corleone 
5 Barzini unknown  
6 Fabrizio         
# … with 24 more rows
#
# Edge Data: 21 × 4
   from    to `family from` ...4 
  <int> <int> <chr>         <lgl>
1     1    10 Corleone      NA   
2     2    11 Tattaglia     NA   
3     1    12 Corleone      NA   
# … with 18 more rows

I’ve created a network object with both igraph and tidygraph, just to compare the two.

The network is:

  • directed (one person commits the murder, the other is murdered)
  • named
  • unweighted
  • not bipartite (a one-mode network: every node is a person)

It has 30 nodes and 21 edges. The network is not connected.
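With no `vertices` argument, `graph_from_data_frame()` creates one vertex for every distinct name in the first two columns, which is how 21 edges end up touching 30 people. A minimal base R sketch of that bookkeeping, using a small hypothetical edge list (placeholder names, not rows from Godfather_murders.csv):

```r
# Hypothetical toy edge list: each row is one killing (killer -> victim)
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("D", "E", "F", "G"),
  stringsAsFactors = FALSE
)

# Vertex set = every name that appears as a killer or as a victim
nodes <- unique(c(edges$from, edges$to))

n_nodes <- length(nodes)  # number of vertices
n_edges <- nrow(edges)    # number of directed ties

cat("nodes:", n_nodes, " edges:", n_edges, "\n")
```

Here four killings involve seven people, because each victim appears once while a killer can appear several times.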

Code
# get number of components
igraph::components(murders.ig)$no
[1] 9
Code
# get size of each component
igraph::components(murders.ig)$csize
[1] 5 2 3 3 4 2 4 3 4
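`igraph::components()` uses weak connectivity by default, so edge direction is ignored when grouping nodes. The result is also consistent with tidygraph describing the network as a rooted forest: a forest always has edges = nodes - components, and 30 - 9 = 21. A minimal base R sketch of weak-component counting with union-find, on a hypothetical toy edge list (this is an illustration, not igraph's own implementation):

```r
# Hypothetical toy edge list (killer -> victim); direction is ignored below
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("D", "E", "F", "G"),
  stringsAsFactors = FALSE
)
nodes <- unique(c(edges$from, edges$to))

# Union-find: parent[[v]] points toward the representative of v's component
parent <- setNames(nodes, nodes)

find_root <- function(parent, x) {
  while (parent[[x]] != x) x <- parent[[x]]
  x
}

for (i in seq_len(nrow(edges))) {
  r1 <- find_root(parent, edges$from[i])
  r2 <- find_root(parent, edges$to[i])
  if (r1 != r2) parent[[r2]] <- r1  # merge the two components
}

# Label every node with its component's representative, then count sizes
roots <- vapply(nodes, function(v) find_root(parent, v), character(1))
csize <- as.vector(table(roots))
n_components <- length(csize)

cat("components:", n_components, "\n")
cat("sizes:", sort(csize, decreasing = TRUE), "\n")
```

The toy graph has three components (sizes 3, 2 and 2) and, being a forest, satisfies 4 edges = 7 nodes - 3 components.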
Code
# create plot
ggraph(murders_tidy, layout = "auto") + 
  geom_node_point() +
  geom_edge_diagonal() + 
  labs(title = "Murders in the Godfather, Part 1") +
  theme_graph(foreground = "#c6a25a")
Using "tree" as default layout
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

Exploring degree

Code
# create df of degrees
murders_nodes <- data.frame(name = V(murders.ig)$name, degree = igraph::degree(murders.ig)) %>%
  mutate(indegree = igraph::degree(murders.ig, mode = "in", loops = FALSE),
         outdegree = igraph::degree(murders.ig, mode = "out", loops = FALSE))

murders_nodes
                                                 name degree indegree outdegree
Rocco Lampone                           Rocco Lampone      4        0         4
Tattaglia unknown                   Tattaglia unknown      1        0         1
Corleone unknown                     Corleone unknown      2        0         2
Michael Corleone                     Michael Corleone      2        0         2
Barzini unknown                       Barzini unknown      3        0         3
Fabrizio                                     Fabrizio      1        0         1
Clemenza                                     Clemenza      3        0         3
Willie Cicci                             Willie Cicci      2        0         2
Al Neri                                       Al Neri      3        0         3
Khartoum                                     Khartoum      1        1         0
Luca Brasi                                 Luca Brasi      1        1         0
Paulie Gatto                             Paulie Gatto      1        1         0
Bruno Tattaglia                       Bruno Tattaglia      1        1         0
Virgil Sollozzo                       Virgil Sollozzo      1        1         0
Marc McCluskey                         Marc McCluskey      1        1         0
Sonny Corleone                         Sonny Corleone      1        1         0
guard 1                                       guard 1      1        1         0
guard 2                                       guard 2      1        1         0
Apollonia Vitelli                   Apollonia Vitelli      1        1         0
Victor Stracci                         Victor Stracci      1        1         0
Stracci's Bodyguard               Stracci's Bodyguard      1        1         0
Moe Greene                                 Moe Greene      1        1         0
Carmine Cuneo                           Carmine Cuneo      1        1         0
Phillip Tattaglia                   Phillip Tattaglia      1        1         0
prostitute                                 prostitute      1        1         0
Emilio Barzini's Bodyguard Emilio Barzini's Bodyguard      1        1         0
Emilio Barzini's Driver       Emilio Barzini's Driver      1        1         0
Emilio Barzini                         Emilio Barzini      1        1         0
Salvatore Tessio                     Salvatore Tessio      1        1         0
Carlo Rizzi                               Carlo Rizzi      1        1         0

In this network, any node with a nonzero out-degree committed at least one murder, and any node with a nonzero in-degree was murdered. Logically, no node should have an in-degree greater than one, since a person can only be murdered once.
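That constraint can be checked directly. A minimal base R sketch that computes in- and out-degree by tabulating the `to` and `from` columns of an edge list, then tests that nobody is murdered more than once and that no node both kills and is killed (the degree table above shows the same separation for the film's data). The toy edge list is hypothetical; with the real data, `max(murders_nodes$indegree) <= 1` checks the first property.

```r
# Hypothetical toy edge list (killer -> victim)
edges <- data.frame(
  from = c("A", "A", "B", "C"),
  to   = c("D", "E", "F", "G"),
  stringsAsFactors = FALSE
)
nodes <- unique(c(edges$from, edges$to))

# Count appearances in each column; factor levels keep the zero counts
outdegree <- as.vector(table(factor(edges$from, levels = nodes)))
indegree  <- as.vector(table(factor(edges$to,   levels = nodes)))

degrees <- data.frame(name = nodes, indegree = indegree,
                      outdegree = outdegree, degree = indegree + outdegree)
print(degrees)

# Each person can be murdered at most once
stopifnot(all(degrees$indegree <= 1))
# Killers and victims are disjoint sets in this toy data
stopifnot(!any(degrees$indegree > 0 & degrees$outdegree > 0))
```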

Code
# get summary
summary(murders_nodes)
     name               degree       indegree     outdegree  
 Length:30          Min.   :1.0   Min.   :0.0   Min.   :0.0  
 Class :character   1st Qu.:1.0   1st Qu.:0.0   1st Qu.:0.0  
 Mode  :character   Median :1.0   Median :1.0   Median :0.0  
                    Mean   :1.4   Mean   :0.7   Mean   :0.7  
                    3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:1.0  
                    Max.   :4.0   Max.   :1.0   Max.   :4.0  

Given the logical constraints of this network, there isn’t much to be revealed by the summary statistics. In the out-degree column we can see:

  • The maximum number of murders committed by any one person is 4.
  • The mean number of murders committed is 0.7. This is the mean across all 30 nodes, however, not the mean across the nodes that actually murdered someone. As the out-degree distribution below shows, most nodes (21 of 30) did not murder anyone, which pulls the mean down considerably.
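The mean among killers only can be worked out from the out-degree column of the degree table: the nine nodes with a nonzero out-degree account for all 21 murders, so their mean is 21 / 9 ≈ 2.33, compared with 21 / 30 = 0.7 across all nodes. In base R, with the out-degree values copied from that table:

```r
# Out-degrees copied from the degree table above (30 nodes, in table order)
outdegree <- c(4, 1, 2, 2, 3, 1, 3, 2, 3, rep(0, 21))

mean_all     <- mean(outdegree)           # mean over every node
killers      <- outdegree[outdegree > 0]  # nodes that murdered someone
mean_killers <- mean(killers)             # mean over killers only

cat("nodes:", length(outdegree), " killers:", length(killers), "\n")
cat("mean (all nodes):", mean_all, "\n")
cat("mean (killers only):", round(mean_killers, 2), "\n")
```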
Code
# create plot
ggplot(murders_nodes, aes(x = outdegree)) +
  geom_histogram(binwidth = 1, fill = "#c6a25a") +
  labs(title = "Distribution of Murders in the Godfather, Part 1", x = NULL) +
  theme_minimal()
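The same distribution can be tabulated in base R from the out-degree column of the degree table, which makes the count behind each bar explicit:

```r
# Out-degrees copied from the degree table above (30 nodes)
outdegree <- c(4, 1, 2, 2, 3, 1, 3, 2, 3, rep(0, 21))

# Number of nodes with each out-degree (number of murders committed);
# fixing the levels keeps every value from 0 to the maximum in the table
dist <- table(factor(outdegree, levels = 0:max(outdegree)))
print(dist)
```

Reading the counts: 21 of the 30 nodes committed no murders, and only one node (Rocco Lampone) reached four.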

Density

Network density is the proportion of all possible ties that are actually present in a network. A complete network has a density of 1. In this directed network, a density of 1 would mean that every person murdered every other person (and was in turn murdered by each of them), which is impossible here, since no one can be murdered more than once.
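To make the meaning of a density of 1 concrete: in a complete directed network every ordered pair of distinct nodes is tied, so every node has out-degree and in-degree n - 1. A minimal base R sketch using an adjacency matrix for a small, hypothetical n:

```r
n <- 5  # hypothetical number of nodes

# Complete directed network: a tie from every node to every other node
adj <- matrix(1, nrow = n, ncol = n)
diag(adj) <- 0  # no self-loops (no one murders themselves)

dens <- sum(adj) / (n * (n - 1))  # actual ties / possible ties

cat("density:", dens, "\n")
cat("out-degrees:", rowSums(adj), "\n")  # ties sent by each node
cat("in-degrees:", colSums(adj), "\n")   # ties received by each node
```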

Possible ties = n(n-1), where n is the number of nodes (directed network, no self-loops)

Actual ties = (2 * # of mutual dyads) + # of asymmetric dyads
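The actual-ties formula counts dyads (unordered pairs of nodes): a mutual dyad holds two ties (i → j and j → i) and an asymmetric dyad holds one. A minimal base R sketch of a dyad census on a hypothetical adjacency matrix, checking that 2 × mutual + asymmetric equals the number of ties:

```r
# Hypothetical 4-node directed network as an adjacency matrix (row -> column)
adj <- matrix(0, nrow = 4, ncol = 4)
adj[1, 2] <- 1; adj[2, 1] <- 1  # mutual pair 1 <-> 2
adj[1, 3] <- 1                  # asymmetric 1 -> 3
adj[4, 3] <- 1                  # asymmetric 4 -> 3

# Visit each unordered pair (i < j) once: adj[i, j] versus adj[j, i]
upper <- upper.tri(adj)
ij <- adj[upper] == 1
ji <- t(adj)[upper] == 1

mut        <- sum(ij & ji)    # both directions present
asym       <- sum(xor(ij, ji)) # exactly one direction present
null_dyads <- sum(!ij & !ji)  # no tie either way

a_ties <- 2 * mut + asym
cat("mutual:", mut, " asymmetric:", asym, " null:", null_dyads, "\n")
cat("actual ties:", a_ties, " matrix total:", sum(adj), "\n")
```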

There are 870 possible ties and 21 actual ties.

Code
# calc density manually - FOR PRACTICE
n <- vcount(murders.ig)  # number of nodes (30)
p_ties <- n * (n-1)
p_ties
[1] 870
Code
census <- dyad_census(murders.ig)  # mutual, asymmetric and null dyad counts
mut <- census$mut
asym <- census$asym
a_ties <- (2 * mut) + asym
a_ties
[1] 21
Code
a_ties/p_ties
[1] 0.02413793
Code
# get density
edge_density(murders.ig)
[1] 0.02413793

Random network

Code
# create random network: G(n, m) with the same 30 nodes and 21 edges
set.seed(123)  # arbitrary seed so the draw is reproducible
random <- sample_gnm(n = 30, m = 21, directed = TRUE)

# plot random network
ggraph(random, layout = "auto") +
  geom_node_point() +
  geom_edge_diagonal() +
  labs(title = "Plot of Random Network") +
  theme_graph(foreground = "#c6a25a")
Using "sugiyama" as default layout

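The random network has the same number of nodes and edges as the actual network, and therefore the same density, but its structure is very different. Its 21 edges are placed uniformly at random, so nothing stops a node from being “murdered” more than once or from both killing and being killed. In the actual network, no one has an in-degree above one and no node both kills and is killed.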
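The random network comes from the G(n, m) model: of all n(n-1) possible directed ties (no self-loops), exactly m are chosen uniformly at random. A minimal base R sketch of that sampling step (an illustration of the model, not igraph's own implementation), with n = 30 and m = 21 to match the murder network; the seed is arbitrary:

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n <- 30
m <- 21

# All ordered pairs (from, to) with from != to: n * (n - 1) = 870 candidates
candidates <- expand.grid(from = 1:n, to = 1:n)
candidates <- candidates[candidates$from != candidates$to, ]

# Choose m distinct ties uniformly at random (sampling without replacement)
random_edges <- candidates[sample(nrow(candidates), m), ]

# In-degree of every node in the sampled network
indegree <- tabulate(random_edges$to, nbins = n)

cat("possible ties:", nrow(candidates), " sampled ties:", nrow(random_edges), "\n")
cat("density:", m / nrow(candidates), "\n")
cat("max in-degree:", max(indegree), "\n")
```

By construction the density always equals 21 / 870; whether some node ends up with an in-degree above one depends on the draw, which is the constraint the real network never breaks.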