DACSS 697E Assignment 5

networks homework grateful network

Assignment 5 for DACSS 697E course ‘Social and Political Network Analysis’: “Brokerage and Betweenness”

Kristina Becvar https://www.kristinabecvar.com
03-26-2022

Network Details

I am continuing to use the Grateful Dead song writing data set that I used in previous assignments to examine co-writing links and centrality.

The data set, which I compiled, consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career. One aspect of the Grateful Dead song data is that the connections between co-writers are weighted, with the weights representing the number of times each song was played live.

There are 26 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 26 nodes in the dataset.

There are a total of 183 (updated and still under review!) unique songs played, and the various co-writing combinations are now represented in a binary affiliation matrix.

Loading the dataset and creating the network to begin this assignment:

library(igraph) # network objects and centrality measures
library(dplyr)  # %>%, arrange(), slice()

gd_vertices <- read.csv("gd_nodes.csv", header=T, stringsAsFactors=F)
gd_affiliation <- read.csv("gd_affiliation_matrix.csv", row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)
gd_projection <- gd_matrix%*%t(gd_matrix)

#Create Igraph Object

gd_network_ig <- graph.adjacency(gd_projection,mode="undirected") #igraph object

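This is a non-directed, unweighted igraph object. It has two components: one large component and one isolate. Note that because the projection's counts are passed in without a weight attribute, each shared song appears to become its own parallel edge and the diagonal becomes self-loops, which is why the degree scores and the dyad census below exceed what a simple 26-node graph would allow.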
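To make the projection step concrete, here is a minimal sketch on a toy affiliation matrix. The writer and song names and all values are made up for illustration and are not taken from the Grateful Dead data.

```r
# Toy binary affiliation matrix: rows are writers, columns are songs.
# All names and values here are hypothetical.
toy_affiliation <- matrix(
  c(1, 1, 0,
    1, 1, 1,
    0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("writer_a", "writer_b", "writer_c"),
                  c("song_1", "song_2", "song_3"))
)

# One-mode projection onto writers: entry [i, j] counts songs shared by i and j
toy_projection <- toy_affiliation %*% t(toy_affiliation)
print(toy_projection)

# The diagonal holds each writer's own song count
songs_per_writer <- diag(toy_projection)

# Off-diagonal entries are co-writing counts
shared_a_b <- toy_projection["writer_a", "writer_b"]
```

In this toy case the off-diagonal entries are counts rather than 0/1 values, and the diagonal is non-zero, which mirrors the shape of `gd_projection`.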

#Inspect New Object

igraph::vertex_attr_names(gd_network_ig)
[1] "name"
igraph::edge_attr_names(gd_network_ig)
character(0)
head(V(gd_network_ig)$name)
[1] "Eric Andersen"  "John Barlow"    "Bob Bralove"   
[4] "Andrew Charles" "John Dawson"    "Willie Dixon"  
is_directed(gd_network_ig)
[1] FALSE
is_weighted(gd_network_ig)
[1] FALSE
is_bipartite(gd_network_ig)
[1] FALSE
igraph::dyad.census(gd_network_ig)
$mut
[1] 738

$asym
[1] 0

$null
[1] -413
igraph::triad.census(gd_network_ig)
 [1] 1788    0  488    0    0    0    0    0    0    0  237    0    0
[14]    0    0   87
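The negative `$null` value in the dyad census above looks odd at first. The arithmetic below is a sketch under the assumption that the null count is derived by subtracting the mutual and asymmetric counts from the number of possible node pairs; that is my reading of the output, not something stated in the source.

```r
n_nodes <- 26                                  # songwriters in the network
possible_dyads <- n_nodes * (n_nodes - 1) / 2  # distinct pairs of nodes

mutual <- 738      # $mut as reported above
asymmetric <- 0    # $asym as reported above

# Assumed derivation of the null count
implied_null <- possible_dyads - mutual - asymmetric
print(c(possible = possible_dyads, implied_null = implied_null))
```

Under that assumption the result reproduces the reported -413, which is consistent with parallel edges being counted separately so that the mutual count exceeds the 325 possible pairs.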

Centrality Scores

To examine the centrality and power scores of the nodes, I’m creating a data frame with degree centrality, normalized degree centrality, Bonacich power, Eigenvector centrality scores, and the breakdown of the Eigenvector scores into reflected and derived centrality.

To calculate the reflected and derived centrality scores, I first run some operations on the adjacency matrix, keeping in mind that these two scores together add up to the full Eigenvector centrality score.
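As a small illustration of those matrix operations, the sketch below applies the same idea to a made-up three-node path. The object names are hypothetical and the toy graph has nothing to do with the songwriter data.

```r
# Toy undirected network: a path a - b - c (made-up data)
toy_adj <- matrix(
  c(0, 1, 0,
    1, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

# Squaring the adjacency matrix counts walks of length two
toy_adj_2 <- toy_adj %*% toy_adj

# Share of two-step walks that return to the starting node (reflected)
toy_reflected <- diag(toy_adj_2) / rowSums(toy_adj_2)

# Share of two-step walks that end at a different node (derived)
toy_derived <- 1 - toy_reflected

print(rbind(reflected = toy_reflected, derived = toy_derived))
```

Because the two shares sum to one for every node, multiplying each by a node's Eigenvector score splits that score into two parts that add back up to the original.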

gd_adjacency <- as.matrix(as_adjacency_matrix(gd_network_ig))
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency

#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)

#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)

centrality_gd <-data.frame(id=1:vcount(gd_network_ig),
                        name=V(gd_network_ig)$name,
                        degree_all=igraph::degree(gd_network_ig),
                        degree_norm=igraph::degree(gd_network_ig,normalized=T),
                        BC_power=power_centrality(gd_network_ig),
                        EV_cent=centr_eigen(gd_network_ig,directed = F)$vector,
                        reflect_EV=gd_reflective*centr_eigen(gd_network_ig,directed = F)$vector,
                        derive_EV=gd_derived*centr_eigen(gd_network_ig,directed = F)$vector)

row.names(centrality_gd)<-NULL
centrality_gd%>%
  arrange(desc(degree_all))%>%
  slice(1:5)
  id            name degree_all degree_norm   BC_power    EV_cent
1  7    Jerry Garcia        328       13.12 -0.2551417 0.96094165
2 14   Robert Hunter        313       12.52 -0.1735142 1.00000000
3 25        Bob Weir        213        8.52 -0.5430836 0.18725953
4 17       Phil Lesh        149        5.96 -0.1806656 0.15133380
5 15 Bill Kreutzmann        100        4.00 -0.7011548 0.09223647
   reflect_EV  derive_EV
1 0.332625452 0.62831620
2 0.371327549 0.62867245
3 0.040709421 0.14655011
4 0.022140576 0.12919322
5 0.009710558 0.08252591

Right away, I see that the highest-degree nodes are clearly Jerry Garcia and Robert Hunter, which makes sense given that they were a prolific songwriting pair who created much of the Grateful Dead’s original songbook. Bob Weir also contributed quite a bit, though judging by John Barlow’s absence from the top counts, the songs Weir wrote with his writing partner Barlow were far fewer than those he wrote as part of the whole band.

This degree ranking makes sense intuitively: the original lineup of Jerry Garcia, Bob Weir, Phil Lesh, Bill Kreutzmann, and Pigpen, together with Robert Hunter, was present during the band’s formative and most collaborative era.

Eigenvector Centrality

I am also interested in the Eigenvector centrality scores, both the highest and the lowest values.

centrality_gd%>%
  arrange(desc(EV_cent))%>%
  slice(1:5)
  id            name degree_all degree_norm   BC_power    EV_cent
1 14   Robert Hunter        313       12.52 -0.1735142 1.00000000
2  7    Jerry Garcia        328       13.12 -0.2551417 0.96094165
3 25        Bob Weir        213        8.52 -0.5430836 0.18725953
4 17       Phil Lesh        149        5.96 -0.1806656 0.15133380
5 15 Bill Kreutzmann        100        4.00 -0.7011548 0.09223647
   reflect_EV  derive_EV
1 0.371327549 0.62867245
2 0.332625452 0.62831620
3 0.040709421 0.14655011
4 0.022140576 0.12919322
5 0.009710558 0.08252591

Robert Hunter having the top Eigenvector centrality score is not a shock: he has long been regarded as an unofficial band member and as the person behind the songwriting magic of the Grateful Dead. His primary songwriting partner was Jerry Garcia, but he also wrote songs with the early, full band and later with almost all of the individual members of the band.

It is a little surprising, though, that the Eigenvector scores fall off so quickly after Robert Hunter and Jerry Garcia.

Closeness

The closeness centrality of a node is defined as the inverse of the sum of the geodesic distances between that node and all other nodes in a network. The calculation runs; however, I get a warning that closeness centrality is not well-defined for disconnected graphs.

#calculate closeness centrality: igraph
igraph::closeness(gd_network_ig)
  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
    0.012500000     0.012987013     0.013333333     0.012048193 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
    0.012048193     0.012658228     0.015625000     0.014285714 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
    0.014492754     0.012500000     0.011363636     0.014492754 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
    0.001538462     0.015873016     0.015384615     0.012048193 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
    0.016666667     0.012048193     0.013698630     0.014492754 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
    0.012345679     0.015151515     0.011363636     0.013333333 
       Bob Weir   Vince Welnick 
    0.017543860     0.013157895 
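To make the closeness definition concrete, here is a minimal base R sketch on a made-up four-node graph with one isolate. It assumes that unreachable nodes are assigned a distance equal to the number of nodes; that assumption is mine, chosen because it is consistent with the isolate-like score of roughly 1/(25 × 26) above, and it is not confirmed by the source.

```r
# Toy undirected network: a path a - b - c plus an isolate d (made-up data)
nodes <- c("a", "b", "c", "d")
toy_adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
toy_adj["a", "b"] <- toy_adj["b", "a"] <- 1
toy_adj["b", "c"] <- toy_adj["c", "b"] <- 1

# Geodesic distances via Floyd-Warshall; Inf marks unreachable pairs
n <- nrow(toy_adj)
geo <- ifelse(toy_adj > 0, 1, Inf)
diag(geo) <- 0
for (k in seq_len(n)) {
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      geo[i, j] <- min(geo[i, j], geo[i, k] + geo[k, j])
    }
  }
}

# Assumption: unreachable nodes get a distance equal to the node count
geo_filled <- geo
geo_filled[is.infinite(geo_filled)] <- n

# Closeness is the inverse of the summed distances to all other nodes
toy_closeness <- 1 / rowSums(geo_filled)
print(round(toy_closeness, 4))
```

The middle node `b` scores highest because its summed distance is smallest, and the isolate `d` scores lowest because every other node counts as maximally far away.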

In addition to node-level centrality scores, I also want to calculate the network level centralization index for closeness centrality measures. Again, I get a warning that closeness centrality is not well-defined for disconnected graphs.

#calculate closeness centralization index: igraph
centr_clo(gd_network_ig)$centralization
[1] 0.2310331
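For intuition about what a centralization index measures, the sketch below computes a Freeman-style closeness centralization on a made-up star graph. The helper name `geodesic_distances` is hypothetical, and the denominator is the textbook maximum for a connected undirected graph; whether `centr_clo` uses exactly this normalization is an assumption on my part.

```r
# Hypothetical helper: all-pairs geodesic distances via Floyd-Warshall
geodesic_distances <- function(adj) {
  n <- nrow(adj)
  geo <- ifelse(adj > 0, 1, Inf)
  diag(geo) <- 0
  for (k in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        geo[i, j] <- min(geo[i, j], geo[i, k] + geo[k, j])
      }
    }
  }
  geo
}

# Toy undirected star: hub h tied to three leaves (made-up data)
nodes <- c("h", "x", "y", "z")
star_adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
star_adj["h", -1] <- 1
star_adj[-1, "h"] <- 1

n <- nrow(star_adj)

# Normalized closeness: (n - 1) divided by the summed distances
star_closeness <- (n - 1) / rowSums(geodesic_distances(star_adj))

# Sum of gaps between the most central node and every node,
# divided by the largest sum attainable in a connected undirected graph
observed_gap <- sum(max(star_closeness) - star_closeness)
max_gap <- (n - 1) * (n - 2) / (2 * n - 3)
star_centralization <- observed_gap / max_gap
print(star_centralization)
```

A star is the most centralized arrangement under this measure, so values well below it, such as the 0.23 reported above, suggest closeness is spread fairly evenly across nodes.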

Betweenness

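Betweenness measures how often a node sits on the geodesics between other pairs of nodes. When a pair is joined by several geodesics, each one counts fractionally, which is why the scores below are not whole numbers.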

#calculate betweenness centrality: igraph
igraph::betweenness(gd_network_ig, directed=FALSE)
  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
   0.000000e+00    6.708464e-01    1.216013e-01    0.000000e+00 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
   0.000000e+00    0.000000e+00    1.658436e+01    0.000000e+00 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
   9.345794e-03    0.000000e+00    0.000000e+00    3.738318e-02 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
   0.000000e+00    2.410682e+01    3.132042e+00    0.000000e+00 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
   9.039664e+01    0.000000e+00    1.306941e+00    0.000000e+00 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
   0.000000e+00    4.402857e+01    0.000000e+00    9.459707e-01 
       Bob Weir   Vince Welnick 
   1.216595e+02    0.000000e+00 
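The fractional scores are easier to see on a small example. Below is a brute-force base R sketch of betweenness on a made-up five-node graph (a square with one pendant node). The function names are hypothetical, and this is an illustration of the definition rather than the algorithm igraph uses.

```r
# Hypothetical helper: geodesic lengths and geodesic counts between all pairs,
# found by a breadth-first search from each source node
shortest_path_counts <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n)
  sigma <- matrix(0, n, n)
  for (s in seq_len(n)) {
    dist[s, s] <- 0
    sigma[s, s] <- 1
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      next_frontier <- integer(0)
      for (u in frontier) {
        for (v in which(adj[u, ] > 0)) {
          if (is.infinite(dist[s, v])) {
            dist[s, v] <- d + 1
            next_frontier <- c(next_frontier, v)
          }
          # Every geodesic reaching u extends to v if v is one step further
          if (dist[s, v] == d + 1) sigma[s, v] <- sigma[s, v] + sigma[s, u]
        }
      }
      frontier <- next_frontier
      d <- d + 1
    }
  }
  list(dist = dist, sigma = sigma)
}

# Betweenness of v: for each pair (s, t), the share of geodesics through v
betweenness_brute <- function(adj) {
  sp <- shortest_path_counts(adj)
  n <- nrow(adj)
  score <- numeric(n)
  for (v in seq_len(n)) {
    for (s in seq_len(n)) {
      for (t in seq_len(n)) {
        if (s < t && s != v && t != v && is.finite(sp$dist[s, t]) &&
            sp$dist[s, v] + sp$dist[v, t] == sp$dist[s, t]) {
          score[v] <- score[v] +
            sp$sigma[s, v] * sp$sigma[v, t] / sp$sigma[s, t]
        }
      }
    }
  }
  setNames(score, rownames(adj))
}

# Toy network: square a - b - c - d - a with pendant e attached to a
nodes <- c("a", "b", "c", "d", "e")
toy_adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
toy_edges <- rbind(c("a", "b"), c("b", "c"), c("c", "d"),
                   c("d", "a"), c("a", "e"))
for (r in seq_len(nrow(toy_edges))) {
  toy_adj[toy_edges[r, 1], toy_edges[r, 2]] <- 1
  toy_adj[toy_edges[r, 2], toy_edges[r, 1]] <- 1
}

toy_betweenness <- betweenness_brute(toy_adj)
print(toy_betweenness)
```

Node `a` scores highest because every path to the pendant runs through it, while `b` and `d` pick up half-credits wherever two equally short routes exist around the square.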

Top Betweenness

Now I want to add the closeness and betweenness scores to my centrality data frame and, first, sort by betweenness to look at the nodes with the highest scores:

centrality_gd <-data.frame(id=1:vcount(gd_network_ig),
                        name=V(gd_network_ig)$name,
                        degree_all=igraph::degree(gd_network_ig),
                        degree_norm=igraph::degree(gd_network_ig,normalized=T),
                        BC_power=power_centrality(gd_network_ig),
                        EV_cent=centr_eigen(gd_network_ig,directed = F)$vector,
                        reflect_EV=gd_reflective*centr_eigen(gd_network_ig,directed = F)$vector,
                        derive_EV=gd_derived*centr_eigen(gd_network_ig,directed = F)$vector,
                        close=closeness(gd_network_ig),
                        between=betweenness(gd_network_ig, directed=FALSE))
                        

row.names(centrality_gd)<-NULL
centrality_gd%>%
  arrange(desc(between))%>%
  slice(1:5)
  id          name degree_all degree_norm   BC_power    EV_cent
1 25      Bob Weir        213        8.52 -0.5430836 0.18725953
2 17     Phil Lesh        149        5.96 -0.1806656 0.15133380
3 22        Pigpen         95        3.80 -0.5257366 0.07985305
4 14 Robert Hunter        313       12.52 -0.1735142 1.00000000
5  7  Jerry Garcia        328       13.12 -0.2551417 0.96094165
   reflect_EV  derive_EV      close   between
1 0.040709421 0.14655011 0.01754386 121.65948
2 0.022140576 0.12919322 0.01666667  90.39664
3 0.009031643 0.07082141 0.01515152  44.02857
4 0.371327549 0.62867245 0.01587302  24.10682
5 0.332625452 0.62831620 0.01562500  16.58436

The most immediate observation is that the highest-degree node (Jerry Garcia) is not the node with the highest betweenness. That distinction goes to Bob Weir, whose degree is still relatively high but well below Garcia’s, while his betweenness score is far higher (~121 compared to Garcia’s ~16).

My guess is that the two highest-degree nodes, Jerry Garcia and Robert Hunter, have relatively low betweenness scores because the two wrote mostly together. Although the pair wrote the most songs in the originals catalog, Bob Weir wrote many songs with a variety of other songwriters, giving him a higher level of betweenness.

Similarly, Phil Lesh and Pigpen, original band members who wrote relatively few songs, contributed to more songs that were written by the entire band, which gave them more exposure to connections through the songs that they did write.

Top Closeness

Now a look at the top closeness scores:

centrality_gd%>%
  arrange(desc(close))%>%
  slice(1:5)
  id            name degree_all degree_norm   BC_power    EV_cent
1 25        Bob Weir        213        8.52 -0.5430836 0.18725953
2 17       Phil Lesh        149        5.96 -0.1806656 0.15133380
3 14   Robert Hunter        313       12.52 -0.1735142 1.00000000
4  7    Jerry Garcia        328       13.12 -0.2551417 0.96094165
5 15 Bill Kreutzmann        100        4.00 -0.7011548 0.09223647
   reflect_EV  derive_EV      close    between
1 0.040709421 0.14655011 0.01754386 121.659478
2 0.022140576 0.12919322 0.01666667  90.396640
3 0.371327549 0.62867245 0.01587302  24.106816
4 0.332625452 0.62831620 0.01562500  16.584364
5 0.009710558 0.08252591 0.01538462   3.132042

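This ranking is harder to interpret because the closeness scores fall within a narrow range: apart from one outlier near 0.0015 (Bruce Hornsby), every node scores between roughly 0.011 and 0.018, so the gaps between ranks are small.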

Network Constraint (Burt)

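Constraint is a measure of the redundancy of a node’s connections. Scores usually fall between 0 and 1, with 0 indicating no redundancy and 1 indicating complete redundancy, although the measure is not strictly capped at 1, as John Dawson’s score of about 1.29 below shows.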

constraint(gd_network_ig)
  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
      1.0000000       0.6706222       0.4989170       1.0000000 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
      1.2945238       0.7040590       0.5061908       0.4514219 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
      0.5143887       1.0000000       0.8224000       0.5294014 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
      0.0000000       0.6332636       0.5159787       1.0000000 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
      0.4521996       1.0000000       0.9325133       0.5591083 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
      0.7134697       0.5404552       0.8224000       0.4756234 
       Bob Weir   Vince Welnick 
      0.3367355       0.5216319 
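To show how constraint responds to redundancy, here is a minimal base R sketch of Burt's formula on two made-up graphs. The function name `burt_constraint` is hypothetical, the sum runs over a node's direct contacts only, and ties are treated as unweighted; these are my assumptions, so the results need not match igraph's `constraint()` in every case.

```r
# Hypothetical helper: Burt's constraint for a symmetric adjacency matrix
burt_constraint <- function(adj) {
  diag(adj) <- 0              # ignore self-ties
  p <- adj / rowSums(adj)     # p[i, j]: share of i's ties invested in j
  n <- nrow(adj)
  sapply(seq_len(n), function(i) {
    contacts <- which(adj[i, ] > 0)
    total <- 0
    for (j in contacts) {
      # Indirect investment in j through every third party q
      others <- setdiff(seq_len(n), c(i, j))
      indirect <- sum(p[i, others] * p[others, j])
      total <- total + (p[i, j] + indirect)^2
    }
    total
  })
}

# Toy star: hub (node 1) tied to three leaves that share no ties
star_adj <- matrix(0, 4, 4)
star_adj[1, 2:4] <- 1
star_adj[2:4, 1] <- 1

# Toy triangle: every node tied to every other node
triangle_adj <- matrix(1, 3, 3)
diag(triangle_adj) <- 0

star_constraint <- burt_constraint(star_adj)
triangle_constraint <- burt_constraint(triangle_adj)
print(round(star_constraint, 3))
print(round(triangle_constraint, 3))
```

The hub of the star has the lowest constraint because its contacts are not connected to each other, while each triangle member's contacts are fully interconnected, which pushes the score above 1.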

Finally, I’m going to save all of this data into a .csv file for future analysis.

centrality_gd <-data.frame(id=1:vcount(gd_network_ig),
                        name=V(gd_network_ig)$name,
                        degree_all=igraph::degree(gd_network_ig),
                        degree_norm=igraph::degree(gd_network_ig,normalized=T),
                        BC_power=power_centrality(gd_network_ig),
                        EV_cent=centr_eigen(gd_network_ig,directed = F)$vector,
                        reflect_EV=gd_reflective*centr_eigen(gd_network_ig,directed = F)$vector,
                        derive_EV=gd_derived*centr_eigen(gd_network_ig,directed = F)$vector,
                        close=closeness(gd_network_ig),
                        between=betweenness(gd_network_ig, directed=FALSE),
                        burt=constraint(gd_network_ig))

write.csv(centrality_gd, file = "centrality_df.csv")

Citations:

Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm

ASCAP. 18 March 2022.

Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/

Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/

This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Becvar (2022, April 15). Data Analytics and Computational Social Science: DACSS 697E Assignment 5. Retrieved from https://www.kristinabecvar.com/posts/2022-04-04-dacss-697e-assignment-5/

BibTeX citation

@misc{brokerage-betweenness,
  author = {Becvar, Kristina},
  title = {Data Analytics and Computational Social Science: DACSS 697E Assignment 5},
  url = {https://www.kristinabecvar.com/posts/2022-04-04-dacss-697e-assignment-5/},
  year = {2022}
}