Assignment 5 for DACSS 697E course ‘Social and Political Network Analysis’: “Brokerage and Betweenness”
I am continuing to use the Grateful Dead song writing data set that I used in previous assignments to examine co-writing links and centrality.
The data set, which I compiled, consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career. One aspect of the Grateful Dead song data is that the connections between co-writers are weighted, with the weights representing the number of times each song was played live.
There are 26 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 26 nodes in the dataset.
There are a total of 183 (updated and still under review!) unique songs played, and the various co-writing combinations are now represented in a binary affiliation matrix.
Loading the dataset and creating the network to begin this assignment:
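#igraph for the network objects; dplyr for %>%, arrange() and slice()
library(igraph)
library(dplyr)
gd_vertices <- read.csv("gd_nodes.csv", header=TRUE, stringsAsFactors=FALSE)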
gd_affiliation <- read.csv("gd_affiliation_matrix.csv", row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)
gd_projection <- gd_matrix%*%t(gd_matrix)
#Create Igraph Object
gd_network_ig <- graph.adjacency(gd_projection,mode="undirected") #igraph object
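This is an undirected, unweighted igraph object. Because the projection counts are passed in without a weighted argument, igraph stores repeated collaborations as parallel edges rather than as edge weights. The network has two components: one large component and one isolate.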
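To make the projection step concrete, here is a minimal sketch on a made-up affiliation matrix. The writer and song names are hypothetical and are not taken from the data set. Multiplying the matrix by its transpose counts the songs each pair of writers shares, and the diagonal holds each writer's own song count.

```r
# Hypothetical affiliation matrix: 3 writers (rows) by 4 songs (columns).
toy_affiliation <- matrix(
  c(1, 1, 0, 1,
    1, 0, 1, 0,
    0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("writer_a", "writer_b", "writer_c"),
                  c("song_1", "song_2", "song_3", "song_4"))
)

# One-mode projection: cell [i, j] is the number of songs i and j share.
toy_projection <- toy_affiliation %*% t(toy_affiliation)

# Off-diagonal cells are co-writing counts; the diagonal is songs per writer.
print(toy_projection)
```

Since the diagonal is non-zero, handing a projection like this straight to an adjacency constructor may add self-loops. Whether that is wanted depends on the analysis.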
#Inspect New Object
igraph::vertex_attr_names(gd_network_ig)
[1] "name"
igraph::edge_attr_names(gd_network_ig)
character(0)
head(V(gd_network_ig)$name)
[1] "Eric Andersen" "John Barlow" "Bob Bralove"
[4] "Andrew Charles" "John Dawson" "Willie Dixon"
is_directed(gd_network_ig)
[1] FALSE
is_weighted(gd_network_ig)
[1] FALSE
is_bipartite(gd_network_ig)
[1] FALSE
igraph::dyad.census(gd_network_ig)
$mut
[1] 738
$asym
[1] 0
$null
[1] -413
igraph::triad.census(gd_network_ig)
[1] 1788 0 488 0 0 0 0 0 0 0 237 0 0
[14] 0 0 87
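The negative $null count in the dyad census looks odd at first. A quick arithmetic check, assuming the census gets the null count by subtracting the edge count from the number of possible pairs, suggests that parallel edges are each being counted as their own mutual dyad:

```r
n_nodes <- 26                         # songwriters in the network
possible_dyads <- choose(n_nodes, 2)  # distinct unordered pairs of nodes
mutual_reported <- 738                # $mut from the census output above

# If every edge counts as its own mutual dyad, the remainder goes negative.
print(possible_dyads - mutual_reported)

# The four non-zero triad census counts do add up to the number of triples.
print(sum(c(1788, 488, 237, 87)) == choose(n_nodes, 3))
```

If that reading is right, the dyad census describes edge multiplicity more than it describes pairs of writers, so it is worth interpreting with care.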
To examine the centrality and power scores of the nodes, I’m creating a data frame with the degree centrality, normalized degree, Bonacich power, and eigenvector centrality scores, along with the breakdown of reflected and derived centrality.
To calculate the reflected and derived centrality scores, I first run some operations on the adjacency matrix, keeping in mind that these two scores together add up to the full eigenvector centrality score.
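Before applying this to the full network, a minimal sketch on a hypothetical three-node path shows what the two shares look like. The toy matrix and its node names are assumptions for illustration only. Squaring the adjacency matrix counts two-step walks, and the split is simply the share of those walks that return to the starting node versus the share that end elsewhere.

```r
# Hypothetical 3-node path a - b - c as an adjacency matrix.
toy_adj <- matrix(
  c(0, 1, 0,
    1, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

# Squaring the matrix counts walks of length two between each pair.
toy_adj_2 <- toy_adj %*% toy_adj

# Share of two-step walks that return to the starting node (reflected)
toy_reflected <- diag(toy_adj_2) / rowSums(toy_adj_2)
# and the share that end at a different node (derived).
toy_derived <- 1 - toy_reflected

print(rbind(reflected = toy_reflected, derived = toy_derived))
```

In this toy case the middle node's two-step walks all come straight back to it, so its share is entirely reflected, while the end nodes split evenly.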
gd_adjacency <- as.matrix(as_adjacency_matrix(gd_network_ig))
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency
#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)
#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)
centrality_gd <- data.frame(
  id=1:vcount(gd_network_ig),
  name=V(gd_network_ig)$name,
  degree_all=igraph::degree(gd_network_ig),
  degree_norm=igraph::degree(gd_network_ig, normalized=TRUE),
  BC_power=power_centrality(gd_network_ig),
  EV_cent=centr_eigen(gd_network_ig, directed=FALSE)$vector,
  reflect_EV=gd_reflective*centr_eigen(gd_network_ig, directed=FALSE)$vector,
  derive_EV=gd_derived*centr_eigen(gd_network_ig, directed=FALSE)$vector)
row.names(centrality_gd) <- NULL
centrality_gd %>%
  arrange(desc(degree_all)) %>%
  slice(1:5)
id name degree_all degree_norm BC_power EV_cent
1 7 Jerry Garcia 328 13.12 -0.2551417 0.96094165
2 14 Robert Hunter 313 12.52 -0.1735142 1.00000000
3 25 Bob Weir 213 8.52 -0.5430836 0.18725953
4 17 Phil Lesh 149 5.96 -0.1806656 0.15133380
5 15 Bill Kreutzmann 100 4.00 -0.7011548 0.09223647
reflect_EV derive_EV
1 0.332625452 0.62831620
2 0.371327549 0.62867245
3 0.040709421 0.14655011
4 0.022140576 0.12919322
5 0.009710558 0.08252591
Right away, I see that the highest-degree nodes are clearly Jerry Garcia and Robert Hunter, which makes sense given that they were a prolific songwriting pair who created much of the Grateful Dead original songbook. Bob Weir also contributed quite a bit, though the songs he wrote with his writing partner John Barlow were far fewer than those he wrote as part of the whole band, judging by Barlow’s absence from the top counts.
The original lineup of Jerry Garcia, Bob Weir, Phil Lesh, Bill Kreutzmann, and Pigpen, together with Robert Hunter’s presence during the band’s formative and most collaborative years, means that this degree ranking makes sense intuitively.
I am also interested in the eigenvector centrality scores, both the highest and the lowest.
centrality_gd%>%
arrange(desc(EV_cent))%>%
slice(1:5)
id name degree_all degree_norm BC_power EV_cent
1 14 Robert Hunter 313 12.52 -0.1735142 1.00000000
2 7 Jerry Garcia 328 13.12 -0.2551417 0.96094165
3 25 Bob Weir 213 8.52 -0.5430836 0.18725953
4 17 Phil Lesh 149 5.96 -0.1806656 0.15133380
5 15 Bill Kreutzmann 100 4.00 -0.7011548 0.09223647
reflect_EV derive_EV
1 0.371327549 0.62867245
2 0.332625452 0.62831620
3 0.040709421 0.14655011
4 0.022140576 0.12919322
5 0.009710558 0.08252591
Robert Hunter having the top eigenvector centrality score is not a shock. He has long held the unofficial title of band member and is known as the person behind the songwriting magic of the Grateful Dead. His primary songwriting partner was Jerry Garcia, but he also wrote songs with the early, full band and later with almost all of the individual members of the band.
It is a little surprising, though, that the Eigenvector scores fall off so quickly after Robert Hunter and Jerry Garcia.
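One possible explanation, offered here only as a hedged sketch, is that a single very heavy tie can pull almost all of the eigenvector score toward the two nodes it joins. Degrees above 300 in a 26-node network suggest that repeated collaborations are stored as many parallel edges, which behaves much like a large edge weight. The toy matrix below is hypothetical and is not derived from the song data.

```r
# Hypothetical weighted adjacency matrix: a and b share a very heavy tie,
# while c and d are attached by single ties.
toy_weights <- matrix(
  c( 0, 50, 1, 0,
    50,  0, 1, 0,
     1,  1, 0, 1,
     0,  0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(letters[1:4], letters[1:4])
)

# Eigenvector centrality is the leading eigenvector of the adjacency matrix,
# rescaled here so the largest score is 1.
leading <- abs(eigen(toy_weights, symmetric = TRUE)$vectors[, 1])
toy_ev <- setNames(leading / max(leading), rownames(toy_weights))

print(round(toy_ev, 3))
```

In this toy case the heavily tied pair sits at the top and everyone else drops to a small fraction of their score, which resembles the gap between Hunter and Garcia and the rest of the band.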
The closeness centrality of a node is defined as the inverse of the sum of the geodesic distances between that node and all other nodes in a network, so nodes that are near everyone else score higher. The calculation runs; however, I get a warning that closeness centrality is not well-defined for disconnected graphs.
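As a small check on the definition, the sketch below compares igraph's closeness() with a by-hand calculation on a hypothetical three-node path. The score is the reciprocal of each node's summed geodesic distances, so a larger value means the node is nearer to everyone else.

```r
library(igraph)

# Hypothetical 3-node path: a - b - c.
toy_path <- graph_from_literal(a - b - c)

# Closeness as reported by igraph.
toy_closeness <- closeness(toy_path)

# The same values by hand: reciprocal of each node's summed geodesic distances.
toy_by_hand <- 1 / rowSums(distances(toy_path))

print(rbind(igraph = toy_closeness, by_hand = toy_by_hand))
```

This toy graph is connected. How unreachable nodes enter the sum depends on the igraph version, which is worth checking whenever an isolate is present.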
#calculate closeness centrality: igraph
igraph::closeness(gd_network_ig)
Eric Andersen John Barlow Bob Bralove Andrew Charles
0.012500000 0.012987013 0.013333333 0.012048193
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
0.012048193 0.012658228 0.015625000 0.014285714
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
0.014492754 0.012500000 0.011363636 0.014492754
Bruce Hornsby Robert Hunter Bill Kreutzmann Ned Lagin
0.001538462 0.015873016 0.015384615 0.012048193
Phil Lesh Peter Monk Brent Mydland Dave Parker
0.016666667 0.012048193 0.013698630 0.014492754
Robert Petersen Pigpen Joe Royster Rob Wasserman
0.012345679 0.015151515 0.011363636 0.013333333
Bob Weir Vince Welnick
0.017543860 0.013157895
In addition to node-level centrality scores, I also want to calculate the network level centralization index for closeness centrality measures. Again, I get a warning that closeness centrality is not well-defined for disconnected graphs.
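To give the centralization index some reference points, here is a minimal sketch on two hypothetical graphs. A star, where one hub reaches everyone directly, sits at the top of the scale, and a ring, where every node is equally placed, sits at the bottom. The graphs are assumptions for illustration and use centr_clo() with its default normalization.

```r
library(igraph)

# A star concentrates closeness in one hub; a ring spreads it evenly.
toy_star <- make_star(6, mode = "undirected")
toy_ring <- make_ring(6)

star_index <- centr_clo(toy_star)$centralization
ring_index <- centr_clo(toy_ring)$centralization

print(c(star = star_index, ring = ring_index))
```

Against those two extremes, an index reads as how far a network leans toward the star-like end.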
#calculate closeness centralization index: igraph
centr_clo(gd_network_ig)$centralization
[1] 0.2310331
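Betweenness represents the number of geodesics between other pairs of nodes on which a node sits. When a pair is joined by several equally short paths, each path counts fractionally, which is why the scores below are not whole numbers.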
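A minimal sketch on two hypothetical graphs shows how the counting works. In a star every leaf-to-leaf geodesic runs through the hub, and in a four-node ring opposite corners are joined by two tied geodesics, so credit is shared.

```r
library(igraph)

# Hypothetical 5-node star: every leaf-to-leaf geodesic runs through the hub.
toy_star <- make_star(5, mode = "undirected")
star_betweenness <- betweenness(toy_star, directed = FALSE)

# Hypothetical 4-node ring: opposite corners have two tied geodesics,
# so each intermediate node is credited with half a path.
toy_ring <- make_ring(4)
ring_betweenness <- betweenness(toy_ring, directed = FALSE)

print(star_betweenness)
print(ring_betweenness)
```

The hub's score equals the number of leaf pairs, while each ring node collects half a path from the one pair it sits between.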
#calculate betweenness centrality: igraph
igraph::betweenness(gd_network_ig, directed=FALSE)
Eric Andersen John Barlow Bob Bralove Andrew Charles
0.000000e+00 6.708464e-01 1.216013e-01 0.000000e+00
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
0.000000e+00 0.000000e+00 1.658436e+01 0.000000e+00
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
9.345794e-03 0.000000e+00 0.000000e+00 3.738318e-02
Bruce Hornsby Robert Hunter Bill Kreutzmann Ned Lagin
0.000000e+00 2.410682e+01 3.132042e+00 0.000000e+00
Phil Lesh Peter Monk Brent Mydland Dave Parker
9.039664e+01 0.000000e+00 1.306941e+00 0.000000e+00
Robert Petersen Pigpen Joe Royster Rob Wasserman
0.000000e+00 4.402857e+01 0.000000e+00 9.459707e-01
Bob Weir Vince Welnick
1.216595e+02 0.000000e+00
Now I want to add the closeness and betweenness scores to my centrality data frame and, first, sort by betweenness to take a look at the nodes with the highest scores:
centrality_gd <- data.frame(
  id=1:vcount(gd_network_ig),
  name=V(gd_network_ig)$name,
  degree_all=igraph::degree(gd_network_ig),
  degree_norm=igraph::degree(gd_network_ig, normalized=TRUE),
  BC_power=power_centrality(gd_network_ig),
  EV_cent=centr_eigen(gd_network_ig, directed=FALSE)$vector,
  reflect_EV=gd_reflective*centr_eigen(gd_network_ig, directed=FALSE)$vector,
  derive_EV=gd_derived*centr_eigen(gd_network_ig, directed=FALSE)$vector,
  close=closeness(gd_network_ig),
  between=betweenness(gd_network_ig, directed=FALSE))
row.names(centrality_gd) <- NULL
centrality_gd %>%
  arrange(desc(between)) %>%
  slice(1:5)
id name degree_all degree_norm BC_power EV_cent
1 25 Bob Weir 213 8.52 -0.5430836 0.18725953
2 17 Phil Lesh 149 5.96 -0.1806656 0.15133380
3 22 Pigpen 95 3.80 -0.5257366 0.07985305
4 14 Robert Hunter 313 12.52 -0.1735142 1.00000000
5 7 Jerry Garcia 328 13.12 -0.2551417 0.96094165
reflect_EV derive_EV close between
1 0.040709421 0.14655011 0.01754386 121.65948
2 0.022140576 0.12919322 0.01666667 90.39664
3 0.009031643 0.07082141 0.01515152 44.02857
4 0.371327549 0.62867245 0.01587302 24.10682
5 0.332625452 0.62831620 0.01562500 16.58436
The most immediate observation I have is that the highest-degree node (Jerry Garcia) is not the node with the highest betweenness. That goes to Bob Weir, who is still a relatively high-degree node. His degree is well below Garcia’s (213 compared to 328), yet his betweenness score is far higher (~121 compared to Garcia’s ~16).
My guess is that the relatively low betweenness scores of the two highest-degree nodes, Jerry Garcia and Robert Hunter, reflect the fact that the two wrote mostly together. Although the pair wrote the most songs in the originals catalog, Bob Weir wrote many songs with a variety of other songwriters, giving him a higher level of betweenness.
Similarly, Phil Lesh and Pigpen, original band members who wrote relatively fewer songs, contributed to more of the songs written by the entire band, which gave them connections to more co-writers on the songs that they did write.
Now a look at the top closeness scores:
centrality_gd%>%
arrange(desc(close))%>%
slice(1:5)
id name degree_all degree_norm BC_power EV_cent
1 25 Bob Weir 213 8.52 -0.5430836 0.18725953
2 17 Phil Lesh 149 5.96 -0.1806656 0.15133380
3 14 Robert Hunter 313 12.52 -0.1735142 1.00000000
4 7 Jerry Garcia 328 13.12 -0.2551417 0.96094165
5 15 Bill Kreutzmann 100 4.00 -0.7011548 0.09223647
reflect_EV derive_EV close between
1 0.040709421 0.14655011 0.01754386 121.659478
2 0.022140576 0.12919322 0.01666667 90.396640
3 0.371327549 0.62867245 0.01587302 24.106816
4 0.332625452 0.62831620 0.01562500 16.584364
5 0.009710558 0.08252591 0.01538462 3.132042
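This evaluation is more difficult because the closeness scores fall in a narrow range (roughly 0.011 to 0.018 for every node except the isolate), so the differences between ranks are much less clear-cut.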
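Constraint is a measure of the redundancy of a node’s connections. It usually falls between 0 and 1, with values near 0 indicating little redundancy and values near 1 indicating high redundancy, although scores above 1 are possible, as John Dawson’s 1.29 below shows.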
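To give those values some reference points, here is a minimal sketch using igraph's constraint() on two hypothetical graphs. A star hub's contacts are not tied to each other, so the hub is weakly constrained, while in a triangle every contact is also tied to the other contact, which pushes the score past 1.

```r
library(igraph)

# Hypothetical 5-node star: the hub's contacts are not tied to each other.
toy_star <- make_star(5, mode = "undirected")
star_constraint <- constraint(toy_star)

# Hypothetical triangle: every contact is also tied to the other contact.
toy_triangle <- make_full_graph(3)
triangle_constraint <- constraint(toy_triangle)

print(star_constraint)      # first entry is the hub, the rest are leaves
print(triangle_constraint)  # fully redundant ties
```

The star's leaves depend entirely on one contact and score 1, so a value of 1 can mean a single tie as well as redundant ties.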
constraint(gd_network_ig)
Eric Andersen John Barlow Bob Bralove Andrew Charles
1.0000000 0.6706222 0.4989170 1.0000000
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
1.2945238 0.7040590 0.5061908 0.4514219
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
0.5143887 1.0000000 0.8224000 0.5294014
Bruce Hornsby Robert Hunter Bill Kreutzmann Ned Lagin
0.0000000 0.6332636 0.5159787 1.0000000
Phil Lesh Peter Monk Brent Mydland Dave Parker
0.4521996 1.0000000 0.9325133 0.5591083
Robert Petersen Pigpen Joe Royster Rob Wasserman
0.7134697 0.5404552 0.8224000 0.4756234
Bob Weir Vince Welnick
0.3367355 0.5216319
Finally, I’m going to save all of this data into a .csv file for future analysis.
centrality_gd <- data.frame(
  id=1:vcount(gd_network_ig),
  name=V(gd_network_ig)$name,
  degree_all=igraph::degree(gd_network_ig),
  degree_norm=igraph::degree(gd_network_ig, normalized=TRUE),
  BC_power=power_centrality(gd_network_ig),
  EV_cent=centr_eigen(gd_network_ig, directed=FALSE)$vector,
  reflect_EV=gd_reflective*centr_eigen(gd_network_ig, directed=FALSE)$vector,
  derive_EV=gd_derived*centr_eigen(gd_network_ig, directed=FALSE)$vector,
  close=closeness(gd_network_ig),
  between=betweenness(gd_network_ig, directed=FALSE),
  burt=constraint(gd_network_ig))
write.csv(centrality_gd, file = "centrality_df.csv")
Citations:
Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm
ASCAP. 18 March 2022.
Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/
Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/
This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Becvar (2022, April 15). Data Analytics and Computational Social Science: DACSS 697E Assignment 5. Retrieved from https://www.kristinabecvar.com/posts/2022-04-04-dacss-697e-assignment-5/
BibTeX citation
@misc{brokerage-betweenness, author = {Becvar, Kristina}, title = {Data Analytics and Computational Social Science: DACSS 697E Assignment 5}, url = {https://www.kristinabecvar.com/posts/2022-04-04-dacss-697e-assignment-5/}, year = {2022} }