Assignment 4 for DACSS 697E course ‘Social and Political Network Analysis’: “Status & Eigenvector Centrality”
I am continuing to use the Grateful Dead song writing data set that I am using in this series of posts to examine co-writing links and centrality.
The data set consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career that I compiled.
There are 26 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 26 nodes in the dataset.
There are a total of 183 (updated and still under review!) unique songs played, and the various co-writing combinations are now represented in a binary affiliation matrix.
I have not weighted this version of the data; I am trying to build it from a binary affiliation matrix first, and hope to later add the number of times a given song was played live as a weight.
Loading the dataset and creating the network to begin this assignment:
Inspecting the dimensions of the affiliation matrix, then its first 10 rows and first 4 columns:
dim(gd_matrix)
[1] 26 183
gd_matrix[1:10, 1:4]
Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen 0 0 0 0
John Barlow 0 0 0 0
Bob Bralove 0 0 0 0
Andrew Charles 0 0 0 0
John Dawson 0 0 0 0
Willie Dixon 0 0 0 0
Jerry Garcia 1 1 0 1
Donna Godchaux 0 0 0 0
Keith Godchaux 0 0 0 0
Gerrit Graham 0 0 0 0
Now I can create the single mode network and examine the bipartite projection. After converting the matrix to a square adjacency matrix, I can look at the full matrix.
I can also call the adjacency matrix count for co-writing incidences between certain songwriters, such as between writing partners Jerry Garcia and Robert Hunter and between John Barlow and Bob Weir.
dim(gd_projection)
[1] 26 26
gd_projection[1:10, 1:4]
Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen 1 0 0 0
John Barlow 0 26 1 0
Bob Bralove 0 1 3 0
Andrew Charles 0 0 0 1
John Dawson 0 0 0 0
Willie Dixon 0 0 0 0
Jerry Garcia 0 0 0 0
Donna Godchaux 0 0 0 0
Keith Godchaux 0 0 0 0
Gerrit Graham 0 0 0 0
gd_projection["Jerry Garcia", "Robert Hunter"]
[1] 78
gd_projection["John Barlow", "Bob Weir"]
[1] 21
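The projection step described above can be sketched on a toy matrix. This is a minimal illustration, not the code used for the post: the names `toy_affil` and `toy_projection`, the writers A, B, C and the songs s1 to s4 are all made up, and multiplying the affiliation matrix by its transpose is only one common way to get a one-mode projection.

```r
# Toy affiliation matrix: rows are writers, columns are songs (all names made up)
toy_affil <- matrix(
  c(1, 1, 0, 1,
    1, 0, 1, 0,
    0, 0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("s1", "s2", "s3", "s4"))
)

# One-mode projection onto writers: writer-by-song times song-by-writer
toy_projection <- toy_affil %*% t(toy_affil)

toy_projection["A", "B"]  # number of songs A and B co-wrote
diag(toy_projection)      # number of songs each writer has a credit on
```

Read this way, an off-diagonal cell such as the 78 for Garcia and Hunter is a count of shared songs, and a diagonal cell such as the 26 for John Barlow is that writer's own song count.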
Now I will use this adjacency matrix to create both igraph and statnet network objects and take a look at their resulting features. This is a non-directed, unweighted dataset.
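#Create igraph and statnet objects
#Note: with no weighted= argument, igraph reads each count in gd_projection as that many parallel edges
gd_network_ig <- graph.adjacency(gd_projection,mode="undirected") #igraph object
#Note: network() keeps one tie per pair with a non-zero count (65 edges in the output below)
gd_network_stat <- network(gd_projection, directed=F, matrix.type="adjacency") #statnet object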
#Inspect New Objects
print(gd_network_stat)
Network attributes:
vertices = 26
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 65
missing edges= 0
non-missing edges= 65
Vertex attribute names:
vertex.names
No edge attributes
igraph::vertex_attr_names(gd_network_ig)
[1] "name"
igraph::edge_attr_names(gd_network_ig)
character(0)
head(V(gd_network_ig)$name)
[1] "Eric Andersen" "John Barlow" "Bob Bralove"
[4] "Andrew Charles" "John Dawson" "Willie Dixon"
is_directed(gd_network_ig)
[1] FALSE
is_weighted(gd_network_ig)
[1] FALSE
is_bipartite(gd_network_ig)
[1] FALSE
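Because the projection holds counts, not just 0s and 1s, it may help to see the two ways igraph can read such a matrix. The sketch below uses a made-up 3-writer matrix (`toy_counts`, with the diagonal set to zero for simplicity) and `graph_from_adjacency_matrix()`, the current name for `graph.adjacency()`. The `weighted = TRUE, diag = FALSE` variant is an alternative shown for comparison, not what the post uses.

```r
library(igraph)

# Hypothetical 3-writer projection with co-writing counts off the diagonal
toy_counts <- matrix(
  c(0, 3, 0,
    3, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

# Default reading: each count becomes that many parallel edges
g_multi <- graph_from_adjacency_matrix(toy_counts, mode = "undirected")

# Alternative reading: one edge per pair, with the count kept as a weight
g_weighted <- graph_from_adjacency_matrix(toy_counts, mode = "undirected",
                                          weighted = TRUE, diag = FALSE)

ecount(g_multi)       # 4 edges: three A-B copies plus one B-C
ecount(g_weighted)    # 2 edges
E(g_weighted)$weight  # the co-writing counts
```

The first object reports `is_weighted()` as FALSE, like the network above, but it is not a simple graph, which matters for the degree and density figures later on.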
Looking at the dyad/triad census info in igraph and statnet:
igraph::dyad.census(gd_network_ig)
$mut
[1] 738
$asym
[1] 0
$null
[1] -413
igraph::triad.census(gd_network_ig)
[1] 1788 0 488 0 0 0 0 0 0 0 237 0 0
[14] 0 0 87
sna::dyad.census(gd_network_stat)
Mut Asym Null
[1,] 65 0 260
sna::triad.census(gd_network_stat)
003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 1451 0 825 0 0 0 0 0 0 0 237 0 0
120C 210 300
[1,] 0 0 87
Knowing this network has 26 vertices, I can check that the triad census is working correctly: the counts across all triad types should sum to the number of possible triads, 26*25*24/6 = 2600, and they do.
#possible triads in network
26*25*24/6
[1] 2600
sum(igraph::triad.census(gd_network_ig))
[1] 2600
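The same counting logic can be applied to the dyad census. The arithmetic below uses only numbers printed above. The reading of 738 as a count of all igraph edges, parallel edges and loops included, is my inference from those outputs and not something the post states.

```r
n_nodes <- 26

# Number of distinct pairs and triples of songwriters
possible_dyads  <- choose(n_nodes, 2)   # 325
possible_triads <- choose(n_nodes, 3)   # 2600

# Null dyads = possible pairs minus tied pairs
possible_dyads - 65    # 260, the statnet null count
possible_dyads - 738   # -413, the igraph null count
```

A negative null count is possible only when more ties are counted than there are pairs, which is why the statnet census is the easier one to interpret here.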
Looking next at the global v. average local transitivity of the network in igraph and confirming global transitivity in statnet and igraph (Statnet and igraph network transitivity = 0.5241, igraph local transitivity = 0.7756)
#network transitivity: statnet
gtrans(gd_network_stat)
[1] 0.5240964
#global clustering coefficient: igraph
transitivity(gd_network_ig, type="global")
[1] 0.5240964
#average local clustering coefficient: igraph
transitivity(gd_network_ig, type="average")
[1] 0.7755587
These transitivity results tell me that the average local transitivity (0.78) is considerably higher than the global transitivity (0.52). From my still naive network knowledge, this suggests that the overall network is fairly loose, but that it contains a more tightly connected sub-network.
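To see how the two transitivity measures can diverge, here is a small sketch on a made-up "bowtie" graph (two triangles sharing one vertex). The graph and the names `g_toy`, `global_cc` and `avg_local_cc` are illustrative assumptions and have nothing to do with the songwriter data.

```r
library(igraph)

# Made-up bowtie graph: triangles 1-2-3 and 3-4-5 share vertex 3
g_toy <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4, 4, 5, 3, 5), directed = FALSE)

# Global: closed triplets over all connected triplets in the whole graph
global_cc <- transitivity(g_toy, type = "global")

# Average local: mean of each vertex's own clustering coefficient
avg_local_cc <- transitivity(g_toy, type = "average")

global_cc     # 0.6
avg_local_cc  # 13/15, about 0.867
```

The hub (vertex 3) has many neighbour pairs that are not tied to each other, which pulls the global figure down, while the four low-degree vertices each sit in a closed triangle and pull the average local figure up.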
The average geodesic distance is just over 2, so on average any two connected songwriters are about two “stops” apart along the shortest path between them.
average.path.length(gd_network_ig,directed=F)
[1] 2.01
Getting a look at the components of the network confirms that there are 2 components: a giant component made up of 25 of the 26 nodes, and 1 isolate.
names(igraph::components(gd_network_ig))
[1] "membership" "csize" "no"
igraph::components(gd_network_ig)$no
[1] 2
igraph::components(gd_network_ig)$csize
[1] 25 1
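The path length and component outputs above can be reproduced in miniature. This sketch uses a made-up four-node graph (a path plus one isolate) and `mean_distance()`, the current igraph name for `average.path.length()`. As far as I know, its default leaves unconnected pairs out of the average.

```r
library(igraph)

# Made-up graph: a path A-B-C plus an isolate D
g_toy <- graph_from_literal(A - B - C, D)

comp <- components(g_toy)
comp$no      # 2 components
comp$csize   # sizes 3 and 1

# Average shortest path over connected pairs only: (1 + 1 + 2) / 3
mean_distance(g_toy, directed = FALSE)
```

The isolate adds a component but does not affect the average path length, which is worth keeping in mind when reading the 2.01 above.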
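The network density measure: Per the course tutorials, I call “graph.density” with “loops=TRUE” added, since igraph’s default output assumes that loops are not included but does not remove them, and adding “loops=TRUE” is what we used to correct this when comparing output to statnet. In this case, the statnet output is still far different. The likely reason shows up in the census output above: the igraph dyad census counted 738 ties where statnet counted 65 edges, which suggests that igraph read the song counts in the projection as parallel edges and the diagonal as self-loops. Dividing 738 by the 351 possible ties among 26 nodes (loops included) gives 2.10, while dividing 65 by the 325 possible ties without loops gives 0.2.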
graph.density(gd_network_ig, loops=TRUE)
[1] 2.102564
network.density(gd_network_stat)
[1] 0.2
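A density above 1 is easy to reproduce on a small multigraph. The sketch below is a made-up example (the names `g_multi` and `g_simple` are mine) using `edge_density()`, the current igraph name for `graph.density()`, and `simplify()`, which drops loops and merges parallel edges.

```r
library(igraph)

# Made-up multigraph on 3 vertices: five parallel 1-2 edges, one 2-3 edge, one loop on 3
g_multi  <- make_graph(c(rep(c(1, 2), 5), 2, 3, 3, 3), directed = FALSE)
g_simple <- simplify(g_multi)   # drops the loop and merges parallel edges

ecount(g_multi)    # 7
ecount(g_simple)   # 2

# Density = edges / possible ties; with loops allowed there are n(n+1)/2 = 6
edge_density(g_multi, loops = TRUE)     # 7 / 6, above 1
edge_density(g_simple, loops = FALSE)   # 2 / 3
```

Simplifying first is one way to get a density that is comparable to the statnet figure, at the cost of discarding the co-writing counts.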
The network degree measure: This gives me a clear output showing the degree of each particular node (songwriter). It is not surprising, knowing my subject matter, that Jerry Garcia is the highest degree node in the igraph output, as the practical and figurative head of the band. The other band members’ degree measures are not necessarily what I expected, though. In the statnet output, I did not anticipate that his songwriting partner, Robert Hunter, would have a lower degree than band members Phil Lesh and Bob Weir. Further, I did not anticipate that the degree measure of band member ‘Pigpen’ would be so high given his early death in the first years of the band’s touring life.
igraph::degree(gd_network_ig)
Eric Andersen John Barlow Bob Bralove Andrew Charles
3 81 14 3
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
4 4 328 12
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
16 3 4 36
Bruce Hornsby Robert Hunter Bill Kreutzmann Ned Lagin
4 313 100 3
Phil Lesh Peter Monk Brent Mydland Dave Parker
149 3 41 7
Robert Petersen Pigpen Joe Royster Rob Wasserman
13 95 4 10
Bob Weir Vince Welnick
213 13
sna::degree(gd_network_stat)
[1] 2 6 10 2 4 4 20 12 14 2 4 14 0 22 18 2 28 2 8 10 4 16
[23] 4 10 34 8
To look further I will create a dataframe in igraph first, then statnet.
ig_nodes<-data.frame(name=V(gd_network_ig)$name, degree=igraph::degree(gd_network_ig))
head(ig_nodes)
name degree
Eric Andersen Eric Andersen 3
John Barlow John Barlow 81
Bob Bralove Bob Bralove 14
Andrew Charles Andrew Charles 3
John Dawson John Dawson 4
Willie Dixon Willie Dixon 4
stat_nodes<-data.frame(name=gd_network_stat%v%"vertex.names", degree=sna::degree(gd_network_stat))
head(stat_nodes)
name degree
1 Eric Andersen 2
2 John Barlow 6
3 Bob Bralove 10
4 Andrew Charles 2
5 John Dawson 4
6 Willie Dixon 4
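The igraph and statnet dataframes give very different results. Two things appear to drive this: the igraph degrees count every parallel edge and count each self-loop twice, while sna::degree() defaults to gmode="digraph" and so counts each undirected tie once as an in-tie and once as an out-tie (calling it with gmode="graph" would halve the statnet values).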
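How loops and parallel edges feed into igraph's degree can be checked on a tiny made-up multigraph. This is an illustrative sketch only; `g_multi` is a hypothetical graph, and `igraph::degree()` is called with its namespace because sna exports a function of the same name.

```r
library(igraph)

# Made-up multigraph: a loop on vertex 1, two parallel 1-2 edges, one 2-3 edge
g_multi <- make_graph(c(1, 1, 1, 2, 1, 2, 2, 3), directed = FALSE)

igraph::degree(g_multi)                  # 4 3 1: the loop adds 2, each parallel edge adds 1
igraph::degree(g_multi, loops = FALSE)   # 2 3 1: loop ignored, parallel edges still counted
igraph::degree(simplify(g_multi))        # 1 2 1: number of distinct neighbours
```

Only the last call answers "how many different co-writers does this person have", which is the quantity the statnet object is built around.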
A quick look at the summary statistics confirms for me the minimum, maximum, median, and mean node degree data using each package.
summary(ig_nodes)
name degree
Length:26 Min. : 3.00
Class :character 1st Qu.: 4.00
Mode :character Median : 12.50
Mean : 56.77
3rd Qu.: 71.00
Max. :328.00
summary(stat_nodes)
name degree
Length:26 Min. : 0
Class :character 1st Qu.: 4
Mode :character Median : 8
Mean :10
3rd Qu.:14
Max. :34
Next I’m taking a look at the total, in- and out-degree of the nodes. Since this is not a directed network, in- and out-degree are not really meaningful here, but it is still interesting to look at how igraph and statnet handle them differently.
#create a dataframe of the total, in and out-degree of nodes in the stat network
gd_stat_nodes <- data.frame(name=gd_network_stat%v%"vertex.names",
totdegree=sna::degree(gd_network_stat),
indegree=sna::degree(gd_network_stat, cmode="indegree"),
outdegree=sna::degree(gd_network_stat, cmode="outdegree"))
#sort the top total degree of nodes in the stat network
arrange(gd_stat_nodes, desc(totdegree))%>%slice(1:5)
name totdegree indegree outdegree
1 Bob Weir 34 17 17
2 Phil Lesh 28 14 14
3 Robert Hunter 22 11 11
4 Jerry Garcia 20 10 10
5 Bill Kreutzmann 18 9 9
#create a dataframe of the total, in and out-degree of nodes in the igraph network
#mode= has to be passed inside igraph::degree(); for an undirected graph all three modes return the same value
gd_ig_nodes<-data.frame(name=V(gd_network_ig)$name,
totdegree=igraph::degree(gd_network_ig, mode="total"),
indegree=igraph::degree(gd_network_ig, mode="in"),
outdegree=igraph::degree(gd_network_ig, mode="out"))
#sort the top total degree of nodes in the igraph network
arrange(gd_ig_nodes, desc(totdegree))%>%slice(1:5)
name totdegree indegree outdegree
Jerry Garcia Jerry Garcia 328 328 328
Robert Hunter Robert Hunter 313 313 313
Bob Weir Bob Weir 213 213 213
Phil Lesh Phil Lesh 149 149 149
Bill Kreutzmann Bill Kreutzmann 100 100 100
The Eigenvector centrality score for each node can be accessed by calling “vector”, and I can examine the top eigenvector scores in the igraph network:
#Eigenvector centrality, top 10 in igraph network
eigen_ig <- eigen_centrality(gd_network_ig)
eigen_gd_ig <- data.frame(eigen_ig)
arrange(eigen_gd_ig[1], desc(vector))%>%slice(1:10)
vector
Robert Hunter 1.00000000
Jerry Garcia 0.96094165
Bob Weir 0.18725953
Phil Lesh 0.15133380
Bill Kreutzmann 0.09223647
Pigpen 0.07985305
Mickey Hart 0.02523896
John Barlow 0.01773746
Keith Godchaux 0.01382256
Vince Welnick 0.01192303
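For intuition about what these scores are, the sketch below compares `eigen_centrality()` with a by-hand calculation on a made-up four-node graph. The graph and the names `g_toy`, `ev_igraph` and `ev_manual` are assumptions for illustration. The by-hand version takes the leading eigenvector of the adjacency matrix from base R's `eigen()` and rescales it so the largest score is 1, which is how the igraph scores above appear to be scaled.

```r
library(igraph)

# Made-up connected graph: triangle 1-2-3 with a pendant vertex 4 on vertex 3
g_toy <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# igraph's scores, scaled so the largest is 1
ev_igraph <- eigen_centrality(g_toy)$vector

# The same thing by hand: leading eigenvector of the adjacency matrix
adj <- as_adjacency_matrix(g_toy, sparse = FALSE)
lead <- abs(eigen(adj)$vectors[, 1])   # sign of an eigenvector is arbitrary
ev_manual <- lead / max(lead)

round(cbind(ev_igraph, ev_manual), 4)
```

Because a node's score depends on its neighbours' scores, two nodes that are heavily tied to each other, as Hunter and Garcia are, can end up far ahead of everyone else.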
The Bonacich power centrality score for each node can be computed first using just the defaults, which set the exponent to “1”; then, I can “rescale” so that all of the scores sum to “1”.
To display my results, I have to run the calculations once and save the results as a dataframe to read back in, since the function name “bonpow()” exists in both igraph and statnet, and the clash causes trouble when running and then knitting this file.
I need to understand more of the nuance of the Bonacich power measure in order to fully understand what these two measures say about my specific network.
#Compute Bonpow scores
#bp_ig1 <- bonpow(gd_network_ig) #with a default index of "1"
#bonpow_gd_ig1 <- data.frame(bp_ig1)
#write.csv(bonpow_gd_ig1, file = "bonpow_gd_ig1.csv")
#Rescaled so that they sum to "1"
#bp_ig2 <- bonpow(gd_network_ig, rescale = TRUE) #with a default index of "1"
#bonpow_gd_ig2 <- data.frame(bp_ig2)
#write.csv(bonpow_gd_ig2, file = "bonpow_gd_ig2.csv")
X bp_ig1 bp_ig2
1 Andrew Charles 0.08220268 0.01522717
2 Bill Kreutzmann -0.70115475 -0.12988143
3 Bob Bralove -0.22064550 -0.04087222
4 Bob Weir -0.54308358 -0.10060043
5 Brent Mydland 0.52651322 0.09753095
6 Bruce Hornsby 0.00000000 0.00000000
7 Dave Parker -0.89144078 -0.16512988
8 Donna Godchaux 1.23038839 0.22791631
9 Eric Andersen -0.28021530 -0.05190689
10 Frank Guida 3.07056607 0.56878957
11 Gerrit Graham -0.28021530 -0.05190689
12 Jerry Garcia -0.25514168 -0.04726227
13 Joe Royster 3.07056607 0.56878957
14 John Barlow -0.31662818 -0.05865199
15 John Dawson 0.09708065 0.01798315
16 Keith Godchaux 1.17992241 0.21856802
17 Mickey Hart 0.15330194 0.02839755
18 Ned Lagin 0.08220268 0.01522717
19 Peter Monk 0.08220268 0.01522717
20 Phil Lesh -0.18066559 -0.03346637
21 Pigpen -0.52573655 -0.09738708
22 Rob Wasserman -0.41469644 -0.07681809
23 Robert Hunter -0.17351422 -0.03214166
24 Robert Petersen 1.11819222 0.20713317
25 Vince Welnick -0.07953575 -0.01473315
26 Willie Dixon -0.43204347 -0.08003144
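The role of the exponent and of `rescale` can be explored on a small made-up graph. The sketch below uses `power_centrality()`, igraph's current name for `bonpow()`, on a hypothetical star graph. The exponent of 0.25 is an illustrative value I picked to stay below the reciprocal of the largest eigenvalue, a commonly recommended bound. Whether the default exponent of 1 exceeding that bound explains the negative scores above is an open guess on my part, not something the output confirms.

```r
library(igraph)

# Made-up star graph: vertex 1 is the hub, vertices 2-5 are leaves
g_star <- make_star(5, mode = "undirected")

# Largest eigenvalue of the adjacency matrix (2 for this star)
lambda_max <- max(eigen(as_adjacency_matrix(g_star, sparse = FALSE))$values)

# Exponent chosen below 1 / lambda_max; 0.25 is an illustrative value only
bp_raw      <- power_centrality(g_star, exponent = 0.25)
bp_rescaled <- power_centrality(g_star, exponent = 0.25, rescale = TRUE)

bp_raw                  # default scaling
sum(bp_rescaled)        # rescaled scores sum to 1
which.max(bp_rescaled)  # the hub scores highest with a positive exponent
```

Calling the function as `igraph::power_centrality()` (or `sna::bonpow()` for the statnet version) is one way to sidestep the name clash when knitting.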
Creating a data frame summarizing all of this information and doing basic visualization on a couple of them:
gd_adjacency <- as.matrix(as_adjacency_matrix(gd_network_ig))
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency
#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)
#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)
#create data frame of centrality measures
centrality_gd <-data.frame(id=1:vcount(gd_network_ig),
name=V(gd_network_ig)$name,
degree_all=igraph::degree(gd_network_ig),
BC_power=power_centrality(gd_network_ig),
degree_norm=igraph::degree(gd_network_ig,normalized=T),
EV_cent=centr_eigen(gd_network_ig,directed = F)$vector,
reflect_EV=gd_reflective*centr_eigen(gd_network_ig,directed = F)$vector,
derive_EV=gd_derived*centr_eigen(gd_network_ig,directed = F)$vector)
row.names(centrality_gd)<-NULL
centrality_gd
id name degree_all BC_power degree_norm EV_cent
1 1 Eric Andersen 3 -0.28021530 0.12 6.852805e-04
2 2 John Barlow 81 -0.31662818 3.24 1.773746e-02
3 3 Bob Bralove 14 -0.22064550 0.56 8.992246e-03
4 4 Andrew Charles 3 0.08220268 0.12 5.538095e-04
5 5 John Dawson 4 0.09708065 0.16 7.176110e-03
6 6 Willie Dixon 4 -0.43204347 0.16 7.041156e-04
7 7 Jerry Garcia 328 -0.25514168 13.12 9.609417e-01
8 8 Donna Godchaux 12 1.23038839 0.48 5.313952e-03
9 9 Keith Godchaux 16 1.17992241 0.64 1.382256e-02
10 10 Gerrit Graham 3 -0.28021530 0.12 6.852805e-04
11 11 Frank Guida 4 3.07056607 0.16 2.932974e-04
12 12 Mickey Hart 36 0.15330194 1.44 2.523896e-02
13 13 Bruce Hornsby 4 0.00000000 0.16 2.574501e-17
14 14 Robert Hunter 313 -0.17351422 12.52 1.000000e+00
15 15 Bill Kreutzmann 100 -0.70115475 4.00 9.223647e-02
16 16 Ned Lagin 3 0.08220268 0.12 5.538095e-04
17 17 Phil Lesh 149 -0.18066559 5.96 1.513338e-01
18 18 Peter Monk 3 0.08220268 0.12 5.538095e-04
19 19 Brent Mydland 41 0.52651322 1.64 2.659589e-03
20 20 Dave Parker 7 -0.89144078 0.28 5.385443e-03
21 21 Robert Petersen 13 1.11819222 0.52 2.274921e-03
22 22 Pigpen 95 -0.52573655 3.80 7.985305e-02
23 23 Joe Royster 4 3.07056607 0.16 2.932974e-04
24 24 Rob Wasserman 10 -0.41469644 0.40 5.146870e-03
25 25 Bob Weir 213 -0.54308358 8.52 1.872595e-01
26 26 Vince Welnick 13 -0.07953575 0.52 1.192303e-02
reflect_EV derive_EV
1 8.512801e-06 0.0006767677
2 4.171627e-03 0.0135658315
3 2.393769e-04 0.0087528693
4 9.466828e-06 0.0005443426
5 4.752391e-05 0.0071285863
6 1.242557e-05 0.0006916900
7 3.326255e-01 0.6283162014
8 1.213231e-04 0.0051926286
9 2.487863e-04 0.0135737779
10 8.512801e-06 0.0006767677
11 1.113787e-05 0.0002821595
12 1.390973e-03 0.0238479844
13 1.593739e-17 0.0000000000
14 3.713275e-01 0.6286724511
15 9.710558e-03 0.0825259133
16 9.466828e-06 0.0005443426
17 2.214058e-02 0.1291932241
18 9.466828e-06 0.0005443426
19 6.119022e-04 0.0020476869
20 4.829994e-05 0.0053371431
21 1.438169e-04 0.0021311045
22 9.031643e-03 0.0708214079
23 1.113787e-05 0.0002821595
24 1.077879e-04 0.0050390825
25 4.070942e-02 0.1465501114
26 3.311952e-04 0.0115918330
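The reflected and derived shares used in the table above can be traced by hand on a small made-up graph. This base R sketch follows the same recipe as the code above (square the adjacency matrix, then compare its diagonal with its row sums). The matrix `adj` and the other names are hypothetical.

```r
# Made-up adjacency matrix: triangle 1-2-3 with a pendant vertex 4 on vertex 3
adj <- matrix(
  c(0, 1, 1, 0,
    1, 0, 1, 0,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE
)

# Entry (i, j) of adj %*% adj counts the two-step walks from i to j
adj_2 <- adj %*% adj

# Share of each vertex's two-step walks that return to itself ("reflected")
reflected <- diag(adj_2) / rowSums(adj_2)

# The remaining share ends at some other vertex ("derived")
derived <- 1 - reflected

round(reflected, 3)  # 0.400 0.400 0.600 0.333
```

Multiplying these shares by the eigenvector score, as the table does, splits each node's centrality into the part that bounces back from its own ties and the part it picks up from its neighbours' other connections.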
I can independently look at the correlations between all scores now. Following the prompts from this week’s tutorial, it looks like all of the variables except Bonacich power are strongly correlated with one another, so I think I’ll want to begin subsetting my network to get more meaningful interpretations.
names(centrality_gd) #Find the columns we want to run the correlation on
[1] "id" "name" "degree_all" "BC_power"
[5] "degree_norm" "EV_cent" "reflect_EV" "derive_EV"
degree_all BC_power degree_norm EV_cent reflect_EV
degree_all 1.0000000 -0.2782755 1.0000000 0.9131592 0.8729045
BC_power -0.2782755 1.0000000 -0.2782755 -0.1782509 -0.1481903
degree_norm 1.0000000 -0.2782755 1.0000000 0.9131592 0.8729045
EV_cent 0.9131592 -0.1782509 0.9131592 1.0000000 0.9946549
reflect_EV 0.8729045 -0.1481903 0.8729045 0.9946549 1.0000000
derive_EV 0.9314936 -0.1943027 0.9314936 0.9983162 0.9869907
derive_EV
degree_all 0.9314936
BC_power -0.1943027
degree_norm 0.9314936
EV_cent 0.9983162
reflect_EV 0.9869907
derive_EV 1.0000000
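The call that produced this matrix is not shown, so the following is only a sketch of one way to get a table like it: `cor()` on the numeric columns of a data frame. The data frame `toy_cent` and its values are made up. It also shows why `degree_all` and `degree_norm` correlate at exactly 1: one is a constant multiple of the other.

```r
# Made-up centrality table for five nodes (values are illustrative only)
toy_cent <- data.frame(
  name       = c("A", "B", "C", "D", "E"),
  degree_all = c(1, 2, 2, 3, 4),
  EV_cent    = c(0.10, 0.35, 0.30, 0.70, 1.00)
)

# Normalized degree is degree divided by (n - 1), a constant rescaling
toy_cent$degree_norm <- toy_cent$degree_all / (nrow(toy_cent) - 1)

# Correlate only the numeric columns
toy_cor <- cor(toy_cent[, c("degree_all", "degree_norm", "EV_cent")])
round(toy_cor, 3)
```

Since the two degree columns carry the same information, only one of them needs to be kept when comparing measures.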
However, I will also make a pretty visualization of the correlation matrix, just because.
Citations:
Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm
ASCAP. 18 March 2022.
Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/
Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/
Photo by Grateful Dead Productions
This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Becvar (2022, April 15). Data Analytics and Computational Social Science: DACSS 697E Assignment 4. Retrieved from https://www.kristinabecvar.com/posts/2022-04-04-dacss-697e-assignment-4/
BibTeX citation
@misc{status-centrality, author = {Becvar, Kristina}, title = {Data Analytics and Computational Social Science: DACSS 697E Assignment 4}, url = {https://www.kristinabecvar.com/posts/2022-04-04-dacss-697e-assignment-4/}, year = {2022} }