Data Analytics and Computational Social Science: DACSS 697E Assignment 4

Kristina Becvar

Network Details

I am continuing to use the Grateful Dead songwriting data set from this series of posts to examine co-writing links and centrality.

The data set, which I compiled, consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career.

There are 26 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 26 nodes in the dataset.

There are a total of 183 (updated and still under review!) unique songs played, and the various co-writing combinations are now represented in a binary affiliation matrix.

I have not weighted this version of the data; I am trying to build it from a binary affiliation matrix first, and hope to later add the number of times a given song was played live as a weight.
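As a rough sketch of what that later weighting step might look like, assuming a hypothetical vector of play counts with one entry per song: scaling each song by its play count before projecting turns each off-diagonal cell into the total number of live plays of the songs a pair co-wrote. The writers, songs, and counts below are invented for illustration and are not from my data.

```r
# Toy affiliation matrix: rows are writers, columns are songs (all hypothetical)
toy_aff <- matrix(c(1, 1, 0,
                    1, 0, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Writer A", "Writer B"),
                                  c("Song 1", "Song 2", "Song 3")))

# Hypothetical number of live plays for each song, in column order
toy_plays <- c(10, 5, 2)

# Unweighted projection: number of shared songs per pair of writers
toy_proj <- toy_aff %*% t(toy_aff)

# Weighted projection: each shared song counts once per live play
toy_proj_w <- toy_aff %*% diag(toy_plays) %*% t(toy_aff)

toy_proj["Writer A", "Writer B"]    # 1 shared song
toy_proj_w["Writer A", "Writer B"]  # 10 plays of that shared song
```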

Affiliation Matrix

Loading the dataset and creating the network to begin this assignment:

library(statnet)
library(igraph)
library(dplyr)
gd_vertices <- read.csv("gd_nodes.csv", header=TRUE, stringsAsFactors=FALSE)
gd_affiliation <- read.csv("gd_affiliation_matrix.csv", row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)

Checking the dimensions, then inspecting the first 10 rows and 4 columns of the affiliation matrix:

dim(gd_matrix)

[1]  26 183

gd_matrix[1:10, 1:4]

               Alabama Getaway Alice D Millionaire Alligator Althea
Eric Andersen                0                   0         0      0
John Barlow                  0                   0         0      0
Bob Bralove                  0                   0         0      0
Andrew Charles               0                   0         0      0
John Dawson                  0                   0         0      0
Willie Dixon                 0                   0         0      0
Jerry Garcia                 1                   1         0      1
Donna Godchaux               0                   0         0      0
Keith Godchaux               0                   0         0      0
Gerrit Graham                0                   0         0      0

Bipartite Projection

Now I can create the single mode network and examine the bipartite projection. After converting the matrix to a square adjacency matrix, I can look at the full matrix.

I can also look up the co-writing count in the projected matrix for particular pairs of songwriters, such as writing partners Jerry Garcia and Robert Hunter, and John Barlow and Bob Weir.

gd_projection <- gd_matrix%*%t(gd_matrix)
dim(gd_projection)

[1] 26 26

gd_projection[1:10, 1:4]

               Eric Andersen John Barlow Bob Bralove Andrew Charles
Eric Andersen              1           0           0              0
John Barlow                0          26           1              0
Bob Bralove                0           1           3              0
Andrew Charles             0           0           0              1
John Dawson                0           0           0              0
Willie Dixon               0           0           0              0
Jerry Garcia               0           0           0              0
Donna Godchaux             0           0           0              0
Keith Godchaux             0           0           0              0
Gerrit Graham              0           0           0              0

gd_projection["Jerry Garcia", "Robert Hunter"]

[1] 78

gd_projection["John Barlow", "Bob Weir"]

[1] 21
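To make the projection arithmetic concrete, here is a small sketch on a made-up matrix of three writers and four songs (the names are placeholders, not from my data). The diagonal of the product counts each writer's own songs, and each off-diagonal cell counts the songs a pair shares, which is how I read the 78 and 21 above.

```r
# Toy affiliation matrix: rows are writers, columns are songs (all hypothetical)
toy <- matrix(c(1, 1, 1, 0,
                1, 1, 0, 0,
                0, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("W1", "W2", "W3"), paste0("song", 1:4)))

# One-mode projection onto writers
toy_projection <- toy %*% t(toy)
toy_projection

# Diagonal = number of songs each writer contributed to
diag(toy_projection)          # W1 = 3, W2 = 2, W3 = 1

# Off-diagonal = number of songs the pair co-wrote
toy_projection["W1", "W2"]    # 2
toy_projection["W1", "W3"]    # 0
```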

Network Creation

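Now I will use this projection as an adjacency matrix to create both igraph and statnet network objects and take a look at their resulting features. The network is non-directed and unweighted. One thing to keep in mind is that the projection holds co-writing counts rather than 0/1 ties, with each writer's own song count on the diagonal. Called without "weighted=TRUE", igraph's graph.adjacency() reads those counts as the number of edges between two nodes, while the statnet object below reports one edge per pair (65 in total) and no loops.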

#Create Igraph and Statnet Objects

gd_network_ig <- graph.adjacency(gd_projection,mode="undirected") #igraph object
gd_network_stat <- network(gd_projection, directed=F, matrix.type="adjacency") #statnet object

#Inspect New Objects
print(gd_network_stat)

 Network attributes:
  vertices = 26 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 65 
    missing edges= 0 
    non-missing edges= 65 

 Vertex attribute names: 
    vertex.names 

No edge attributes

igraph::vertex_attr_names(gd_network_ig)

[1] "name"

igraph::edge_attr_names(gd_network_ig)

character(0)

head(V(gd_network_ig)$name)

[1] "Eric Andersen"  "John Barlow"    "Bob Bralove"   
[4] "Andrew Charles" "John Dawson"    "Willie Dixon"

is_directed(gd_network_ig)

[1] FALSE

is_weighted(gd_network_ig)

[1] FALSE

is_bipartite(gd_network_ig)

[1] FALSE
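Because the projection stores counts and a non-zero diagonal, one way to get an igraph object with one edge per co-writing pair is to dichotomize the matrix and clear the diagonal before building the graph. This is only a sketch on a made-up matrix: `toy_projection`, `toy_binary`, and the writer labels are placeholders, and I have not re-run the measures in this post on a graph built this way.

```r
library(igraph)

# Toy projection: counts off the diagonal, own-song counts on the diagonal
toy_projection <- matrix(c(3, 2, 0,
                           2, 2, 0,
                           0, 0, 1),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(c("W1", "W2", "W3"),
                                         c("W1", "W2", "W3")))

# Read as-is (diagonal ignored here): the count of 2 becomes two parallel edges
g_multi <- graph_from_adjacency_matrix(toy_projection, mode = "undirected",
                                       diag = FALSE)
any_multiple(g_multi)   # TRUE
ecount(g_multi)         # 2

# Dichotomize and drop the diagonal to get a simple graph
toy_binary <- (toy_projection > 0) * 1
diag(toy_binary) <- 0
g_simple <- graph_from_adjacency_matrix(toy_binary, mode = "undirected")

is_simple(g_simple)       # TRUE
ecount(g_simple)          # 1
igraph::degree(g_simple)  # W1 = 1, W2 = 1, W3 = 0
```

An alternative with the same intent would be to keep the counts as an edge attribute by passing weighted=TRUE, which I may try once I add play counts as weights.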

Dyad & Triad Census

Looking at the dyad/triad census info in igraph and statnet:

igraph::dyad.census(gd_network_ig)

$mut
[1] 738

$asym
[1] 0

$null
[1] -413

igraph::triad.census(gd_network_ig)

 [1] 1788    0  488    0    0    0    0    0    0    0  237    0    0
[14]    0    0   87

sna::dyad.census(gd_network_stat)

     Mut Asym Null
[1,]  65    0  260

sna::triad.census(gd_network_stat)

      003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 1451   0 825    0    0    0    0    0    0    0 237    0    0
     120C 210 300
[1,]    0   0  87

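Knowing this network has 26 vertices, I want to check that the triad census accounts for every possible triad by comparing its total against the number of possible triads, and I can confirm that it does! The two packages agree on the 201 and 300 counts, but they split the 003 and 102 triads differently.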

#possible triads in network
26*25*24/6

[1] 2600

sum(igraph::triad.census(gd_network_ig))

[1] 2600
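The same counting check can be applied to the dyad census. A small sketch in base R, using only the numbers printed above: a simple undirected network on 26 nodes has choose(26, 2) = 325 dyads, so the mutual and null counts should add up to 325 and neither should be negative. The variable names are just labels I made up for the printed values.

```r
n_nodes <- 26

possible_dyads  <- choose(n_nodes, 2)   # 325
possible_triads <- choose(n_nodes, 3)   # 2600, same as 26*25*24/6

# Values copied from the statnet dyad census output
sna_mut  <- 65
sna_null <- 260

# Values copied from the igraph dyad census output
ig_mut  <- 738
ig_null <- -413

sna_mut + sna_null == possible_dyads   # TRUE
ig_mut + ig_null == possible_dyads     # TRUE, but only because null went negative
ig_mut > possible_dyads                # TRUE: more "mutual" ties than pairs of nodes
```

The igraph count of 738 being larger than the 325 possible pairs is consistent with the igraph object carrying more than one edge per pair.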

Transitivity

Looking next at the global vs. average local transitivity of the network in igraph, and confirming the global transitivity in statnet: both packages give a global transitivity of 0.5241, and igraph gives an average local transitivity of 0.7756.

#network transitivity: statnet
gtrans(gd_network_stat)

[1] 0.5240964

#global clustering coefficient: igraph
transitivity(gd_network_ig, type="global")

[1] 0.5240964

#average local clustering coefficient: igraph
transitivity(gd_network_ig, type="average")

[1] 0.7755587

These transitivity results tell me that the average local transitivity (0.78) is considerably higher than the global transitivity (0.52), indicating, again from my still naive network knowledge, that the overall network is fairly loose but contains a more tightly connected sub-network.
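To see how the two measures can diverge, here is a toy sketch (not my data) with one hub tied to five other nodes, only two of which are also tied to each other. The hub has many open triples, which pulls the global measure down, while the two low-degree nodes in the triangle each have a local score of 1, which pulls the average up. Nodes with fewer than two ties have no defined local score, and as far as I understand igraph leaves them out of the average by default.

```r
library(igraph)

# Hub "H" tied to five others; only a and b are also tied to each other
toy_g <- graph_from_literal(H-a, H-b, H-c, H-d, H-e, a-b)

# Global: 3 * triangles / connected triples = 3 * 1 / 12
global_tr <- transitivity(toy_g, type = "global")

# Average of the defined local scores: (0.1 + 1 + 1) / 3
local_avg <- transitivity(toy_g, type = "average")

global_tr   # 0.25
local_avg   # 0.7
```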

Geodesic Distance

Looking at the geodesic distance, the average path length is just over 2, so on average any two connected songwriters are about two “stops” apart along the shortest path between them.

average.path.length(gd_network_ig,directed=F)

[1] 2.01

Components

Getting a look at the components of the network confirms that there are 2 components in the network: 25 of the 26 nodes make up the giant component, and the remaining node is an isolate.

names(igraph::components(gd_network_ig))

[1] "membership" "csize"      "no"

igraph::components(gd_network_ig)$no

[1] 2

igraph::components(gd_network_ig)$csize

[1] 25  1
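The component output can also be used to name the isolate and pull out the giant component for separate analysis. The sketch below does this on a toy graph with made-up node labels; in my network I would expect the isolate to be the one writer whose statnet degree is 0.

```r
library(igraph)

# Toy graph: a four-node chain plus one isolated node "E"
toy_g <- graph_from_literal(A-B, B-C, C-D, E)

comp <- igraph::components(toy_g)
comp$no      # 2 components
comp$csize   # sizes 4 and 1

# Name the isolates: nodes with no ties at all
isolates <- V(toy_g)$name[igraph::degree(toy_g) == 0]
isolates     # "E"

# Keep only the nodes that belong to the largest component
giant_id <- which.max(comp$csize)
giant <- induced_subgraph(toy_g, which(comp$membership == giant_id))
vcount(giant)   # 4

# Average geodesic distance within the giant component
giant_mean_dist <- mean_distance(giant, directed = FALSE)
```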

Density

The network density measure: here I call “graph.density” with “loops=TRUE”. In igraph, I know that the default output assumes that loops are not included but does not remove them, which we had corrected with the addition of “loops=TRUE” per the course tutorials when comparing output to statnet. In this case, the statnet output is far different. The likely reason is that the igraph object was built from the projection's counts, so it holds multiple edges between the same pair of writers plus self-loops from the diagonal, which pushes its density above 1. Statnet kept one edge per pair and no loops, so its 0.2 is the density of the simple co-writing network.

graph.density(gd_network_ig, loops=TRUE)

[1] 2.102564

network.density(gd_network_stat)

[1] 0.2
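Both density values can be reproduced by hand from numbers already printed in this post. The sketch below assumes that the 738 reported as "mutual" by igraph's dyad census is the igraph object's total edge count; that reading is my assumption, but the arithmetic matches the output exactly.

```r
n_nodes <- 26

# Possible ties among 26 nodes without and with self-loops
pairs_no_loops   <- n_nodes * (n_nodes - 1) / 2   # 325
pairs_with_loops <- n_nodes * (n_nodes + 1) / 2   # 351

# statnet: 65 edges, loops not allowed
stat_density <- 65 / pairs_no_loops

# igraph with loops=TRUE: 738 edges (assumed from the dyad census output)
ig_density <- 738 / pairs_with_loops

stat_density          # 0.2
round(ig_density, 6)  # 2.102564
```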

Degree Centrality

The network degree measure: This gives me a clear output showing the degree of each particular node (songwriter). It is not surprising, knowing my subject matter, that Jerry Garcia is the highest degree node in the igraph output as the practical and figurative head of the band. The other band members’ degree measures are not necessarily what I expected, though. In the statnet output, his songwriting partner, Robert Hunter (22), has a lower degree than band members Phil Lesh (28) and Bob Weir (34), which I did not anticipate. Further, I did not anticipate that the degree measure of band member ‘Pigpen’ would be so high (95 in igraph, 16 in statnet) given his early death in the first years of the band’s touring life.

igraph::degree(gd_network_ig)

  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
              3              81              14               3 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
              4               4             328              12 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
             16               3               4              36 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
              4             313             100               3 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
            149               3              41               7 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
             13              95               4              10 
       Bob Weir   Vince Welnick 
            213              13

sna::degree(gd_network_stat)

 [1]  2  6 10  2  4  4 20 12 14  2  4 14  0 22 18  2 28  2  8 10  4 16
[23]  4 10 34  8

To look further I will create a dataframe in igraph first, then statnet.

Igraph

ig_nodes<-data.frame(name=V(gd_network_ig)$name, degree=igraph::degree(gd_network_ig))
head(ig_nodes)

                         name degree
Eric Andersen   Eric Andersen      3
John Barlow       John Barlow     81
Bob Bralove       Bob Bralove     14
Andrew Charles Andrew Charles      3
John Dawson       John Dawson      4
Willie Dixon     Willie Dixon      4

Statnet

stat_nodes<-data.frame(name=gd_network_stat%v%"vertex.names", degree=sna::degree(gd_network_stat))
head(stat_nodes)

            name degree
1  Eric Andersen      2
2    John Barlow      6
3    Bob Bralove     10
4 Andrew Charles      2
5    John Dawson      4
6   Willie Dixon      4

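The igraph and statnet dataframes give very different results. The igraph degrees count every edge in a graph that holds multiple edges per pair and self-loops, so they behave more like song counts than counts of co-writers. The statnet degrees are even numbers because sna::degree() treats the network as directed by default and adds in- and out-degree together, so each value is twice the writer's number of distinct co-writers.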
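One likely source of the even-numbered statnet values is the `gmode` argument of sna::degree(), which defaults to "digraph" and so sums in- and out-degree even for an undirected network. A minimal sketch on a toy triangle (not my data), where every node has exactly two neighbors:

```r
library(sna)

# Toy undirected triangle as a symmetric 0/1 adjacency matrix
toy_adj <- matrix(c(0, 1, 1,
                    1, 0, 1,
                    1, 1, 0),
                  nrow = 3, byrow = TRUE)

# Default gmode = "digraph": in-degree plus out-degree
deg_digraph <- sna::degree(toy_adj)

# gmode = "graph": ties are treated as undirected
deg_graph <- sna::degree(toy_adj, gmode = "graph")

deg_digraph   # 4 4 4
deg_graph     # 2 2 2
```

If I pass gmode="graph" for my own network, I would expect the statnet degrees to be halved, though I have not re-run that here.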

Summary Statistics

A quick look at the summary statistics confirms for me the minimum, maximum, median, and mean node degree data using each package.

summary(ig_nodes)

     name               degree      
 Length:26          Min.   :  3.00  
 Class :character   1st Qu.:  4.00  
 Mode  :character   Median : 12.50  
                    Mean   : 56.77  
                    3rd Qu.: 71.00  
                    Max.   :328.00

summary(stat_nodes)

     name               degree  
 Length:26          Min.   : 0  
 Class :character   1st Qu.: 4  
 Mode  :character   Median : 8  
                    Mean   :10  
                    3rd Qu.:14  
                    Max.   :34

Statnet v. Igraph Degree Treatment

Next I’m taking a look at a dataframe of total, in-, and out-degree for each package. Since this is not a directed network, in- and out-degree are not really meaningful here, but it is still interesting to look at how igraph and statnet handle the same data differently.

Statnet

#create a dataframe of the total, in and out-degree of nodes in the stat network
gd_stat_nodes <- data.frame(name=gd_network_stat%v%"vertex.names",
    totdegree=sna::degree(gd_network_stat),
    indegree=sna::degree(gd_network_stat, cmode="indegree"),
    outdegree=sna::degree(gd_network_stat, cmode="outdegree"))

#sort the top total degree of nodes in the stat network
arrange(gd_stat_nodes, desc(totdegree))%>%slice(1:5)

             name totdegree indegree outdegree
1        Bob Weir        34       17        17
2       Phil Lesh        28       14        14
3   Robert Hunter        22       11        11
4    Jerry Garcia        20       10        10
5 Bill Kreutzmann        18        9         9

Igraph

#create a dataframe of the total, in and out-degree of nodes in the igraph network
#(mode is an argument of igraph::degree(), so it belongs inside that call)
gd_ig_nodes<-data.frame(name=V(gd_network_ig)$name,
                     totdegree=igraph::degree(gd_network_ig, mode="all"),
                     indegree=igraph::degree(gd_network_ig, mode="in"),
                     outdegree=igraph::degree(gd_network_ig, mode="out"))

#sort the top total degree of nodes in the igraph network
arrange(gd_ig_nodes, desc(totdegree))%>%slice(1:5)

                           name totdegree indegree outdegree
Jerry Garcia       Jerry Garcia       328      328       328
Robert Hunter     Robert Hunter       313      313       313
Bob Weir               Bob Weir       213      213       213
Phil Lesh             Phil Lesh       149      149       149
Bill Kreutzmann Bill Kreutzmann       100      100       100

Overall Eigenvector Score

The eigenvector centrality score for each node is stored in the “vector” element of the result, and I can examine the top eigenvector scores in the igraph network:

#Eigenvector centrality, top 10 in igraph network

eigen_ig <- eigen_centrality(gd_network_ig)
eigen_gd_ig <- data.frame(eigen_ig)
arrange(eigen_gd_ig[1], desc(vector))%>%slice(1:10)

                    vector
Robert Hunter   1.00000000
Jerry Garcia    0.96094165
Bob Weir        0.18725953
Phil Lesh       0.15133380
Bill Kreutzmann 0.09223647
Pigpen          0.07985305
Mickey Hart     0.02523896
John Barlow     0.01773746
Keith Godchaux  0.01382256
Vince Welnick   0.01192303
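As a check on what these scores represent, the sketch below computes eigenvector centrality by hand in base R for a toy three-node chain (not my data): it takes the leading eigenvector of the adjacency matrix and rescales it so the largest score is 1, which is the scaling the igraph output above appears to use.

```r
# Toy chain A - B - C as a symmetric adjacency matrix
toy_adj <- matrix(c(0, 1, 0,
                    1, 0, 1,
                    0, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# eigen() returns eigenvalues in decreasing order, so column 1 is the leading one
eig <- eigen(toy_adj, symmetric = TRUE)

# The sign of an eigenvector is arbitrary, so take absolute values
lead <- abs(eig$vectors[, 1])

# Rescale so that the most central node scores 1
ev_cent <- lead / max(lead)
names(ev_cent) <- rownames(toy_adj)

round(ev_cent, 4)   # A = 0.7071, B = 1, C = 0.7071
```

The middle node scores highest because both of its neighbors feed into it, while each end node only borrows centrality from the middle.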

Bonacich Power

The Bonacich power centrality score for each node can be computed first using just the defaults, which include an exponent of 1; then, I can “rescale” so that all of the scores sum to 1.

To display my results, I have to run the calculations once and save the results as CSV files to read back in, since the function name “bonpow()” is the same in igraph and statnet, which causes trouble when running and then knitting this file.

I need to understand more of the nuance of the Bonacich power measure in order to fully understand what these two sets of scores say about my specific network.
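One possible way around the name clash is to call the igraph function with an explicit namespace, using its other name, power_centrality(). The sketch below is on a toy three-node chain rather than my network, and the exponent of 0.5 is an arbitrary choice I made to keep the toy example well behaved, not the default of 1 used for my scores.

```r
library(igraph)

# Toy chain A - B - C
toy_g <- graph_from_literal(A-B, B-C)

# Namespace-qualified call, so a bonpow() from another package cannot mask it
bp_raw <- igraph::power_centrality(toy_g, exponent = 0.5)

# Same scores rescaled so that they sum to 1
bp_rescaled <- igraph::power_centrality(toy_g, exponent = 0.5, rescale = TRUE)

bp_raw
bp_rescaled
sum(bp_rescaled)   # 1
```

With a positive exponent, being tied to well-connected nodes raises a node's score, so the middle node comes out highest here; a negative exponent would instead reward ties to poorly connected nodes.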

#Compute Bonpow scores

#bp_ig1 <- bonpow(gd_network_ig) #with a default index of "1"
#bonpow_gd_ig1 <- data.frame(bp_ig1)
#write.csv(bonpow_gd_ig1, file = "bonpow_gd_ig1.csv")

#Rescaled so that they sum to "1"

#bp_ig2 <- bonpow(gd_network_ig, rescale = TRUE) #with a default index of "1"
#bonpow_gd_ig2 <- data.frame(bp_ig2)
#write.csv(bonpow_gd_ig2, file = "bonpow_gd_ig2.csv")

#Read in dataframe from previous chunk

bon1 <- read.csv("bonpow_gd_ig1.csv")
bon2 <- read.csv("bonpow_gd_ig2.csv")

totalbonpow <- merge(bon1,bon2)

totalbonpow

                 X      bp_ig1      bp_ig2
1   Andrew Charles  0.08220268  0.01522717
2  Bill Kreutzmann -0.70115475 -0.12988143
3      Bob Bralove -0.22064550 -0.04087222
4         Bob Weir -0.54308358 -0.10060043
5    Brent Mydland  0.52651322  0.09753095
6    Bruce Hornsby  0.00000000  0.00000000
7      Dave Parker -0.89144078 -0.16512988
8   Donna Godchaux  1.23038839  0.22791631
9    Eric Andersen -0.28021530 -0.05190689
10     Frank Guida  3.07056607  0.56878957
11   Gerrit Graham -0.28021530 -0.05190689
12    Jerry Garcia -0.25514168 -0.04726227
13     Joe Royster  3.07056607  0.56878957
14     John Barlow -0.31662818 -0.05865199
15     John Dawson  0.09708065  0.01798315
16  Keith Godchaux  1.17992241  0.21856802
17     Mickey Hart  0.15330194  0.02839755
18       Ned Lagin  0.08220268  0.01522717
19      Peter Monk  0.08220268  0.01522717
20       Phil Lesh -0.18066559 -0.03346637
21          Pigpen -0.52573655 -0.09738708
22   Rob Wasserman -0.41469644 -0.07681809
23   Robert Hunter -0.17351422 -0.03214166
24 Robert Petersen  1.11819222  0.20713317
25   Vince Welnick -0.07953575 -0.01473315
26    Willie Dixon -0.43204347 -0.08003144

Creating a data frame summarizing all of these centrality measures and doing basic visualization on a couple of them:

gd_adjacency <- as.matrix(as_adjacency_matrix(gd_network_ig))
gd_adjacency_2 <- gd_adjacency %*% gd_adjacency

#calculate portion of reflected centrality
gd_reflective <- diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_reflective <- ifelse(is.nan(gd_reflective),0,gd_reflective)

#calculate derived centrality
gd_derived <- 1-diag(as.matrix(gd_adjacency_2))/rowSums(as.matrix(gd_adjacency_2))
gd_derived <- ifelse(is.nan(gd_derived),1,gd_derived)
#create data frame of centrality measures
centrality_gd <-data.frame(id=1:vcount(gd_network_ig),
                        name=V(gd_network_ig)$name,
                        degree_all=igraph::degree(gd_network_ig),
                        BC_power=power_centrality(gd_network_ig),
                        degree_norm=igraph::degree(gd_network_ig,normalized=T),
                        EV_cent=centr_eigen(gd_network_ig,directed = F)$vector,
                        reflect_EV=gd_reflective*centr_eigen(gd_network_ig,directed = F)$vector,
                        derive_EV=gd_derived*centr_eigen(gd_network_ig,directed = F)$vector)

row.names(centrality_gd)<-NULL
centrality_gd

   id            name degree_all    BC_power degree_norm      EV_cent
1   1   Eric Andersen          3 -0.28021530        0.12 6.852805e-04
2   2     John Barlow         81 -0.31662818        3.24 1.773746e-02
3   3     Bob Bralove         14 -0.22064550        0.56 8.992246e-03
4   4  Andrew Charles          3  0.08220268        0.12 5.538095e-04
5   5     John Dawson          4  0.09708065        0.16 7.176110e-03
6   6    Willie Dixon          4 -0.43204347        0.16 7.041156e-04
7   7    Jerry Garcia        328 -0.25514168       13.12 9.609417e-01
8   8  Donna Godchaux         12  1.23038839        0.48 5.313952e-03
9   9  Keith Godchaux         16  1.17992241        0.64 1.382256e-02
10 10   Gerrit Graham          3 -0.28021530        0.12 6.852805e-04
11 11     Frank Guida          4  3.07056607        0.16 2.932974e-04
12 12     Mickey Hart         36  0.15330194        1.44 2.523896e-02
13 13   Bruce Hornsby          4  0.00000000        0.16 2.574501e-17
14 14   Robert Hunter        313 -0.17351422       12.52 1.000000e+00
15 15 Bill Kreutzmann        100 -0.70115475        4.00 9.223647e-02
16 16       Ned Lagin          3  0.08220268        0.12 5.538095e-04
17 17       Phil Lesh        149 -0.18066559        5.96 1.513338e-01
18 18      Peter Monk          3  0.08220268        0.12 5.538095e-04
19 19   Brent Mydland         41  0.52651322        1.64 2.659589e-03
20 20     Dave Parker          7 -0.89144078        0.28 5.385443e-03
21 21 Robert Petersen         13  1.11819222        0.52 2.274921e-03
22 22          Pigpen         95 -0.52573655        3.80 7.985305e-02
23 23     Joe Royster          4  3.07056607        0.16 2.932974e-04
24 24   Rob Wasserman         10 -0.41469644        0.40 5.146870e-03
25 25        Bob Weir        213 -0.54308358        8.52 1.872595e-01
26 26   Vince Welnick         13 -0.07953575        0.52 1.192303e-02
     reflect_EV    derive_EV
1  8.512801e-06 0.0006767677
2  4.171627e-03 0.0135658315
3  2.393769e-04 0.0087528693
4  9.466828e-06 0.0005443426
5  4.752391e-05 0.0071285863
6  1.242557e-05 0.0006916900
7  3.326255e-01 0.6283162014
8  1.213231e-04 0.0051926286
9  2.487863e-04 0.0135737779
10 8.512801e-06 0.0006767677
11 1.113787e-05 0.0002821595
12 1.390973e-03 0.0238479844
13 1.593739e-17 0.0000000000
14 3.713275e-01 0.6286724511
15 9.710558e-03 0.0825259133
16 9.466828e-06 0.0005443426
17 2.214058e-02 0.1291932241
18 9.466828e-06 0.0005443426
19 6.119022e-04 0.0020476869
20 4.829994e-05 0.0053371431
21 1.438169e-04 0.0021311045
22 9.031643e-03 0.0708214079
23 1.113787e-05 0.0002821595
24 1.077879e-04 0.0050390825
25 4.070942e-02 0.1465501114
26 3.311952e-04 0.0115918330
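The reflected and derived shares are easier to read on a tiny example. As I understand the calculation, the squared adjacency matrix counts two-step walks: its diagonal counts walks that leave a node and come straight back, so the diagonal divided by the row sum is the share of a node's two-step reach that reflects back on itself, and the rest is derived from others. The toy chain below is made up for illustration.

```r
# Toy chain A - B - C as a symmetric adjacency matrix
toy_adj <- matrix(c(0, 1, 0,
                    1, 0, 1,
                    0, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Two-step walks between every pair of nodes
toy_adj_2 <- toy_adj %*% toy_adj

# Share of two-step walks that return to the starting node
toy_reflected <- diag(toy_adj_2) / rowSums(toy_adj_2)

# Share that ends at some other node
toy_derived <- 1 - toy_reflected

toy_reflected   # A = 0.5, B = 1.0, C = 0.5
toy_derived     # A = 0.5, B = 0.0, C = 0.5
```

Every two-step walk from the middle node B returns to B, so all of its share is reflected, while the end nodes split evenly between returning to themselves and reaching the other end.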

Graphing Centrality Scores

#suggested number of histogram bins: one per songwriter
breaks<-vcount(gd_network_ig)

hist(centrality_gd$degree_all,breaks=breaks,
     main="Distribution of Total Degree Scores in GD Songwriters",
     xlab="Total Degree Score")

hist(centrality_gd$EV_cent,breaks=breaks,
     main="Distribution of Eigenvector Centrality Scores in GD Songwriters",
     xlab="Eigenvector Centrality Score")

hist(centrality_gd$BC_power,breaks=breaks,
     main="Distribution of Bonacich Power Scores in GD Songwriters",
     xlab="Bonacich Power Score")

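I can now look at the correlations between all of the scores. Using prompts from this week’s tutorial, it looks like all of the variables except Bonacich power are strongly correlated with each other (degree_norm is just a rescaling of degree_all, so their correlation of 1 is by construction), while Bonacich power is weakly negatively correlated with the rest. I think I’ll want to begin subsetting my network to get more meaningful interpretations.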

names(centrality_gd) #Find the columns we want to run the correlation on

[1] "id"          "name"        "degree_all"  "BC_power"   
[5] "degree_norm" "EV_cent"     "reflect_EV"  "derive_EV"

cols<-c(3:8) #All except the id and name in this instance
corMat<-cor(centrality_gd[,cols],use="complete.obs") #Specify those in the bracket
corMat #Let's look at it, which variables are most strongly correlated?

            degree_all   BC_power degree_norm    EV_cent reflect_EV
degree_all   1.0000000 -0.2782755   1.0000000  0.9131592  0.8729045
BC_power    -0.2782755  1.0000000  -0.2782755 -0.1782509 -0.1481903
degree_norm  1.0000000 -0.2782755   1.0000000  0.9131592  0.8729045
EV_cent      0.9131592 -0.1782509   0.9131592  1.0000000  0.9946549
reflect_EV   0.8729045 -0.1481903   0.8729045  0.9946549  1.0000000
derive_EV    0.9314936 -0.1943027   0.9314936  0.9983162  0.9869907
             derive_EV
degree_all   0.9314936
BC_power    -0.1943027
degree_norm  0.9314936
EV_cent      0.9983162
reflect_EV   0.9869907
derive_EV    1.0000000

However, I will also make a pretty visualization of the correlation matrix, just because.

library(corrplot)
corrplot(corMat)

Citations:

Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm

ASCAP. 18 March 2022.

Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/

Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/

Photo by Grateful Dead Productions

This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music
