DACSS 697E Assignment 7

networks homework grateful network

Assignment 7 for DACSS 697E course ‘Social and Political Network Analysis’: “Networks: Community”

Kristina Becvar https://www.kristinabecvar.com
04-09-2022

I am continuing to use the Grateful Dead song writing data set that I used in previous assignments to examine co-writing links and centrality. The data set consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career that I compiled.

There are 26 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 26 nodes in the dataset.

There are a total of 183 (updated and still under review!) unique songs played, and the varies combinations of co-writing combinations are now represented in a binary affiliation matrix.

This week I will calculate community clusters using various algorithms.

Network Creation

First, I will get my data into an igraph network object and inspect it.

Show code
#import data
gd_vertices <- read.csv("gd_nodes.csv", header=T, stringsAsFactors=F)
gd_affiliation <- read.csv("gd_affiliation_matrix.csv", row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)
gd_projection <- gd_matrix%*%t(gd_matrix)
#Create igraph object
gd_network_ig <- graph.adjacency(gd_projection,mode="undirected")

Basic Visualization

Cleaning

Simplify Function

The fast and greedy function was giving me an error code of:

Error in cluster_fast_greedy(gd_network_ig):At fast_community.c:660: fast-greedy community finding works only on graphs without multiple edges, Invalid value

Some community sourcing of opinions led me to run the “simplify()” function to correct this.

#create simplified igraph network
simple_gd <- simplify(gd_network_ig)

Giant Component

Creating a function to extract the giant component. The spinglass model will not evaluate unconnected graphs, so I did this step first. The one isolate node, “Bruce Hornsby”, is the only songwriter who wrote only a song without collaborating with anyone, and would be the only unevaluated node in the spinglass model.

Show code
giant.component <- function(graph) {
  cl <- clusters(graph)
  induced.subgraph(graph, which(cl$membership == which.max(cl$csize)))
}
Show code
#extract giant component
gd_giant<-giant.component(gd_network_ig)

Fast and Greedy Community

The method attempts to detect dense sub-graphs by optimizing modularity scores on igraph networks that are un-directed. I’ll start with inspecting the names that are part of the new object.

Show code
#run fast_greedy clustering algorithm
#fg_gd <- cluster_fast_greedy(simple_gd)
Show code
#inspect
names(fg_gd)
[1] "merges"     "modularity" "membership" "names"      "algorithm" 
[6] "vcount"    

Groups

Looking at the list of which nodes belong to which clusters:

Show code
igraph::groups(fg_gd)
$`1`
[1] "Frank Guida" "Dave Parker" "Pigpen"      "Joe Royster"

$`2`
[1] "Eric Andersen" "John Barlow"   "Bob Bralove"   "Willie Dixon" 
[5] "Gerrit Graham" "Robert Hunter" "Rob Wasserman" "Bob Weir"     
[9] "Vince Welnick"

$`3`
[1] "John Dawson"     "Jerry Garcia"    "Donna Godchaux" 
[4] "Keith Godchaux"  "Mickey Hart"     "Bill Kreutzmann"

$`4`
[1] "Andrew Charles"  "Ned Lagin"       "Phil Lesh"      
[4] "Peter Monk"      "Brent Mydland"   "Robert Petersen"

$`5`
[1] "Bruce Hornsby"

Community Membership

First I’m inspecting the community membership as a vector

Show code
#Inspect community membership vector
fg_gd$membership
 [1] 2 2 2 4 3 2 3 3 3 2 1 3 5 2 3 4 4 4 4 1 4 1 1 2 2 2

And I can confirm which of the 5 membership groups each songwriter is part of:

Show code
#Membership function
membership(fg_gd)
  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
              2               2               2               4 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
              3               2               3               3 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
              3               2               1               3 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
              5               2               3               4 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
              4               4               4               1 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
              4               1               1               2 
       Bob Weir   Vince Welnick 
              2               2 

Plot the Network with Community Colors

Igraph colors the nodes by community

Show code
#plot network with community coloring
plot(fg_gd,gd_network_ig)

Walktrap Community Detection and Plot

The walktrap community detection created two communities; one community is the lone isolate, and the rest of the songwriters are in the other community with the giant component.

Show code
#Run clustering algorithm: walktrap
wt_gd <- walktrap.community(gd_network_ig)
#Inspect community membership
igraph::groups(wt_gd)
$`1`
 [1] "Eric Andersen"   "John Barlow"     "Bob Bralove"    
 [4] "Andrew Charles"  "John Dawson"     "Willie Dixon"   
 [7] "Jerry Garcia"    "Donna Godchaux"  "Keith Godchaux" 
[10] "Gerrit Graham"   "Frank Guida"     "Mickey Hart"    
[13] "Robert Hunter"   "Bill Kreutzmann" "Ned Lagin"      
[16] "Phil Lesh"       "Peter Monk"      "Brent Mydland"  
[19] "Dave Parker"     "Robert Petersen" "Pigpen"         
[22] "Joe Royster"     "Rob Wasserman"   "Bob Weir"       
[25] "Vince Welnick"  

$`2`
[1] "Bruce Hornsby"

Plot the Network with Community Colors

Igraph colors the nodes by community

Show code
#plot network with community coloring
plot(wt_gd,gd_network_ig)

Compare Community Partitions - Fast and Greedy and Walktrap

The modularity score for the fast and greedy algorithm is higher than the walktrap algorithm, as predicted by the tutorial.

It would be worth comparing these scores on a weighted network in the future since it would take that into consideration.

Show code
#compare community partition modularity scores
modularity(fg_gd)
[1] 0.2792899
Show code
#compare community partition modularity scores
modularity(wt_gd)
[1] 0.2482355

Collect & Compare Modularity Scores

Saving the scores for evaluation and later analysis; I will continue to add the other community modularity scores into this vector as I run them.

mods<-c(fastgreedy=modularity(fg_gd), walktrap=modularity(wt_gd))
mods
fastgreedy   walktrap 
 0.2792899  0.2482355 

Variation Method

Show code
#compare community partitions using variation of information
compare(fg_gd,wt_gd,method="vi")
[1] 1.294253

Normalized Mutual Information Method

Show code
#compare community partitions
compare(fg_gd,wt_gd,method="nmi")
[1] 0.2012264

Split Join Distance Method

Show code
#compare community partitions
compare(fg_gd,wt_gd,method="split.join")
[1] 16

Rand Index Method

Show code
#compare community partitions
compare(fg_gd,wt_gd,method="rand")
[1] 0.2984615

Adjusted Rand Index Method

Show code
#compare community partitions
compare(fg_gd,wt_gd,method="adjusted.rand")
[1] 0.04633205

Leading Label Propagation Community Detection

In this evaluation, each of the nodes was indicated to be in its’ own community. I will not plot this community.

Show code
#Run clustering algorithm: leading label
lab_gd<-label.propagation.community(gd_network_ig)
#Inspect community membership
igraph::groups(lab_gd)
$`1`
[1] "Eric Andersen"

$`2`
[1] "John Barlow"

$`3`
[1] "Bob Bralove"

$`4`
[1] "Andrew Charles"

$`5`
[1] "John Dawson"

$`6`
[1] "Willie Dixon"

$`7`
[1] "Jerry Garcia"

$`8`
[1] "Donna Godchaux"

$`9`
[1] "Keith Godchaux"

$`10`
[1] "Gerrit Graham"

$`11`
[1] "Frank Guida"

$`12`
[1] "Mickey Hart"

$`13`
[1] "Bruce Hornsby"

$`14`
[1] "Robert Hunter"

$`15`
[1] "Bill Kreutzmann"

$`16`
[1] "Ned Lagin"

$`17`
[1] "Phil Lesh"

$`18`
[1] "Peter Monk"

$`19`
[1] "Brent Mydland"

$`20`
[1] "Dave Parker"

$`21`
[1] "Robert Petersen"

$`22`
[1] "Pigpen"

$`23`
[1] "Joe Royster"

$`24`
[1] "Rob Wasserman"

$`25`
[1] "Bob Weir"

$`26`
[1] "Vince Welnick"
Show code
#collect modularity scores to compare
mods<-c(mods, label=modularity(lab_gd))

Edge Betweenness Community Detection

Again, each of the nodes was indicated to be in its’ own community. I will not plot this community.

Show code
#Run clustering algorithm: edge betweenness
edge_gd <- label.propagation.community(gd_network_ig)
#Inspect community membership
igraph::groups(edge_gd)
$`1`
[1] "Eric Andersen"

$`2`
[1] "John Barlow"

$`3`
[1] "Bob Bralove"

$`4`
[1] "Andrew Charles"

$`5`
[1] "John Dawson"

$`6`
[1] "Willie Dixon"

$`7`
[1] "Jerry Garcia"

$`8`
[1] "Donna Godchaux"

$`9`
[1] "Keith Godchaux"

$`10`
[1] "Gerrit Graham"

$`11`
[1] "Frank Guida"

$`12`
[1] "Mickey Hart"

$`13`
[1] "Bruce Hornsby"

$`14`
[1] "Robert Hunter"

$`15`
[1] "Bill Kreutzmann"

$`16`
[1] "Ned Lagin"

$`17`
[1] "Phil Lesh"

$`18`
[1] "Peter Monk"

$`19`
[1] "Brent Mydland"

$`20`
[1] "Dave Parker"

$`21`
[1] "Robert Petersen"

$`22`
[1] "Pigpen"

$`23`
[1] "Joe Royster"

$`24`
[1] "Rob Wasserman"

$`25`
[1] "Bob Weir"

$`26`
[1] "Vince Welnick"
Show code
#collect modularity scores to compare
mods<-c(mods, edge=modularity(edge_gd))

Eigenvector Community Detection

This method has also created 5 communities to examine.

Show code
#Run clustering algorithm: leading eigenvector
eigen_gd <- leading.eigenvector.community(gd_network_ig)
#Inspect community membership
igraph::groups(eigen_gd)
$`1`
[1] "Eric Andersen" "John Barlow"   "Bob Bralove"   "Willie Dixon" 
[5] "Gerrit Graham" "Brent Mydland" "Rob Wasserman" "Bob Weir"     
[9] "Vince Welnick"

$`2`
[1] "Bruce Hornsby"

$`3`
[1] "John Dawson"   "Jerry Garcia"  "Robert Hunter"

$`4`
 [1] "Andrew Charles"  "Frank Guida"     "Bill Kreutzmann"
 [4] "Ned Lagin"       "Phil Lesh"       "Peter Monk"     
 [7] "Dave Parker"     "Robert Petersen" "Pigpen"         
[10] "Joe Royster"    

$`5`
[1] "Donna Godchaux" "Keith Godchaux" "Mickey Hart"   
Show code
#collect modularity scores to compare
mods<-c(mods, eigen=modularity(eigen_gd))

Plot the Network with Community Colors

Igraph colors the nodes by community

Show code
#plot network with community coloring
plot(eigen_gd,gd_network_ig)

Spinglass Community Detection

This method has, not surprisingly, created 4 communities to examine since we are only looking at the giant component, eliminating the one isolate.

Show code
#Run clustering algorithm: spinglass
spin_gd <- spinglass.community(gd_giant)
#Inspect community membership
igraph::groups(spin_gd)
$`1`
[1] "Frank Guida" "Dave Parker" "Pigpen"      "Joe Royster"

$`2`
[1] "Andrew Charles"  "Ned Lagin"       "Phil Lesh"      
[4] "Peter Monk"      "Brent Mydland"   "Robert Petersen"

$`3`
[1] "Eric Andersen" "John Barlow"   "Bob Bralove"   "Willie Dixon" 
[5] "Gerrit Graham" "Rob Wasserman" "Bob Weir"      "Vince Welnick"

$`4`
[1] "John Dawson"     "Jerry Garcia"    "Donna Godchaux" 
[4] "Keith Godchaux"  "Mickey Hart"     "Robert Hunter"  
[7] "Bill Kreutzmann"
Show code
#collect modularity scores to compare
mods<-c(mods, spin=modularity(spin_gd))

Plot the Network with Community Colors

Igraph colors the nodes by community

Show code
#plot network with community coloring
plot(spin_gd,gd_giant)

Modularity

Finally, for this post, I will look at the modularity scores across the various methods of community evaluation.

Show code
gd_mods <- as.matrix(mods)
gd_mods
                 [,1]
fastgreedy 0.27928994
walktrap   0.24823554
label      0.40297240
edge       0.40297240
eigen      0.45402593
spin       0.04933856

Evaluation

After an initial look at the network through various community algorithms, it is my instinct that the fast and greedy community structure actually makes very little sense to me, outside the isolate.

The eigenvector community makes the most sense to me. Even the inclusion of John Dawson with Jerry Garcia and Robert Hunter makes sense, given the eigenvector principle. The only song John Dawson wrote was a popular song, written with Jerry Garcia and Robert Hunter.

The spinglass community structure is interesting, and I would like to look at that in more detail in the future. It has intuitively more of a logical distribution than the fast and greedy model, though it’s not quite as “clean” of a picture as the eigenvector model.

Citations:

Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm

ASCAP. 18 March 2022.

Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/

Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/

Photo by Rosie McGee

This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Becvar (2022, April 15). Data Analytics and Computational Social Science: DACSS 697E Assignment 7. Retrieved from https://www.kristinabecvar.com/posts/networks-community/

BibTeX citation

@misc{networks-community,
  author = {Becvar, Kristina},
  title = {Data Analytics and Computational Social Science: DACSS 697E Assignment 7},
  url = {https://www.kristinabecvar.com/posts/networks-community/},
  year = {2022}
}