Assignment 7 for DACSS 697E course ‘Social and Political Network Analysis’: “Networks: Community”
I am continuing to use the Grateful Dead song writing data set that I used in previous assignments to examine co-writing links and centrality. The data set consists of the links between co-writers of songs played by the Grateful Dead over their 30-year touring career that I compiled.
There are 26 songwriters that contributed to the songs played over the course of the Grateful Dead history, resulting in 26 nodes in the dataset.
There are a total of 183 (updated and still under review!) unique songs played, and the varies combinations of co-writing combinations are now represented in a binary affiliation matrix.
This week I will calculate community clusters using various algorithms.
First, I will get my data into an igraph network object and inspect it.
#import data
gd_vertices <- read.csv("gd_nodes.csv", header=T, stringsAsFactors=F)
gd_affiliation <- read.csv("gd_affiliation_matrix.csv", row.names = 1, header = TRUE, check.names = FALSE)
gd_matrix <- as.matrix(gd_affiliation)
gd_projection <- gd_matrix%*%t(gd_matrix)
#Create igraph object
gd_network_ig <- graph.adjacency(gd_projection,mode="undirected")
The fast and greedy function was giving me an error code of:
Error in cluster_fast_greedy(gd_network_ig):At fast_community.c:660: fast-greedy community finding works only on graphs without multiple edges, Invalid value
Some community sourcing of opinions led me to run the “simplify()” function to correct this.
#create simplified igraph network
simple_gd <- simplify(gd_network_ig)
Creating a function to extract the giant component. The spinglass model will not evaluate unconnected graphs, so I did this step first. The one isolate node, “Bruce Hornsby”, is the only songwriter who wrote only a song without collaborating with anyone, and would be the only unevaluated node in the spinglass model.
#extract giant component
gd_giant<-giant.component(gd_network_ig)
The method attempts to detect dense sub-graphs by optimizing modularity scores on igraph networks that are un-directed. I’ll start with inspecting the names that are part of the new object.
#run fast_greedy clustering algorithm
#fg_gd <- cluster_fast_greedy(simple_gd)
#inspect
names(fg_gd)
[1] "merges" "modularity" "membership" "names" "algorithm"
[6] "vcount"
Looking at the list of which nodes belong to which clusters:
igraph::groups(fg_gd)
$`1`
[1] "Frank Guida" "Dave Parker" "Pigpen" "Joe Royster"
$`2`
[1] "Eric Andersen" "John Barlow" "Bob Bralove" "Willie Dixon"
[5] "Gerrit Graham" "Robert Hunter" "Rob Wasserman" "Bob Weir"
[9] "Vince Welnick"
$`3`
[1] "John Dawson" "Jerry Garcia" "Donna Godchaux"
[4] "Keith Godchaux" "Mickey Hart" "Bill Kreutzmann"
$`4`
[1] "Andrew Charles" "Ned Lagin" "Phil Lesh"
[4] "Peter Monk" "Brent Mydland" "Robert Petersen"
$`5`
[1] "Bruce Hornsby"
First I’m inspecting the community membership as a vector
#Inspect community membership vector
fg_gd$membership
[1] 2 2 2 4 3 2 3 3 3 2 1 3 5 2 3 4 4 4 4 1 4 1 1 2 2 2
And I can confirm which of the 5 membership groups each songwriter is part of:
#Membership function
membership(fg_gd)
Eric Andersen John Barlow Bob Bralove Andrew Charles
2 2 2 4
John Dawson Willie Dixon Jerry Garcia Donna Godchaux
3 2 3 3
Keith Godchaux Gerrit Graham Frank Guida Mickey Hart
3 2 1 3
Bruce Hornsby Robert Hunter Bill Kreutzmann Ned Lagin
5 2 3 4
Phil Lesh Peter Monk Brent Mydland Dave Parker
4 4 4 1
Robert Petersen Pigpen Joe Royster Rob Wasserman
4 1 1 2
Bob Weir Vince Welnick
2 2
Igraph colors the nodes by community
#plot network with community coloring
plot(fg_gd,gd_network_ig)
The walktrap community detection created two communities; one community is the lone isolate, and the rest of the songwriters are in the other community with the giant component.
#Run clustering algorithm: walktrap
wt_gd <- walktrap.community(gd_network_ig)
#Inspect community membership
igraph::groups(wt_gd)
$`1`
[1] "Eric Andersen" "John Barlow" "Bob Bralove"
[4] "Andrew Charles" "John Dawson" "Willie Dixon"
[7] "Jerry Garcia" "Donna Godchaux" "Keith Godchaux"
[10] "Gerrit Graham" "Frank Guida" "Mickey Hart"
[13] "Robert Hunter" "Bill Kreutzmann" "Ned Lagin"
[16] "Phil Lesh" "Peter Monk" "Brent Mydland"
[19] "Dave Parker" "Robert Petersen" "Pigpen"
[22] "Joe Royster" "Rob Wasserman" "Bob Weir"
[25] "Vince Welnick"
$`2`
[1] "Bruce Hornsby"
Igraph colors the nodes by community
#plot network with community coloring
plot(wt_gd,gd_network_ig)
The modularity score for the fast and greedy algorithm is higher than the walktrap algorithm, as predicted by the tutorial.
It would be worth comparing these scores on a weighted network in the future since it would take that into consideration.
#compare community partition modularity scores
modularity(fg_gd)
[1] 0.2792899
#compare community partition modularity scores
modularity(wt_gd)
[1] 0.2482355
Saving the scores for evaluation and later analysis; I will continue to add the other community modularity scores into this vector as I run them.
mods<-c(fastgreedy=modularity(fg_gd), walktrap=modularity(wt_gd))
mods
fastgreedy walktrap
0.2792899 0.2482355
#compare community partitions using variation of information
compare(fg_gd,wt_gd,method="vi")
[1] 1.294253
#compare community partitions
compare(fg_gd,wt_gd,method="nmi")
[1] 0.2012264
#compare community partitions
compare(fg_gd,wt_gd,method="split.join")
[1] 16
#compare community partitions
compare(fg_gd,wt_gd,method="rand")
[1] 0.2984615
#compare community partitions
compare(fg_gd,wt_gd,method="adjusted.rand")
[1] 0.04633205
In this evaluation, each of the nodes was indicated to be in its’ own community. I will not plot this community.
#Run clustering algorithm: leading label
lab_gd<-label.propagation.community(gd_network_ig)
#Inspect community membership
igraph::groups(lab_gd)
$`1`
[1] "Eric Andersen"
$`2`
[1] "John Barlow"
$`3`
[1] "Bob Bralove"
$`4`
[1] "Andrew Charles"
$`5`
[1] "John Dawson"
$`6`
[1] "Willie Dixon"
$`7`
[1] "Jerry Garcia"
$`8`
[1] "Donna Godchaux"
$`9`
[1] "Keith Godchaux"
$`10`
[1] "Gerrit Graham"
$`11`
[1] "Frank Guida"
$`12`
[1] "Mickey Hart"
$`13`
[1] "Bruce Hornsby"
$`14`
[1] "Robert Hunter"
$`15`
[1] "Bill Kreutzmann"
$`16`
[1] "Ned Lagin"
$`17`
[1] "Phil Lesh"
$`18`
[1] "Peter Monk"
$`19`
[1] "Brent Mydland"
$`20`
[1] "Dave Parker"
$`21`
[1] "Robert Petersen"
$`22`
[1] "Pigpen"
$`23`
[1] "Joe Royster"
$`24`
[1] "Rob Wasserman"
$`25`
[1] "Bob Weir"
$`26`
[1] "Vince Welnick"
#collect modularity scores to compare
mods<-c(mods, label=modularity(lab_gd))
Again, each of the nodes was indicated to be in its’ own community. I will not plot this community.
#Run clustering algorithm: edge betweenness
edge_gd <- label.propagation.community(gd_network_ig)
#Inspect community membership
igraph::groups(edge_gd)
$`1`
[1] "Eric Andersen"
$`2`
[1] "John Barlow"
$`3`
[1] "Bob Bralove"
$`4`
[1] "Andrew Charles"
$`5`
[1] "John Dawson"
$`6`
[1] "Willie Dixon"
$`7`
[1] "Jerry Garcia"
$`8`
[1] "Donna Godchaux"
$`9`
[1] "Keith Godchaux"
$`10`
[1] "Gerrit Graham"
$`11`
[1] "Frank Guida"
$`12`
[1] "Mickey Hart"
$`13`
[1] "Bruce Hornsby"
$`14`
[1] "Robert Hunter"
$`15`
[1] "Bill Kreutzmann"
$`16`
[1] "Ned Lagin"
$`17`
[1] "Phil Lesh"
$`18`
[1] "Peter Monk"
$`19`
[1] "Brent Mydland"
$`20`
[1] "Dave Parker"
$`21`
[1] "Robert Petersen"
$`22`
[1] "Pigpen"
$`23`
[1] "Joe Royster"
$`24`
[1] "Rob Wasserman"
$`25`
[1] "Bob Weir"
$`26`
[1] "Vince Welnick"
#collect modularity scores to compare
mods<-c(mods, edge=modularity(edge_gd))
This method has also created 5 communities to examine.
#Run clustering algorithm: leading eigenvector
eigen_gd <- leading.eigenvector.community(gd_network_ig)
#Inspect community membership
igraph::groups(eigen_gd)
$`1`
[1] "Eric Andersen" "John Barlow" "Bob Bralove" "Willie Dixon"
[5] "Gerrit Graham" "Brent Mydland" "Rob Wasserman" "Bob Weir"
[9] "Vince Welnick"
$`2`
[1] "Bruce Hornsby"
$`3`
[1] "John Dawson" "Jerry Garcia" "Robert Hunter"
$`4`
[1] "Andrew Charles" "Frank Guida" "Bill Kreutzmann"
[4] "Ned Lagin" "Phil Lesh" "Peter Monk"
[7] "Dave Parker" "Robert Petersen" "Pigpen"
[10] "Joe Royster"
$`5`
[1] "Donna Godchaux" "Keith Godchaux" "Mickey Hart"
#collect modularity scores to compare
mods<-c(mods, eigen=modularity(eigen_gd))
Igraph colors the nodes by community
#plot network with community coloring
plot(eigen_gd,gd_network_ig)
This method has, not surprisingly, created 4 communities to examine since we are only looking at the giant component, eliminating the one isolate.
#Run clustering algorithm: spinglass
spin_gd <- spinglass.community(gd_giant)
#Inspect community membership
igraph::groups(spin_gd)
$`1`
[1] "Frank Guida" "Dave Parker" "Pigpen" "Joe Royster"
$`2`
[1] "Andrew Charles" "Ned Lagin" "Phil Lesh"
[4] "Peter Monk" "Brent Mydland" "Robert Petersen"
$`3`
[1] "Eric Andersen" "John Barlow" "Bob Bralove" "Willie Dixon"
[5] "Gerrit Graham" "Rob Wasserman" "Bob Weir" "Vince Welnick"
$`4`
[1] "John Dawson" "Jerry Garcia" "Donna Godchaux"
[4] "Keith Godchaux" "Mickey Hart" "Robert Hunter"
[7] "Bill Kreutzmann"
#collect modularity scores to compare
mods<-c(mods, spin=modularity(spin_gd))
Igraph colors the nodes by community
#plot network with community coloring
plot(spin_gd,gd_giant)
Finally, for this post, I will look at the modularity scores across the various methods of community evaluation.
gd_mods <- as.matrix(mods)
gd_mods
[,1]
fastgreedy 0.27928994
walktrap 0.24823554
label 0.40297240
edge 0.40297240
eigen 0.45402593
spin 0.04933856
After an initial look at the network through various community algorithms, it is my instinct that the fast and greedy community structure actually makes very little sense to me, outside the isolate.
The eigenvector community makes the most sense to me. Even the inclusion of John Dawson with Jerry Garcia and Robert Hunter makes sense, given the eigenvector principle. The only song John Dawson wrote was a popular song, written with Jerry Garcia and Robert Hunter.
The spinglass community structure is interesting, and I would like to look at that in more detail in the future. It has intuitively more of a logical distribution than the fast and greedy model, though it’s not quite as “clean” of a picture as the eigenvector model.
Citations:
Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm
ASCAP. 18 March 2022.
Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/
Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/
Photo by Rosie McGee
This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Becvar (2022, April 15). Data Analytics and Computational Social Science: DACSS 697E Assignment 7. Retrieved from https://www.kristinabecvar.com/posts/networks-community/
BibTeX citation
@misc{networks-community, author = {Becvar, Kristina}, title = {Data Analytics and Computational Social Science: DACSS 697E Assignment 7}, url = {https://www.kristinabecvar.com/posts/networks-community/}, year = {2022} }