DACSS 697E Assignment 3

networks homework grateful network

Assignment 3 for DACSS 697E course ‘Social and Political Network Analysis’: “Grateful Research: Creating a Network”

Kristina Becvar https://www.kristinabecvar.com
03-18-2022

Network Creation

Purpose

For my final project, I am using a data set that is somewhat similar in structure to that of my final project to try and get a feel for the process of creating the appropriate network. After recovering data from thousands of New York Times articles pulled through their API on Afghanistan from a 2-year period, I will be analyzing the network of article authorship and themes of articles. To understand the process, I am using for my assignment a data set of co-writers of songs played by the Grateful Dead over their 30-year touring career that I compiled. While compiling the data, I added an attribute that represents the connections between co-writers as songs with the added observation of the number of times each song was played live. The nature of the band was that of a collaborative subculture where the energy of the live shows reflected the crowd’s buy-in to the songs being played. Since the band was primarily one whose popularity was measured by ticket sales, not album sales. I’m still not sure if that will serve appropriately as a ‘weight’ for the network data, so I need to explore the process more thoroughly.

First Try - A Miss

Understanding this as an affiliation network, I first created a matrix linking actors (songwriters) to an event (songs). I began by assigning a unique ID to each actor and event. Taking the data I have pulled from my research, I created an affiliation spreadsheet with the songwriters as rows and the songs as columns. When a songwriter was affiliated with a song, there was a number in that matrix spot. However, I struggled to get this affiliation data into a network in R because I was using weights incorrectly, so I took a different approach.

Regouping

In this example, I used a node list where unique IDs are numbers which correspond to the name of a songwriter.

The edgelist is in a separate spreadsheet where the first two columns are the IDs of the source and the target node (songwriter ID), regardless of whether the network is directed, for each edge. Each row contains an observation of a connection between writers for a given song, and since there are multiple collaborations, there may be multiple rows of writer combinations for a given song ID. If there was only one writer on a song, that songwriter’s ID is indicated in both the source and target column for that song.

The following columns are edge attributes. In my edgelist, I have the two songwriters representing the co-writing relationship in columns “1” and “2”, the song ID in column “3”, the song name in column “4”, and the number of times the corresponding song was played live is indicated in column “5”.

I have NOT utilized the number of times the song was played live as a network weight at this point. Additionally, this edgelist format is not the ideal format, but it is the first step in the process I am working through to utilize different methods of working through the data.

Network Creation

# Loading nodes and vertices
gd_vertices <- read.csv("gd_nodes.csv")
gd_edgelist <- read.csv("gd_clean_data.csv")

Converting network data into igraph objects using the “graph.data.frame: function, which takes two data frames: d and vertices.

“d” describes the edges of the network and “vertices” the nodes.

set.seed(1234)
grateful_data <- graph_from_data_frame(d = gd_edgelist, vertices = gd_vertices, directed = FALSE)

Network Details

Now to check the vertices and edges in the graph I’ve created to ensure they represent the data accurately, and confirm that all of the attributes have been represented properly:

head(V(grateful_data)$name)
[1] "Eric Andersen"  "John Barlow"    "Bob Bralove"   
[4] "Andrew Charles" "John Dawson"    "Willie Dixon"  
head(E(grateful_data)$song.id)
[1] 1 2 2 2 2 2
head(E(grateful_data)$song.name)
[1] "Alabama Getaway"     "Alice D Millionaire" "Alice D Millionaire"
[4] "Alice D Millionaire" "Alice D Millionaire" "Alice D Millionaire"
head(E(grateful_data)$weight)
NULL
is_directed(grateful_data)
[1] FALSE
is_weighted(grateful_data)
[1] FALSE
is_bipartite(grateful_data)
[1] FALSE
igraph::vertex_attr_names(grateful_data)
[1] "name"
igraph::edge_attr_names(grateful_data)
[1] "song.id"      "song.name"    "times.played"

Visualizing the Network

Show code
plot(grateful_data)

It’s basically plotting what I want it to illustrate, though I will need to do a lot more work to make the graph represent anything meaningful!

Dyad and Triad Census

Finishing the look at the basic network information such as the dyad and triad census: I have 558 mutual dyads and null value of “-233”, with a warning that calling a dyad census on an undirected graph. This does indicate to me that the edgelist format is not the best representation of this data.

Show code
igraph::dyad.census(grateful_data)
$mut
[1] 558

$asym
[1] 0

$null
[1] -233
Show code
igraph::triad.census(grateful_data)
 [1] 2043    0  233    0    0    0    0    0    0    0  237    0    0
[14]    0    0   87

Knowing this network has 26 vertices, I want to see if the triad census is working correctly by comparing the following data, which I can confirm using this calculation.

Show code
#possible triads in network
26*25*24/6
[1] 2600
Show code
sum(igraph::triad.census(grateful_data))
[1] 2600

Transitivity

Looking next at the global v. average local transitivity of the network:

Show code
#get global clustering cofficient: igraph
transitivity(grateful_data, type="global")
[1] 0.5240964
Show code
#get average local clustering coefficient: igraph
transitivity(grateful_data, type="average")
[1] 0.7755587

This transitivity tells me that the average network transitivity is significantly higher than the global transitivity, indicating, from my still naive network knowledge, that the overall network is generally more loose, and that there is a more connected sub-network.

Geodesic Distance

Looking at the geodesic distance tells me that on average, the path length is just over 2.

Show code
average.path.length(grateful_data,directed=F)
[1] 2.01

Components

Getting a look at the components of the network shows that there are 2 components in the network, and 25 of the 26 nodes make up the giant component with 1 isolate.

Show code
names(igraph::components(grateful_data))
[1] "membership" "csize"      "no"        
Show code
igraph::components(grateful_data)$no 
[1] 2
Show code
igraph::components(grateful_data)$csize
[1] 25  1

This is a great start - now I can get to looking at the network density, centrality, and centralization.

Density

The network density measure: First with just the call “graph.density” and then with adding “loops=TRUE”. Since I’m using igraph, I know that its’ default output assumes that loops are not included but does not remove them, which can be corrected with the addition of “loops=TRUE” per the course tutorials when comparing output to statnet. This gives me confidence that my network density is closer to 1.58.

Show code
graph.density(grateful_data)
[1] 1.716923
Show code
graph.density(grateful_data, loops=TRUE)
[1] 1.589744

Degree

The network degree measure: This gives me a clear output showing the degree of each particular node (songwriter). It is not suprising, knowing my subject matter, that Jerry Garcia is the highest degree node in this network as the practical and figurative head of the band. The other band members’ degree measures are not necessarily what I expected, though. I did not anticipate that his songwriting partner, Robert Hunter, would have a lower degree than band members Phil Lesh and Bob Weir. Further, I did not anticipate that the degree measure of band member ‘Pigpen’ would be so high given his early death in the first years of the band’s touring life.

Show code
igraph::degree(grateful_data)
  Eric Andersen     John Barlow     Bob Bralove  Andrew Charles 
              1              30              12               1 
    John Dawson    Willie Dixon    Jerry Garcia  Donna Godchaux 
              2               2             215              16 
 Keith Godchaux   Gerrit Graham     Frank Guida     Mickey Hart 
             19               1               2              25 
  Bruce Hornsby   Robert Hunter Bill Kreutzmann       Ned Lagin 
              4             136             121               1 
      Phil Lesh      Peter Monk   Brent Mydland     Dave Parker 
            158               1              24              10 
Robert Petersen          Pigpen     Joe Royster   Rob Wasserman 
              5             119               2              10 
       Bob Weir   Vince Welnick 
            188              11 

To look further I will create a dataframe for easier review going forward.

Show code
grateful_nodes<-data.frame(name=V(grateful_data)$name, degree=igraph::degree(grateful_data))
head(grateful_nodes)
                         name degree
Eric Andersen   Eric Andersen      1
John Barlow       John Barlow     30
Bob Bralove       Bob Bralove     12
Andrew Charles Andrew Charles      1
John Dawson       John Dawson      2
Willie Dixon     Willie Dixon      2

Summary Statistics

A quick look at the summary statistics confirms for me the minimum, maximum, median, and mean node degree data.

Show code
summary(grateful_nodes)
     name               degree      
 Length:26          Min.   :  1.00  
 Class :character   1st Qu.:  2.00  
 Mode  :character   Median : 10.50  
                    Mean   : 42.92  
                    3rd Qu.: 28.75  
                    Max.   :215.00  

Plotting the Network

Now I want to take a step back and try to visually represent this data better.

Show code
# Community detection algoritm 
community <- cluster_louvain(grateful_data) 

# Attach communities to relevant vertices
V(grateful_data)$color <- community$membership 

# Graph layout
layout <- layout.random(grateful_data) 

# igraph plot 
plot(grateful_data, layout = layout)

Better Plotting

Better, but not quite.

Show code
ggraph(grateful_data, layout = "fr") +
  geom_edge_link() + 
  geom_node_point(aes(color = factor(color))) + 
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void() +
  theme(legend.position = "none") 

Adding More Detail

That is starting to look more meaningful!

Show code
# Set size to degree centrality 
V(grateful_data)$size = degree(grateful_data)

# Additional customisation for better legibility 
ggraph(grateful_data, layout = "fr") +
  geom_edge_arc(strength = 0.2, width = 0.5, alpha = 0.15) + 
  geom_node_point(aes(size = size, color = factor(color))) + 
  geom_node_text(aes(label = name, size = size), repel = TRUE) +
  theme_void() +
  theme(legend.position = "none") 

There is a lot more to do, but this is a great start.

Citations:

Allan, Alex; Grateful Dead Lyric & Song Finder: https://whitegum.com/~acsa/intro.htm

ASCAP. 18 March 2022.

Dodd, David; The Annotated Grateful Dead Lyrics: http://artsites.ucsc.edu/gdead/agdl/

Schofield, Matt; The Grateful Dead Family Discography: http://www.deaddisc.com/

This information is intended for private research only, and not for any commercial use. Original Grateful Dead songs are ©copyright Ice Nine Music

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Becvar (2022, April 15). Data Analytics and Computational Social Science: DACSS 697E Assignment 3. Retrieved from https://www.kristinabecvar.com/posts/2022-04-03-dacss-697e-assignment-3/

BibTeX citation

@misc{network-creation,
  author = {Becvar, Kristina},
  title = {Data Analytics and Computational Social Science: DACSS 697E Assignment 3},
  url = {https://www.kristinabecvar.com/posts/2022-04-03-dacss-697e-assignment-3/},
  year = {2022}
}