Week 2 Assignment

An exploration of the Sampson’s Monks dataset.

Lissie Bates-Haus, Ph.D. (https://github.com/lbateshaus), DACSS, University of Massachusetts-Amherst (https://www.umass.edu/sbs/data-analytics-and-computational-social-science-program/ms)
2022-02-06

In this assignment, I will be exploring the Sampson’s Monks dataset.

First, I used the Import Script shared in the Google Classroom to import the data and create the objects needed to work with igraph and statnet.

#This script imports the sampson monk dataset from the ergm package.

#Let's load the libraries you need (install them first if you need to)
  if("statnet" %in% rownames(installed.packages()) == FALSE) {install.packages("statnet")}
  if("igraph"   %in% rownames(installed.packages()) == FALSE) {install.packages("igraph")}
  if("intergraph"   %in% rownames(installed.packages()) == FALSE) {install.packages("intergraph")}
  
  
  library(statnet)
  library(igraph)
  library(intergraph)
  
#Let's read the data into the environment. This loads the network object samplike.
  data("sampson", package = "ergm")
  network_statnet <- samplike
  rm(samplike)

#Let's create an edgelist version
  network_edgelist <- as.data.frame(as.edgelist(network_statnet))
  network_edgelist$nominations <- network_statnet%e%'nominations'
  
#Let's create a dataframe of node attributes
  network_nodes <- data.frame(cloisterville = network_statnet%v%'cloisterville',
                              group         = network_statnet%v%'group',
                              names         = network_statnet%v%'vertex.names'
  )

  
#Finally, let's make an igraph version
  network_igraph <- asIgraph(network_statnet)

Information about the network data can be accessed by the command: “?sampson”

First, using base R and igraph:

dim(network_edgelist)
[1] 88  3

The dim() command tells us that the dataframe network_edgelist has 88 observations (rows) of 3 variables. As its name in the Import Script suggests, this is an edgelist: one row per tie, with columns for the sender, the receiver, and the nominations value. An adjacency matrix for this network would instead be square, with one row and one column per monk (18 x 18).
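To make the difference between the two shapes concrete, here is a minimal base R sketch. The four-node edge list is hypothetical and is not taken from the Sampson data; the names toy_edges and toy_adj are mine.

```r
# Hypothetical 4-node directed edge list (NOT the Sampson data)
toy_edges <- data.frame(from = c(1, 1, 2, 4),
                        to   = c(2, 3, 1, 3))

# Build the matching adjacency matrix: one row and one column per node
n_nodes <- 4
toy_adj <- matrix(0, nrow = n_nodes, ncol = n_nodes)
# Indexing with a two-column matrix sets one cell per (from, to) pair
toy_adj[cbind(toy_edges$from, toy_edges$to)] <- 1

dim(toy_edges)  # rows = number of edges, columns = number of variables
dim(toy_adj)    # square: nodes x nodes
```

The edge list grows with the number of ties, while the adjacency matrix always has as many rows and columns as there are nodes, which is why a non-square result from dim() points to an edge list.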

is_bipartite(network_igraph)
[1] FALSE
is_directed(network_igraph)
[1] TRUE
is_weighted(network_igraph)
[1] FALSE

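From these commands, we learn that this network is not bipartite, is directed, and is not weighted. Note that is_weighted() only checks for an edge attribute named weight, so the nominations attribute does not count here.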

vertex_attr_names(network_igraph)
[1] "cloisterville" "group"         "na"            "vertex.names" 

From here we learn that our nodes have the following attributes (meaning, additional information available about each node): cloisterville, group, na and vertex.names.

Note: we have also created a Nodes dataframe, which has three columns: cloisterville, group and names. The na attribute is not in that dataframe; in statnet's network objects, na is the flag that marks missing vertices and edges, and asIgraph() appears to carry it over.

edge_attr_names(network_igraph)
[1] "na"          "nominations"

Our edge attributes are na and nominations. In this case, the nominations value is “the number of times (out of 3) that monk A nominated monk B.”

We can also use the statnet package to learn about our network:

summary(network_statnet)
Network attributes:
  vertices = 18
  directed = TRUE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
 total edges = 88 
   missing edges = 0 
   non-missing edges = 88 
 density = 0.2875817 

Vertex attributes:

 cloisterville:
   logical valued attribute
   attribute summary:
   Mode   FALSE    TRUE 
logical      12       6 

 group:
   character valued attribute
   attribute summary:
   Loyal Outcasts    Turks 
       7        4        7 
  vertex.names:
   character valued attribute
   18 valid vertex names

Edge attributes:

 nominations:
   numeric valued attribute
   attribute summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   1.909   3.000   3.000 

Network edgelist matrix:
      [,1] [,2]
 [1,]    5    1
 [2,]    7    1
 [3,]    1    2
 [4,]    3    2
 [5,]   12    2
 [6,]   15    2
 [7,]    1    3
 [8,]    5    4
 [9,]    1    5
[10,]    4    5
[11,]    6    5
[12,]   13    7
[13,]    9    8
[14,]   10    8
[15,]   11    8
[16,]    8    9
[17,]   10    9
[18,]    8   10
[19,]   14   12
[20,]   10   13
[21,]   18   13
[22,]    2   15
[23,]   16   15
[24,]    9   16
[25,]   18   17
[26,]   17   18
[27,]    2    1
[28,]    3    1
[29,]    6    1
[30,]    8    1
[31,]   12    1
[32,]   14    1
[33,]   15    1
[34,]   16    1
[35,]   18    1
[36,]    7    2
[37,]    8    2
[38,]   14    2
[39,]   16    2
[40,]   17    2
[41,]   18    2
[42,]   17    3
[43,]   18    3
[44,]    6    4
[45,]    8    4
[46,]   10    4
[47,]   11    4
[48,]    9    5
[49,]   10    5
[50,]   11    5
[51,]   13    5
[52,]   15    5
[53,]    4    6
[54,]    8    6
[55,]    2    7
[56,]   12    7
[57,]   15    7
[58,]   16    7
[59,]   18    7
[60,]    1    8
[61,]    7    8
[62,]    5    9
[63,]    6    9
[64,]    4   10
[65,]    4   11
[66,]    5   11
[67,]   14   11
[68,]    1   12
[69,]    2   12
[70,]    7   12
[71,]    9   12
[72,]   15   12
[73,]   16   12
[74,]    3   13
[75,]    5   13
[76,]   17   13
[77,]    1   14
[78,]    2   14
[79,]   10   14
[80,]   11   14
[81,]   12   14
[82,]   15   14
[83,]   14   15
[84,]    7   16
[85,]   11   16
[86,]    3   17
[87,]    3   18
[88,]   13   18
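The density in the summary above can be reproduced by hand. This is a small sketch that assumes the usual definition for a directed network without loops (observed edges divided by the number of ordered pairs of distinct nodes); the variable names are mine.

```r
n_vertices <- 18   # "vertices = 18" in the summary
n_edges    <- 88   # "total edges = 88" in the summary

# In a directed network without loops, each node can send a tie
# to every other node, so there are n * (n - 1) possible edges
possible_edges  <- n_vertices * (n_vertices - 1)
density_by_hand <- n_edges / possible_edges

possible_edges             # 306
round(density_by_hand, 7)  # 0.2875817, matching the summary
```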

From here, we’ll run some assessments based on our Week 2 tutorial.

First, we’ll run a dyad census:
igraph::dyad.census(network_igraph)
$mut
[1] 28

$asym
[1] 32

$null
[1] 93

There are 153 possible dyads (pairs) in a group of 18 people. The census tells us that, of those 153 dyads, only 28 are mutual (where A chooses B and B chooses A). Another 32 are asymmetric, meaning only one member of the dyad chooses the other, and 93, or just over 60%, are null.
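The dyad counts can be cross-checked with a little arithmetic in base R. The sketch below simply re-enters the census values printed above; the object names are mine.

```r
n_monks <- 18
possible_dyads <- choose(n_monks, 2)   # unordered pairs of 18 monks

# Values copied from the dyad census output above
census <- c(mut = 28, asym = 32, null = 93)

# The three categories should account for every possible dyad
sum(census) == possible_dyads

# A mutual dyad holds two directed edges, an asymmetric dyad holds one
implied_edges <- 2 * census[["mut"]] + census[["asym"]]
implied_edges   # should equal the 88 edges in the network

# Share of each dyad type, in percent
dyad_share <- round(100 * census / possible_dyads, 1)
dyad_share
```

The implied edge count is a useful sanity check: it ties the census back to the 88 edges reported by both dim() and summary().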

Next, we’ll look at a triad census (note: there are 816 possible triads in this network). We’ll confirm this:

sum(sna::triad.census(network_statnet, mode="graph"))
[1] 816
#Classify all triads in the network: statnet
#note: omit the 'mode' option for a directed network
sna::triad.census(network_statnet)
     003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 167 205 190   12   24   24   68   34    5    0  35   15    6
     120C 210 300
[1,]    5  18   8
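As a cross-check on the directed census, the sixteen counts printed above should also add up to the number of possible triads. This sketch re-enters those counts by hand; the vector name triad_counts is mine.

```r
# Counts copied from the sna::triad.census() output above
triad_counts <- c("003" = 167, "012" = 205, "102" = 190, "021D" = 12,
                  "021U" = 24, "021C" = 24, "111D" = 68, "111U" = 34,
                  "030T" = 5,  "030C" = 0,  "201"  = 35, "120D" = 15,
                  "120U" = 6,  "120C" = 5,  "210"  = 18, "300"  = 8)

possible_triads <- choose(18, 3)   # unordered triples of 18 monks
possible_triads

# Every triple of monks falls into exactly one of the 16 triad types
sum(triad_counts) == possible_triads
```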
#get network transitivity: igraph
transitivity(network_igraph)
[1] 0.4646739

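This measure states that about 46.5% of the connected triples in our network are closed, that is, form a triangle. However, igraph's transitivity() ignores edge direction, and this is a directed network, so we also compute transitivity with statnet's gtrans(), which takes direction into account: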

gtrans(network_statnet)
[1] 0.4074074

We can also compare global transitivity with the average of the local (per-node) transitivity values.

transitivity(network_igraph, type="global")
[1] 0.4646739
transitivity(network_igraph, type="average")
[1] 0.4925926
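To see why the global and average values can differ, here is a minimal base R sketch on a hypothetical undirected toy graph (a triangle plus one pendant node), not the Sampson network. It uses the standard definitions: global transitivity is closed triples over all connected triples, and the average is the mean of the per-node ratios. Dropping nodes with fewer than two neighbors from the average is an assumption of this sketch.

```r
# Hypothetical toy graph: triangle 1-2-3 plus node 4 tied only to node 3
ties <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
adj <- matrix(0, 4, 4)
adj[ties] <- 1
adj[ties[, 2:1]] <- 1   # mirror each tie so the matrix is symmetric

deg <- rowSums(adj)

# diag(A^3) counts closed walks of length 3; each triangle through a
# node is counted twice (once per direction), so halve it
triangles_at_node <- diag(adj %*% adj %*% adj) / 2

# Pairs of neighbors = connected triples centered on each node
triples_at_node <- choose(deg, 2)

# Global: pool all triples first, then take the ratio
global_trans <- sum(triangles_at_node) / sum(triples_at_node)

# Local: ratio per node (NaN where a node has fewer than 2 neighbors)
local_trans <- triangles_at_node / triples_at_node
average_trans <- mean(local_trans, na.rm = TRUE)

global_trans    # 0.6
average_trans   # about 0.778
```

The global value weights every connected triple equally, so high-degree nodes dominate it; the average weights every node equally, which is why the two numbers diverge in the toy graph and in the output above.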

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Bates-Haus, L. (2022, Feb. 17). Data Analytics and Computational Social Science: Week 2 Assignment. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlbateshaus862879/

BibTeX citation

@misc{bateshaus2022week,
  author = {Bates-Haus, Lissie},
  title = {Data Analytics and Computational Social Science: Week 2 Assignment},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlbateshaus862879/},
  year = {2022}
}