Social Network Analysis: Week 2: Basic Network Structure

Use igraph and statnet tools to describe the aspects of network structure introduced in the Week 2 lecture: dyads and the dyad census, triads and the triad census, network transitivity and clustering, and path length and geodesics.

Audra Jamai White (UMass Amherst - DACSS 679: Social Network Analysis)
2022-02-08

Social Network Analysis

Week 2 Assignment: Network Structure

  1. Identify an existing data set.
This can be one provided in the course directory, in an R package or library, located online, or some other source.
  2. Briefly describe the network dataset.
Identify and describe content of nodes and links, and identify format of data set (i.e., matrix or edgelist, directed or not, weighted or not), and whether attribute data are present. Be sure to provide information about network size (e.g., information obtained from network description using week 1 network basics tutorial commands).
  3. Explore the dataset using commands from week 2 tutorial.
Comment on the highlighted aspects of network structure such as geodesic and path distances, triads or transitivity, connectedness and/or component structure, etc. Be sure to provide both the relevant statistics calculated in R and your own interpretation of these statistics.

1. Identify an existing data set.

  # install.packages(c("network", "sna", "igraph"))
  library(network); library(sna); library(igraph)

    data("flo", package = "network")

2. Briefly describe the network dataset.

  1. Identify and describe content of nodes and links,
 [1] "Acciaiuoli"   "Albizzi"      "Barbadori"    "Bischeri"    
 [5] "Castellani"   "Ginori"       "Guadagni"     "Lamberteschi"
 [9] "Medici"       "Pazzi"        "Peruzzi"      "Pucci"       
[13] "Ridolfi"      "Salviati"     "Strozzi"      "Tornabuoni"  
  1. Identify format of data set (i.e., matrix or edgelist, directed or not, weighted or not), and

    • Flo format: the 16 x 16 square dimensions indicate an adjacency matrix format.

    Network attributes (as read with the network() defaults, which treat the matrix as directed): vertices = 16, directed = TRUE, hyper = FALSE, loops = FALSE, multiple = FALSE, bipartite = FALSE, total edges = 40, missing edges = 0, non-missing edges = 40.

Vertex attribute names: vertex.names

No edge attributes

  dim(flo)
[1] 16 16
  network(flo)
 Network attributes:
  vertices = 16 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 40 
    missing edges= 0 
    non-missing edges= 40 

 Vertex attribute names: 
    vertex.names 

No edge attributes
  1. Identify whether attribute data are present.
  # flo is a plain matrix, so summary() reports per-column statistics; print.adj has no effect here
  summary(flo,
          print.adj = TRUE
          )
   Acciaiuoli        Albizzi         Barbadori        Bischeri     
 Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000  
 Median :0.0000   Median :0.0000   Median :0.000   Median :0.0000  
 Mean   :0.0625   Mean   :0.1875   Mean   :0.125   Mean   :0.1875  
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.000   3rd Qu.:0.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.000   Max.   :1.0000  
   Castellani         Ginori          Guadagni     Lamberteschi   
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00   1st Qu.:0.0000  
 Median :0.0000   Median :0.0000   Median :0.00   Median :0.0000  
 Mean   :0.1875   Mean   :0.0625   Mean   :0.25   Mean   :0.0625  
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.25   3rd Qu.:0.0000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00   Max.   :1.0000  
     Medici          Pazzi           Peruzzi           Pucci  
 Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0  
 1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0  
 Median :0.000   Median :0.0000   Median :0.0000   Median :0  
 Mean   :0.375   Mean   :0.0625   Mean   :0.1875   Mean   :0  
 3rd Qu.:1.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0  
 Max.   :1.000   Max.   :1.0000   Max.   :1.0000   Max.   :0  
    Ridolfi          Salviati        Strozzi       Tornabuoni    
 Min.   :0.0000   Min.   :0.000   Min.   :0.00   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.00   1st Qu.:0.0000  
 Median :0.0000   Median :0.000   Median :0.00   Median :0.0000  
 Mean   :0.1875   Mean   :0.125   Mean   :0.25   Mean   :0.1875  
 3rd Qu.:0.0000   3rd Qu.:0.000   3rd Qu.:0.25   3rd Qu.:0.0000  
 Max.   :1.0000   Max.   :1.000   Max.   :1.00   Max.   :1.0000  
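
Because flo is a 0/1 matrix, each column mean in the summary above equals that family's number of ties divided by 16. A minimal sketch of that arithmetic, using three column means copied from the output (the variable names are illustrative only):

```r
# Column means copied from the summary() output above (three of the 16 families)
col_means <- c(Medici = 0.375, Guadagni = 0.25, Pucci = 0)
n_families <- 16

# In a 0/1 adjacency matrix, column mean * number of rows = number of ties
ties_per_family <- col_means * n_families
print(ties_per_family)
```

Read this way, the Medici have 6 ties, the Guadagni have 4, and the Pucci have none, which makes the Pucci an isolate.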

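Create & Describe: Network Objects from the Adjacency Matrix

Florentine families network objects built from the adjacency matrix: flo.ig (igraph) and flo.stat (statnet).

Both objects are created as undirected networks, because the matrix is symmetric (the 40 directed edges reported above are 20 ties, each counted in both directions).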

    flo.stat<-
        network(
        flo,
        directed=FALSE,
        matrix.type="adjacency"
        )

#     provides a description of several critical network features
    print(flo.stat)
 Network attributes:
  vertices = 16 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 20 
    missing edges= 0 
    non-missing edges= 20 

 Vertex attribute names: 
    vertex.names 

No edge attributes
    flo.ig<-
      graph_from_adjacency_matrix(
      flo,
      mode="undirected"
      ) 
    print(flo.ig)
IGRAPH 60a423d UN-- 16 20 -- 
+ attr: name (v/c)
+ edges from 60a423d (vertex names):
 [1] Acciaiuoli--Medici       Albizzi   --Ginori      
 [3] Albizzi   --Guadagni     Albizzi   --Medici      
 [5] Barbadori --Castellani   Barbadori --Medici      
 [7] Bischeri  --Guadagni     Bischeri  --Peruzzi     
 [9] Bischeri  --Strozzi      Castellani--Peruzzi     
[11] Castellani--Strozzi      Guadagni  --Lamberteschi
[13] Guadagni  --Tornabuoni   Medici    --Ridolfi     
[15] Medici    --Salviati     Medici    --Tornabuoni  
+ ... omitted several edges
#     Count Vertices
    vcount(flo.ig)
[1] 16
#     Count Edges
    ecount(flo.ig)
[1] 20
# Is this a Bipartite or single mode network?
    is_bipartite(flo.ig)
[1] FALSE
#  Are edges directed or undirected?
    is_directed(flo.ig)
[1] FALSE
#Are edges weighted or unweighted?
    is_weighted(flo.ig)        
[1] FALSE
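
The edge count drops from 40 to 20 because a symmetric adjacency matrix stores every tie twice, once in each direction. A small base-R sketch with a hypothetical three-family matrix (not the flo data) shows the same halving:

```r
# Hypothetical symmetric 0/1 matrix for three families (illustration only)
toy <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

is_undirected <- isSymmetric(toy)             # TRUE when every tie is reciprocated
directed_edges <- sum(toy)                    # every non-zero cell: A->B, B->A, A->C, C->A
undirected_edges <- sum(toy[upper.tri(toy)])  # each pair of families counted once

print(c(directed = directed_edges, undirected = undirected_edges))
```

The same check on flo would be isSymmetric(flo), assuming the matrix has been loaded as shown earlier.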

Vertex and Edge Attributes

#     access vertex attributes
  head(flo.stat %v% "vertex.names")
[1] "Acciaiuoli" "Albizzi"    "Barbadori"  "Bischeri"   "Castellani"
[6] "Ginori"    
#      list the names of vertex attributes
  network::list.vertex.attributes(flo.stat)
[1] "na"           "vertex.names"
#     access edge attribute
  head(flo.stat%e% "weight")
NULL
#  list the names of edge attributes
  network::list.edge.attributes(flo.stat)
[1] "na"
#   "name" is not a vertex attribute of flo.stat (the statnet attribute is "vertex.names"), so the result is 16 NAs
  summary(flo.stat  %v% "name")
   Mode    NA's 
logical      16 
#     access vertex attribute 
  V(flo.ig)$name
 [1] "Acciaiuoli"   "Albizzi"      "Barbadori"    "Bischeri"    
 [5] "Castellani"   "Ginori"       "Guadagni"     "Lamberteschi"
 [9] "Medici"       "Pazzi"        "Peruzzi"      "Pucci"       
[13] "Ridolfi"      "Salviati"     "Strozzi"      "Tornabuoni"  
#     list the names of vertex attributes
  igraph::vertex_attr_names(flo.ig)
[1] "name"
#     access edge attribute
  E(flo.ig)$weight
NULL
#   list the names of edge attributes
  igraph::edge_attr_names(flo.ig)
character(0)
#   summarize edge weights (NULL here, because the network has no weight attribute)
  summary(E(flo.ig)$weight)  
Length  Class   Mode 
     0   NULL   NULL 

3. Explore the dataset using commands from week 2 tutorial.

  1. geodesic and path distances,

Path Length and Geodesic

#     Calculate distances between two nodes
    distances(flo.ig,"Bischeri","Ridolfi")
         Ridolfi
Bischeri       2
#     Calculate distance between two nodes using unweighted edges
    distances(flo.ig,"Bischeri", "Castellani",weights=NA)
         Castellani
Bischeri          2
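
To see where a distance of 2 comes from, here is a minimal base-R sketch of a breadth-first search. It uses only five ties copied from the printed edge list above, so it is a subset of the network, and bfs_distance is a hypothetical helper written for this illustration, not an igraph function.

```r
# Five ties copied from the printed igraph edge list (a subset of the network)
edges <- rbind(
  c("Bischeri", "Guadagni"),
  c("Bischeri", "Peruzzi"),
  c("Bischeri", "Strozzi"),
  c("Castellani", "Peruzzi"),
  c("Castellani", "Strozzi")
)
nodes <- sort(unique(as.vector(edges)))
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[edges] <- 1           # index by (row name, column name)
adj[edges[, 2:1]] <- 1    # add the reverse direction: ties are undirected

# Hypothetical helper: number of steps on a shortest path between two nodes
bfs_distance <- function(adj, from, to) {
  dist <- setNames(rep(Inf, nrow(adj)), rownames(adj))
  dist[from] <- 0
  queue <- from
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    for (nb in names(which(adj[current, ] == 1))) {
      if (is.infinite(dist[nb])) {      # first visit gives the shortest distance
        dist[nb] <- dist[current] + 1
        queue <- c(queue, nb)
      }
    }
  }
  unname(dist[to])
}

bischeri_castellani <- bfs_distance(adj, "Bischeri", "Castellani")
print(bischeri_castellani)
```

Bischeri and Castellani are not tied directly, but both are tied to Peruzzi and to Strozzi, so the geodesic distance is 2.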

We can also find all of the shortest paths between two families in this network. Each path lists the starting node, the ending node, and all nodes in between. The option weights=NA means that any available edge weights are ignored.

#     list all shortest paths between two specific nodes
    all_shortest_paths(flo.ig,"Strozzi","Tornabuoni", weights=NA)$res
[[1]]
+ 3/16 vertices, named, from 60a423d:
[1] Strozzi    Ridolfi    Tornabuoni
## Note: weights=NA manually tells igraph to ignore edge weights.

Shortest paths can be more useful when they are summarized to describe the overall network structure, for example as the average shortest path length.

#     find average shortest path for network
    mean_distance(flo.ig, directed=FALSE)
[1] 2.485714
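
Some pairs of families have no path between them (Pucci has no ties at all), so the average has to be taken over reachable pairs only; to my understanding this is igraph's default behaviour (unconn = TRUE). A minimal sketch of that averaging on a hypothetical four-node distance matrix, not the flo data:

```r
# Hypothetical geodesic distances: path A - B - C plus an isolate D
d <- matrix(c(0,   1,   2,   Inf,
              1,   0,   1,   Inf,
              2,   1,   0,   Inf,
              Inf, Inf, Inf, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

pair_distances <- d[upper.tri(d)]                    # one value per pair of nodes
reachable <- is.finite(pair_distances)               # drop pairs with no path
avg_path_length <- mean(pair_distances[reachable])   # (1 + 2 + 1) / 3
print(avg_path_length)
```

Under that assumption, the 2.49 reported above describes only the connected families: on average they are about two and a half marriage ties apart.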
  1. triads or transitivity,

Dyad Census

Classifies all dyads in the network as:

    • Reciprocal (mutual), or mut
    • Asymmetric (non-mutual), or asym
    • Absent, or null

    sna::dyad.census(flo.stat)
     Mut Asym Null
[1,]  20    0  100
    igraph::dyad_census(flo.ig)
$mut
[1] 20

$asym
[1] 0

$null
[1] 100
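
The three categories can be counted directly from an adjacency matrix. The sketch below does so in base R for a hypothetical four-node directed matrix; count_dyads is an illustrative helper, not the sna or igraph implementation.

```r
# Hypothetical directed 0/1 matrix: A <-> B is mutual, A -> C is one-way
toy <- matrix(0, 4, 4, dimnames = list(LETTERS[1:4], LETTERS[1:4]))
toy["A", "B"] <- 1
toy["B", "A"] <- 1
toy["A", "C"] <- 1

# Illustrative helper: classify every unordered pair of nodes
count_dyads <- function(adj) {
  upper <- upper.tri(adj)
  forward <- adj[upper]        # tie from the first node of the pair to the second
  backward <- t(adj)[upper]    # tie in the opposite direction
  c(mut = sum(forward == 1 & backward == 1),
    asym = sum(forward != backward),
    null = sum(forward == 0 & backward == 0))
}

toy_census <- count_dyads(toy)
print(toy_census)

# The three counts always add up to the number of node pairs
print(sum(toy_census) == choose(4, 2))
# Same check for the output above: 20 + 0 + 100 pairs among 16 families
print(20 + 0 + 100 == choose(16, 2))
```

In the Florentine output every existing tie is mutual and none is asymmetric, which is what we expect from an undirected network: 20 of the 120 possible pairs of families are tied.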

Triad Census

Classifies all triads in the network. The triad census provides a fundamental descriptive insight into the types of triads found in a particular dataset.

#     Classify all triads in the network, treating it as directed (16 triad types)
  sna::triad.census(flo.stat, mode="digraph")
     003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 324   0 195    0    0    0    0    0    0    0  38    0    0
     120C 210 300
[1,]    0   0   3
#     Classify all triads in the network, treating it as undirected (4 triad types)
   sna::triad.census(flo.stat, mode="graph")
       0   1  2 3
[1,] 324 195 38 3
#     total number of all four triad types returned by triad census
   sum(sna::triad.census(flo.stat, mode="graph"))
[1] 560
#     Classify all triads in the network with igraph
     igraph::triad_census(flo.ig)
 [1] 324   0 195   0   0   0   0   0   0   0  38   0   0   0   0   3
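
Because every tie in the undirected network is mutual, only four of the 16 directed triad types can occur: 003, 102, 201 and 300, that is, triads with 0, 1, 2 or 3 ties. A minimal sketch, using the counts copied from the output above, of how the 16-type census collapses to the 4-type one:

```r
triad_types <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
                 "030T", "030C", "201", "120D", "120U", "120C", "210", "300")
# Counts copied from the 16-type census output above
census_16 <- setNames(c(324, 0, 195, 0, 0, 0, 0, 0, 0, 0, 38, 0, 0, 0, 0, 3),
                      triad_types)

# Keep the four all-mutual types and relabel them by number of ties
census_4 <- census_16[c("003", "102", "201", "300")]
names(census_4) <- c("0", "1", "2", "3")
print(census_4)

# Every set of three families is classified exactly once
total_triads <- sum(census_4)
print(total_triads == choose(16, 3))
```

Most triads are empty (324) or contain a single tie (195); only 3 of the 560 possible triads are closed triangles.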

Transitivity or Global Clustering

#     network transitivity:
    transitivity(flo.ig)
[1] 0.1914894
#     weighted network transitivity:
#   igraph::transitivity(flo.ig)
#     statnet transitivity (sna::gtrans treats the network as directed by default):
  gtrans(flo.stat)
[1] 0.1914894
#   sna::gtrans() also provides "rank" and "correlation" measures for weighted networks;
#   see ?sna::gtrans for the relevant references.
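
The global transitivity can be recovered by hand from the undirected triad census: it is the share of connected triples (paths through three families) that are closed. A short worked sketch with the counts copied from the census output; the variable names are illustrative:

```r
# Counts copied from the undirected triad census output
two_tie_triads <- 38   # open triads: exactly one connected triple each
triangles <- 3         # closed triads: three connected triples each

connected_triples <- two_tie_triads + 3 * triangles   # 38 + 9 = 47
transitivity_global <- 3 * triangles / connected_triples
print(round(transitivity_global, 7))   # 0.1914894, matching both outputs above
```

So roughly one in five paths of length two among the families is closed by a third marriage tie.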

Local Transitivity or Clustering

Local transitivity (the local clustering coefficient) is a technical description of the density of an ego network.

#     Retrieve a list of the vertices we are interested in
    V(flo.ig)[c("Bischeri" ,
                "Castellani", 
                "Medici",
                "Ridolfi")]
+ 4/16 vertices, named, from 60a423d:
[1] Bischeri   Castellani Medici     Ridolfi   
#     check ego network transitivity
transitivity(flo.ig,
             type="local",
             vids=V(flo.ig)[c("Bischeri",
                              "Castellani",
                              "Medici",
                              "Ridolfi")])
[1] 0.33333333 0.33333333 0.06666667 0.33333333
#     get global clustering coefficient
  transitivity(flo.ig, type="global")
[1] 0.1914894
#     get average local clustering coefficient
  transitivity(flo.ig, type="average")
[1] 0.2181818
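
The Medici value of 0.0667 can be reproduced by hand. The sketch below rebuilds the Medici ego network from ties shown earlier (the six Medici ties in the printed edge list, plus the Ridolfi to Tornabuoni tie from the shortest-path output) and divides the ties among neighbours by the number of possible ties. local_clustering is a hypothetical helper, not the igraph implementation.

```r
# Medici ego network, copied from ties shown earlier in this post
ego_edges <- rbind(
  c("Medici", "Acciaiuoli"),
  c("Medici", "Albizzi"),
  c("Medici", "Barbadori"),
  c("Medici", "Ridolfi"),
  c("Medici", "Salviati"),
  c("Medici", "Tornabuoni"),
  c("Ridolfi", "Tornabuoni")
)
nodes <- sort(unique(as.vector(ego_edges)))
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[ego_edges] <- 1
adj[ego_edges[, 2:1]] <- 1   # undirected: store both directions

# Hypothetical helper: share of neighbour pairs that are themselves tied
local_clustering <- function(adj, v) {
  nb <- names(which(adj[v, ] == 1))
  k <- length(nb)
  if (k < 2) return(NaN)                    # undefined with fewer than two neighbours
  ties_among_nb <- sum(adj[nb, nb]) / 2     # symmetric matrix counts each tie twice
  ties_among_nb / choose(k, 2)
}

medici_clustering <- local_clustering(adj, "Medici")
print(medici_clustering)   # 1 tie out of choose(6, 2) = 15 possible
```

The Medici have six neighbours but only one tie among them, so their ego network is sparse: the other families are mostly connected to each other only through the Medici.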
  1. connectedness and/or component structure, etc.


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

White (2022, Feb. 17). Data Analytics and Computational Social Science: Social Network Analysis: Week 2: Basic Network Structure. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscombunnificent862631/

BibTeX citation

@misc{white2022social,
  author = {White, Audra Jamai},
  title = {Data Analytics and Computational Social Science: Social Network Analysis: Week 2: Basic Network Structure},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscombunnificent862631/},
  year = {2022}
}