challenge_2
instructions
Describing the Basic Structure of a Network
Author

Ben Ramsey

Published

February 22, 2023

Challenge Overview

Describe the basic structure of a network by following the steps in the week 2 tutorial, this time using a dataset of your choice. For instance, you could use Marriages in Game of Thrones or Like/Dislike from week 1.

Another, more complex option is the newly added dataset of the US input-output table of direct requirements by industry, available from the Bureau of Economic Analysis. Input-output tables show the economic transactions between the industries of an economy and thus can be understood as a directed adjacency matrix. The data is provided as an XLSX file, so using read_xlsx from the readxl package is recommended, with the sheet passed as an argument (2012, for instance).
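To illustrate how an input-output table maps onto a directed, weighted network, here is a minimal sketch in igraph. The 3-by-3 matrix, its industry names, and its values are made up for illustration; they stand in for a sheet loaded with read_xlsx and are not BEA data. Reading rows as the source industry and columns as the target is also an assumption that should be checked against the table's documentation.

```r
library(igraph)

# Hypothetical direct-requirements table (made-up values, not BEA data).
# graph_from_adjacency_matrix() reads entry [i, j] as an edge from row i to column j.
io_table <- matrix(
  c(0.10, 0.20, 0.00,
    0.05, 0.00, 0.30,
    0.00, 0.15, 0.25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Farms", "Mills", "Bakeries"),
                  c("Farms", "Mills", "Bakeries"))
)

# Non-zero cells become directed edges; the cell value is stored as the edge weight.
io_graph <- graph_from_adjacency_matrix(io_table, mode = "directed", weighted = TRUE)

is_directed(io_graph)   # directed or not
is_weighted(io_graph)   # weighted or not
vcount(io_graph)        # number of industries (nodes)
ecount(io_graph)        # number of non-zero requirements (edges, including loops)
E(io_graph)$weight      # the edge weights taken from the matrix
```

Non-zero diagonal cells (an industry supplying itself) show up as loops, which is worth keeping in mind before running census or transitivity functions.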

Identify and describe the content of the nodes and links, identify the format of the data set (i.e., matrix or edgelist, directed or not, weighted or not), and state whether attribute data are present. Be sure to provide information about network size (e.g., information obtained from the network description using the week 1 network basics tutorial commands).

Explore the dataset using commands from the week 2 tutorial. Comment on the highlighted aspects of network structure, such as:

  • Geodesic and Path Distances; Path Length
  • Dyads and Dyad Census
  • Triads and Triad Census
  • Network Transitivity and Clustering
  • Component Structure and Membership

Be sure to provide both the relevant statistics calculated in R and your own interpretation of these statistics.

Describe the Network Data

  1. List and inspect: List the objects to make sure the data files loaded properly.
Code
got_marriages <- read_csv("_data/got/got_marriages.csv")
Rows: 255 Columns: 5
-- Column specification --------------------------------------------------------
Delimiter: ","
chr (5): From, To, Type, Notes, Generation

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
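# statnet object: undirected, allowing loops and multiple marriages between the same pair
got_marriages.net <- as.network(got_marriages, loops = TRUE, multiple = TRUE, directed = FALSE)
# igraph object: graph_from_data_frame() defaults to directed = TRUE, so this graph is directed
got_marriages.ig <- graph_from_data_frame(got_marriages)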

ls()
[1] "got_marriages"     "got_marriages.ig"  "got_marriages.net"
  2. Network size: What is the size of the network? You may use vcount and ecount.
  3. Network features: Are these networks weighted, directed, and bipartite?
  4. Network attributes: Now, using commands from either statnet or igraph, list the vertex and edge attributes.
Code
print(got_marriages.net) 
 Network attributes:
  vertices = 20 
  directed = FALSE 
  hyper = FALSE 
  loops = TRUE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 255 
    missing edges= 0 
    non-missing edges= 255 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names: 
    Generation Notes Type 

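The network has 20 vertices and 255 edges. It is undirected, not bipartite, and unweighted. Because 20 vertices form only 190 distinct pairs, 255 edges means that some pairs of vertices are joined by more than one edge (or that some edges are loops), which the network allows since loops and multiple are both TRUE. The only vertex attribute is vertex.names; the edge attributes are Generation, Notes, and Type.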
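The same description can be obtained from the igraph side. The sketch below uses a small made-up edge list (the vertex names A to C and the Type values are hypothetical, not taken from the marriages file) so that it runs on its own.

```r
library(igraph)

# Hypothetical toy edge list with one edge attribute (not the real marriages data).
toy_edges <- data.frame(
  From = c("A", "A", "B", "C"),
  To   = c("B", "B", "C", "C"),
  Type = c("Married", "Engaged", "Married", "Married")
)

# directed = FALSE is set explicitly; the default would build a directed graph.
g <- graph_from_data_frame(toy_edges, directed = FALSE)

vcount(g)                 # network size: number of nodes
ecount(g)                 # network size: number of edges
is_directed(g)            # directed?
is_weighted(g)            # weighted? (TRUE only if an edge attribute named "weight" exists)
is_bipartite(g)           # bipartite? (TRUE only if a vertex attribute named "type" exists)
any_multiple(g)           # any repeated edges between the same pair?
any(which_loop(g))        # any self-loops?
vertex_attr_names(g)      # vertex attributes
edge_attr_names(g)        # edge attributes
```

Checking for repeated edges and loops up front helps explain why an edge count can exceed the number of distinct vertex pairs.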

Dyad and Triad Census

Now try a full dyad census. This gives us the number of dyads where the relationship is:

  • Reciprocal (mutual), or mut
  • Asymmetric (non-mutual), or asym, and
  • Absent, or null
Code
sna::dyad.census(got_marriages.net)
     Mut Asym Null
[1,] 310 -250  130
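
A negative Asym value is not a valid dyad count, although the three numbers do sum to the 190 possible dyads among 20 vertices. A likely cause, stated here as an assumption rather than something verified against the data, is that the census is being run on a network with repeated edges and possibly loops. One way to get interpretable counts for an undirected network is to collapse repeated edges and drop loops first, then count connected and null dyads directly. The sketch below does this in igraph on a made-up edge list (vertex names A to E are hypothetical).

```r
library(igraph)

# Hypothetical toy edge list with a repeated edge (A-B twice) and a loop (C-C).
toy_edges <- data.frame(
  From = c("A", "A", "B", "C", "D"),
  To   = c("B", "B", "C", "C", "E")
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# Collapse repeated edges and remove loops so each pair is counted at most once.
g_simple <- simplify(g, remove.multiple = TRUE, remove.loops = TRUE)

n_nodes         <- vcount(g_simple)
total_dyads     <- choose(n_nodes, 2)      # all unordered pairs of nodes
connected_dyads <- ecount(g_simple)        # pairs joined by at least one tie
null_dyads      <- total_dyads - connected_dyads

c(total = total_dyads, connected = connected_dyads, null = null_dyads)
```

In an undirected network there are no asymmetric dyads, so every dyad is either connected or null.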

Now use triad.census to run a triad census.

Code
sna::triad.census(got_marriages.net, mode = "graph")
       0   1   2  3
[1,] 408 444 228 60
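
With mode = "graph", the four columns count the triads that contain 0, 1, 2, and 3 edges. Two quick checks on these counts are sketched below: they should sum to the number of possible triads among 20 vertices, and global transitivity can be derived from them using the standard definition (three times the number of triangles, divided by the number of connected triples). Each two-edge triad contributes one connected triple and each triangle contributes three. The variable names are my own.

```r
# Triad census counts copied from the output above, keyed by number of edges.
triads <- c("0" = 408, "1" = 444, "2" = 228, "3" = 60)

# Check 1: the counts cover every possible set of three vertices.
total_triads <- sum(triads)
total_triads == choose(20, 3)

# Check 2: global transitivity from the census.
two_edge_triads <- triads[["2"]]   # open two-paths
triangles       <- triads[["3"]]   # closed triads
connected_triples <- two_edge_triads + 3 * triangles
global_transitivity <- 3 * triangles / connected_triples
global_transitivity
```

The result can be compared with the value reported by transitivity(..., type = "global") as a consistency check between the statnet and igraph objects.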

Global and Local Transitivity or Clustering

Compute global transitivity using transitivity in igraph or gtrans in statnet, the local transitivity of specific nodes of your choice, and the average clustering coefficient. What is the distribution of node degree, and how does it compare with the distribution of local transitivity?

Code
transitivity(got_marriages.ig, type = "global")
[1] 0.4411765
Code
transitivity(got_marriages.ig, type = "average")
[1] 0.5478074
Code
transitivity(got_marriages.ig, type = "local", vids = V(got_marriages.ig)[c("Targaryen", "Stark", "Martell")])
[1] 0.3636364 0.4166667 0.4000000

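Global transitivity is about 0.44 and the average local clustering coefficient is about 0.55. The local values for Targaryen (0.36), Stark (0.42), and Martell (0.40) all fall below that average, so the alters of these egos are less connected to one another than is typical in this network. I did not compute the degree distribution, so I do not have the information to compare it with the distribution of local transitivity.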
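
One way to compare node degree with local transitivity is to put both in a single table, one row per node. The sketch below does this on a made-up four-node graph (vertex names A to D are hypothetical), so the numbers are illustrative only.

```r
library(igraph)

# Hypothetical toy graph: a triangle A-B-C plus a pendant node D attached to C.
toy_edges <- data.frame(
  From = c("A", "A", "B", "C"),
  To   = c("B", "C", "C", "D")
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# Local transitivity is NaN for nodes with fewer than two neighbours (here, D).
comparison <- data.frame(
  node       = V(g)$name,
  degree     = unname(degree(g)),
  clustering = transitivity(g, type = "local")
)
comparison

# Global transitivity for reference: 3 * triangles / connected triples.
global_transitivity <- transitivity(g, type = "global")
global_transitivity

# Distributions of the two measures.
table(comparison$degree)
summary(comparison$clustering)
```

In this toy graph the highest-degree node (C) has the lowest defined clustering value, a pattern worth checking for in the real network.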

Path Length and Component Structure

Can you compute the average path length and the diameter of the network? Can you find the component structure of the network and identify the cluster membership of each node?

Code
average.path.length(got_marriages.ig)
[1] 1.86875
Code
diameter(got_marriages.ig)
[1] 4
Code
components(got_marriages.ig)$no
[1] 1
Code
components(got_marriages.ig)$csize
[1] 20

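The average path length is about 1.87 and the diameter is 4, so the longest geodesic in the network spans four ties, while the average geodesic is just under two. The graph has a single component that contains all 20 nodes, so every node has the same component membership and none is cut off from the rest of the network.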
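
To show what these functions return when a network has more than one component, including the per-node membership asked about above, here is a minimal sketch on a made-up undirected graph (vertex names A to F are hypothetical). It uses mean_distance(), the current igraph name for average.path.length().

```r
library(igraph)

# Hypothetical toy graph: a chain A-B-C-D and a separate pair E-F.
toy_edges <- data.frame(
  From = c("A", "B", "C", "E"),
  To   = c("B", "C", "D", "F")
)
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# Average geodesic length over pairs that can reach each other.
avg_path <- mean_distance(g, directed = FALSE, unconnected = TRUE)

# Longest geodesic within any component.
diam <- diameter(g)

# Component structure: number of components, their sizes, and per-node membership.
comp <- components(g)
comp$no
comp$csize
comp$membership
```

Here comp$membership is a named vector, so the component of any node can be looked up by name, for example comp$membership[["A"]].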