WK7Challenge_KDocekal

Code
library(tidyverse)
library(igraph)
library(statnet)
library(gssr)
library(drat)

knitr::opts_chunk$set(echo = TRUE)

My dataset uses results from the 1985 General Social Survey. The 1985 GSS dataset is in edgelist format. There are 1534 observations with 622 variables.

Code
remotes::install_github("kjhealy/gssr")
Skipping install of 'gssr' from a github remote, the SHA1 (abe949b1) has not changed since last install.
  Use `force = TRUE` to force installation
Code
drat::addRepo("kjhealy")

gss85 <- gss_get_yr(1985)
Fetching: https://gss.norc.org/documents/stata/1985_stata.zip
Code
head(
gss85)
# A tibble: 6 × 662
  year     id wrkstat hrs1        hrs2        evwork      occ   prestige wrkslf 
  <dbl> <dbl> <dbl+l> <dbl+lbl>   <dbl+lbl>   <dbl+lbl>   <dbl> <dbl+lb> <dbl+l>
1 1985      1 1 [wor…    40       NA(i) [iap] NA(i) [iap] 194   51       2 [som…
2 1985      2 1 [wor…    65       NA(i) [iap] NA(i) [iap]  31   76       1 [sel…
3 1985      3 2 [wor…     9       NA(i) [iap] NA(i) [iap] 180   51       2 [som…
4 1985      4 3 [wit… NA(i) [iap]    60       NA(i) [iap]  65   82       2 [som…
5 1985      5 1 [wor…    40       NA(i) [iap] NA(i) [iap] 915   20       2 [som…
6 1985      6 1 [wor…    40       NA(i) [iap] NA(i) [iap] 185   46       1 [sel…
# ℹ 653 more variables: wrkgovt <dbl+lbl>, industry <dbl+lbl>, found <dbl+lbl>,
#   occ10 <dbl+lbl>, occindv <dbl+lbl>, occstatus <dbl+lbl>, occtag <dbl+lbl>,
#   prestg10 <dbl+lbl>, prestg105plus <dbl+lbl>, indus10 <dbl+lbl>,
#   indstatus <dbl+lbl>, indtag <dbl+lbl>, marital <dbl+lbl>, agewed <dbl+lbl>,
#   divorce <dbl+lbl>, spwrksta <dbl+lbl>, sphrs1 <dbl+lbl>, sphrs2 <dbl+lbl>,
#   spevwork <dbl+lbl>, spocc <dbl+lbl>, sppres <dbl+lbl>, spwrkslf <dbl+lbl>,
#   spind <dbl+lbl>, spocc10 <dbl+lbl>, spoccindv <dbl+lbl>, …
Code
dim(gss85)
[1] 1534  662

We can first create the dataframe by selecting the ties variable. Here I use “talkto” which provides weighted edges based on members a respondent’s group of contacts. Weights are based on respondents’ perception of how much each contact talks to each other.

Code
ties <- gss85[,grepl("talkto", colnames(gss85))]
head(ties)
# A tibble: 6 × 5
  talkto1             talkto2          talkto3          talkto4      talkto5    
  <dbl+lbl>           <dbl+lbl>        <dbl+lbl>        <dbl+lbl>    <dbl+lbl>  
1 2 [once a week]     1 [almost daily] 3 [once a month] 3 [once a m…     2 [onc…
2 1 [almost daily]    2 [once a week]  1 [almost daily] 1 [almost d…     1 [alm…
3 2 [once a week]     2 [once a week]  2 [once a week]  1 [almost d…     2 [onc…
4 1 [almost daily]    1 [almost daily] 3 [once a month] 2 [once a w…     1 [alm…
5 2 [once a week]     2 [once a week]  3 [once a month] 3 [once a m…     2 [onc…
6 4 [lt once a month] 2 [once a week]  2 [once a week]  1 [almost d… NA(i) [iap]

A matrix and the igraph network can be created using the previous ties. There are 5 rows and columns corresponding to the number of respondent contacts, ties are undirected and weighted.

Code
mat = matrix(nrow = 5, ncol = 5)

mat[lower.tri(mat)] <- as.numeric(ties[3,])

mat[upper.tri(mat)] = t(mat)[upper.tri(mat)]

na_vals <- is.na(mat)
non_missing_rows <- rowSums(na_vals) < nrow(mat)
mat <- mat[non_missing_rows,non_missing_rows]

diag(mat) <- 0

mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    2    2    2    1
[2,]    2    0    2    2    2
[3,]    2    2    0    2    1
[4,]    2    2    2    0    2
[5,]    1    2    1    2    0
Code
ig.net <- graph.adjacency(mat, mode = "undirected", weighted = T)

Edges represent the weight between contacts. Contacts are numbered 1-5, based on participants’ response order. For example, Edge 1–2 indicates that contacts number 1 and 2 talk to each other.

Code
print(ig.net)
IGRAPH 20c2069 U-W- 5 10 -- 
+ attr: weight (e/n)
+ edges from 20c2069:
 [1] 1--2 1--3 1--4 1--5 2--3 2--4 2--5 3--4 3--5 4--5
Code
head(ig.net)
5 x 5 sparse Matrix of class "dgCMatrix"
              
[1,] . 2 2 2 1
[2,] 2 . 2 2 2
[3,] 2 2 . 2 1
[4,] 2 2 2 . 2
[5,] 1 2 1 2 .

Results show that there is 1 component and the median weight is 2.

Code
igraph::components(ig.net)$no
[1] 1
Code
summary(E(ig.net)$weight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     2.0     2.0     1.8     2.0     2.0 

This network contains 5 vertices and 10 edges. Ties are not bipartite or directed but they are weighted.

Code
vcount(ig.net)
[1] 5
Code
ecount(ig.net)
[1] 10
Code
is_bipartite(ig.net)
[1] FALSE
Code
is_directed(ig.net)
[1] FALSE
Code
is_weighted(ig.net)
[1] TRUE

As an undirected graph we can use Fast and Greedy for community clustering.

Code
#Run clustering algorithm: fast_greedy
comm.fg<-cluster_fast_greedy(ig.net)

#Inspect clustering object
names(comm.fg)
[1] "merges"     "modularity" "membership" "algorithm"  "vcount"    

This identifies two groups; contact 3 and all other nodes.

Code
comm.fg
IGRAPH clustering fast greedy, groups: 2, mod: 2.8e-17
+ groups:
  $`1`
  [1] 1 2 4 5
  
  $`2`
  [1] 3
  

Examining the community membership vector shows membership distribution.

Code
comm.fg$membership
[1] 1 1 2 1 1

Plotting with coloring shows a visualization of these two communities.

Code
plot(comm.fg,ig.net)

Walktrap is another potential algorithm for community detection.

Code
comm.wt<-walktrap.community(ig.net)


igraph::groups(comm.wt)
$`1`
[1] 1 2 3 4 5

Testing with steps ranging from 20 to 2000 all reveal the same, single community.

Code
igraph::groups(walktrap.community(ig.net ,steps=20))
$`1`
[1] 1 2 3 4 5
Code
igraph::groups(walktrap.community(ig.net ,steps=200))
$`1`
[1] 1 2 3 4 5
Code
igraph::groups(walktrap.community(ig.net ,steps=2000))
$`1`
[1] 1 2 3 4 5

Plotting the network with Walktrap shows a single community with all nodes connected. The walktrap community makes the most sense as the more representative graph as we already know nodes are connected through their association with the respondent. In this case changing the number of steps does affect theses results and confirms our expectations.

Code
plot(comm.wt,ig.net)