library(tidyverse)
library(igraph)
library(statnet)
library(gssr)
library(drat)
knitr::opts_chunk$set(echo = TRUE)
My dataset uses results from the 1985 General Social Survey. The 1985 GSS data are stored in wide format, with one row per respondent, rather than as an edgelist. There are 1534 observations and 662 variables.
Skipping install of 'gssr' from a github remote, the SHA1 (abe949b1) has not changed since last install.
Use `force = TRUE` to force installation
Fetching: https://gss.norc.org/documents/stata/1985_stata.zip
# A tibble: 6 × 662
year id wrkstat hrs1 hrs2 evwork occ prestige wrkslf
<dbl> <dbl> <dbl+l> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl> <dbl+lb> <dbl+l>
1 1985 1 1 [wor… 40 NA(i) [iap] NA(i) [iap] 194 51 2 [som…
2 1985 2 1 [wor… 65 NA(i) [iap] NA(i) [iap] 31 76 1 [sel…
3 1985 3 2 [wor… 9 NA(i) [iap] NA(i) [iap] 180 51 2 [som…
4 1985 4 3 [wit… NA(i) [iap] 60 NA(i) [iap] 65 82 2 [som…
5 1985 5 1 [wor… 40 NA(i) [iap] NA(i) [iap] 915 20 2 [som…
6 1985 6 1 [wor… 40 NA(i) [iap] NA(i) [iap] 185 46 1 [sel…
# ℹ 653 more variables: wrkgovt <dbl+lbl>, industry <dbl+lbl>, found <dbl+lbl>,
# occ10 <dbl+lbl>, occindv <dbl+lbl>, occstatus <dbl+lbl>, occtag <dbl+lbl>,
# prestg10 <dbl+lbl>, prestg105plus <dbl+lbl>, indus10 <dbl+lbl>,
# indstatus <dbl+lbl>, indtag <dbl+lbl>, marital <dbl+lbl>, agewed <dbl+lbl>,
# divorce <dbl+lbl>, spwrksta <dbl+lbl>, sphrs1 <dbl+lbl>, sphrs2 <dbl+lbl>,
# spevwork <dbl+lbl>, spocc <dbl+lbl>, sppres <dbl+lbl>, spwrkslf <dbl+lbl>,
# spind <dbl+lbl>, spocc10 <dbl+lbl>, spoccindv <dbl+lbl>, …
[1] 1534 662
We can first create the data frame by selecting the tie variables. Here I use the “talkto” variables, which provide weighted edges among the members of a respondent’s group of contacts. Weights are frequency codes, from 1 (almost daily) to 4 (less than once a month), based on the respondent’s perception of how often the contacts talk to each other.
# A tibble: 6 × 5
talkto1 talkto2 talkto3 talkto4 talkto5
<dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
1 2 [once a week] 1 [almost daily] 3 [once a month] 3 [once a m… 2 [onc…
2 1 [almost daily] 2 [once a week] 1 [almost daily] 1 [almost d… 1 [alm…
3 2 [once a week] 2 [once a week] 2 [once a week] 1 [almost d… 2 [onc…
4 1 [almost daily] 1 [almost daily] 3 [once a month] 2 [once a w… 1 [alm…
5 2 [once a week] 2 [once a week] 3 [once a month] 3 [once a m… 2 [onc…
6 4 [lt once a month] 2 [once a week] 2 [once a week] 1 [almost d… NA(i) [iap]
A matrix and the igraph network can be created from these ties, here using the third respondent’s row. The matrix has 5 rows and 5 columns, one per contact named by the respondent; ties are undirected and weighted.
[,1] [,2] [,3] [,4] [,5]
[1,] 0 2 2 2 1
[2,] 2 0 2 2 2
[3,] 2 2 0 2 1
[4,] 2 2 2 0 2
[5,] 1 2 1 2 0
Each edge carries the weight of the tie between two contacts. Contacts are numbered 1-5 in the order the respondent named them. For example, edge 1–2 indicates that contacts 1 and 2 talk to each other.
IGRAPH 20c2069 U-W- 5 10 --
+ attr: weight (e/n)
+ edges from 20c2069:
[1] 1--2 1--3 1--4 1--5 2--3 2--4 2--5 3--4 3--5 4--5
5 x 5 sparse Matrix of class "dgCMatrix"
[1,] . 2 2 2 1
[2,] 2 . 2 2 2
[3,] 2 2 . 2 1
[4,] 2 2 2 . 2
[5,] 1 2 1 2 .
Results show that there is 1 component and the median weight is 2.
[1] 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 2.0 2.0 1.8 2.0 2.0
This network contains 5 vertices and 10 edges. It is neither bipartite nor directed, but its ties are weighted.
[1] 5
[1] 10
[1] FALSE
[1] FALSE
[1] TRUE
Because the graph is undirected, we can use the fast greedy algorithm for community detection.
[1] "merges" "modularity" "membership" "algorithm" "vcount"
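This identifies two groups: contact 3 on its own and the other four contacts together. The reported modularity (2.8e-17) is effectively zero, so the split is weak.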
IGRAPH clustering fast greedy, groups: 2, mod: 2.8e-17
+ groups:
$`1`
[1] 1 2 4 5
$`2`
[1] 3
The community membership vector lists the group assigned to each contact.
Plotting the network with nodes colored by community visualizes the two groups.
Walktrap is another potential algorithm for community detection.
Runs with 20, 200, and 2000 steps all return the same single community.
$`1`
[1] 1 2 3 4 5
$`1`
[1] 1 2 3 4 5
$`1`
[1] 1 2 3 4 5
Plotting the network with Walktrap shows a single community with all nodes connected. The Walktrap result is the more representative of the two, since we already know the contacts are connected through their association with the respondent. In this case, changing the number of steps does not affect the results, which confirms our expectations.
---
title: "WK7Challenge_KDocekal"
output: html_document
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
---
```{r, message=FALSE}
#| label: setup
#| warning: false
library(tidyverse)
library(igraph)
library(statnet)
library(gssr)
library(drat)
knitr::opts_chunk$set(echo = TRUE)
```
My dataset uses results from the 1985 General Social Survey. The 1985 GSS data are stored in wide format, with one row per respondent, rather than as an edgelist. There are 1534 observations and 662 variables.
```{r}
remotes::install_github("kjhealy/gssr")
drat::addRepo("kjhealy")
gss85 <- gss_get_yr(1985)
head(gss85)
dim(gss85)
```
We can first create the data frame by selecting the tie variables. Here I use the "talkto" variables, which provide weighted edges among the members of a respondent's group of contacts. Weights are frequency codes, from 1 (almost daily) to 4 (less than once a month), based on the respondent's perception of how often the contacts talk to each other.
```{r}
ties <- gss85[,grepl("talkto", colnames(gss85))]
head(ties)
```
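The printed labels suggest that lower codes mean more frequent contact (1 is "almost daily", 4 is "lt once a month"), whereas igraph's community algorithms treat larger weights as stronger ties. A minimal sketch of reverse-coding follows, assuming the four codes shown in the printed rows are the only valid ones. `reverse_code` is a hypothetical helper and is not part of the original analysis.

```r
# Hypothetical helper: flip frequency codes so larger values mean more contact.
# Assumes valid codes run from 1 to max_code (4 in the rows printed above).
reverse_code <- function(x, max_code = 4) {
  (max_code + 1) - x
}

# Third respondent's talkto codes, copied from the printed output
codes <- c(2, 2, 2, 1, 2)
strength <- reverse_code(codes)
strength
```

Missing values pass through unchanged, so the helper could be applied to a whole column before building the network.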
A matrix and the igraph network can be created from these ties, here using the third respondent's row. The matrix has 5 rows and 5 columns, one per contact named by the respondent; ties are undirected and weighted.
```{r}
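# Start from an empty 5 x 5 matrix, one row and column per contact
mat <- matrix(nrow = 5, ncol = 5)
# Fill the lower triangle from the third respondent's row of ties;
# the 5 values are recycled across the 10 lower-triangle cells
mat[lower.tri(mat)] <- as.numeric(ties[3, ])
# Mirror the lower triangle so the matrix is symmetric
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
# Drop contacts whose ties are all missing
na_vals <- is.na(mat)
non_missing_rows <- rowSums(na_vals) < nrow(mat)
mat <- mat[non_missing_rows, non_missing_rows]
diag(mat) <- 0
mat
ig.net <- graph_from_adjacency_matrix(mat, mode = "undirected", weighted = TRUE)
```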
```
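One detail of the construction above is worth checking by hand: the lower triangle of a 5 x 5 matrix has 10 cells, while one row of `ties` holds 5 values, so R recycles the values when filling it. The sketch below reproduces that step with the third respondent's codes typed in directly (an assumption made so the example runs without downloading the GSS data).

```r
# Third respondent's talkto codes, copied from the printed output
vals <- c(2, 2, 2, 1, 2)

mat <- matrix(NA_real_, nrow = 5, ncol = 5)
n_cells <- sum(lower.tri(mat))   # number of cells below the diagonal

# Five values fill ten cells column by column, so each value is used twice
mat[lower.tri(mat)] <- vals
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
diag(mat) <- 0

n_cells
mat
```

The result matches the matrix printed in the report, which suggests that each off-diagonal value there comes from recycling rather than from ten separately measured pairs.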
Each edge carries the weight of the tie between two contacts. Contacts are numbered 1-5 in the order the respondent named them. For example, edge 1--2 indicates that contacts 1 and 2 talk to each other.
```{r}
print(ig.net)
head(ig.net)
```
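To read the edges and their weights side by side, the graph can be flattened into an edge table. This is a sketch that rebuilds the network from the adjacency matrix printed in the report; the names `w` and `g` are illustrative only.

```r
library(igraph)

# Adjacency matrix copied from the printed output
w <- matrix(c(0, 2, 2, 2, 1,
              2, 0, 2, 2, 2,
              2, 2, 0, 2, 1,
              2, 2, 2, 0, 2,
              1, 2, 1, 2, 0), nrow = 5, byrow = TRUE)
g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE)

# One row per edge: the two contacts and the weight of their tie.
# The igraph:: prefix avoids the clash with the tidyverse function of the same name.
edges <- igraph::as_data_frame(g, what = "edges")
edges
```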
Results show that there is 1 component and the median weight is 2.
```{r}
igraph::components(ig.net)$no
summary(E(ig.net)$weight)
```
This network contains 5 vertices and 10 edges. It is neither bipartite nor directed, but its ties are weighted.
```{r}
vcount(ig.net)
ecount(ig.net)
is_bipartite(ig.net)
is_directed(ig.net)
is_weighted(ig.net)
```
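Five vertices and ten edges means every possible pair of contacts is tied, since an undirected graph on 5 nodes has at most `choose(5, 2) = 10` edges. A short check, again rebuilding the graph from the printed matrix (`w` and `g` are illustrative names):

```r
library(igraph)

# Adjacency matrix copied from the printed output
w <- matrix(c(0, 2, 2, 2, 1,
              2, 0, 2, 2, 2,
              2, 2, 0, 2, 1,
              2, 2, 2, 0, 2,
              1, 2, 1, 2, 0), nrow = 5, byrow = TRUE)
g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE)

max_edges <- choose(vcount(g), 2)   # largest possible number of undirected ties
dens <- edge_density(g)             # share of possible ties that are present
c(edges = ecount(g), max_edges = max_edges, density = dens)
```

With a density of 1, any community structure has to come from differences in the weights alone, not from missing ties.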
Because the graph is undirected, we can use the fast greedy algorithm for community detection.
```{r}
# Run clustering algorithm: fast_greedy
comm.fg <- cluster_fast_greedy(ig.net)
# Inspect clustering object
names(comm.fg)
```
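This identifies two groups: contact 3 on its own and the other four contacts together. The reported modularity (2.8e-17) is effectively zero, so the split is weak.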
```{r}
comm.fg
```
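One way to compare candidate partitions is to score each directly with `modularity()`. The sketch below scores the two-group split reported above against a single all-in-one community, using a graph rebuilt from the printed matrix (`w`, `g`, `q_split`, and `q_single` are illustrative names). The weights are passed explicitly rather than relying on a default.

```r
library(igraph)

# Adjacency matrix copied from the printed output
w <- matrix(c(0, 2, 2, 2, 1,
              2, 0, 2, 2, 2,
              2, 2, 0, 2, 1,
              2, 2, 2, 0, 2,
              1, 2, 1, 2, 0), nrow = 5, byrow = TRUE)
g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE)

# Contact 3 alone versus everyone else
q_split <- modularity(g, membership = c(1, 1, 2, 1, 1), weights = E(g)$weight)
# All five contacts in one community
q_single <- modularity(g, membership = rep(1, 5), weights = E(g)$weight)

c(split = q_split, single = q_single)
```

By this score the single community sits at zero and the two-group split falls slightly below it, so the split does not improve on treating all five contacts as one group.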
The community membership vector lists the group assigned to each contact.
```{r}
comm.fg$membership
```
Plotting the network with nodes colored by community visualizes the two groups.
```{r}
plot(comm.fg,ig.net)
```
Walktrap is another potential algorithm for community detection.
```{r}
# cluster_walktrap() is the current name for walktrap.community()
comm.wt <- cluster_walktrap(ig.net)
igraph::groups(comm.wt)
```
Runs with 20, 200, and 2000 steps all return the same single community.
```{r}
igraph::groups(cluster_walktrap(ig.net, steps = 20))
igraph::groups(cluster_walktrap(ig.net, steps = 200))
igraph::groups(cluster_walktrap(ig.net, steps = 2000))
```
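The three separate calls can also be written as a loop that records how many communities Walktrap finds at each walk length. This is a sketch on a graph rebuilt from the printed matrix; `w`, `g`, `steps`, and `n_groups` are illustrative names, and the counts may depend on the installed igraph version.

```r
library(igraph)

# Adjacency matrix copied from the printed output
w <- matrix(c(0, 2, 2, 2, 1,
              2, 0, 2, 2, 2,
              2, 2, 0, 2, 1,
              2, 2, 2, 0, 2,
              1, 2, 1, 2, 0), nrow = 5, byrow = TRUE)
g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE)

# Number of communities found at each random-walk length
steps <- c(20, 200, 2000)
n_groups <- sapply(steps, function(s) length(cluster_walktrap(g, steps = s)))
data.frame(steps = steps, n_groups = n_groups)
```

A table like this makes it easy to add further step values and see at a glance whether the community count ever changes.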
Plotting the network with Walktrap shows a single community with all nodes connected. The Walktrap result is the more representative of the two, since we already know the contacts are connected through their association with the respondent. In this case, changing the number of steps does not affect the results, which confirms our expectations.
```{r}
plot(comm.wt,ig.net)
```