Data Analytics and Computational Social Science: Network Analysis Exploration - Harry Potter Version

Lissie Bates-Haus, Ph.D.

#This code chunk will contain all necessary libraries and packages for this analysis.

library(dplyr)
library(ggplot2)
library(igraph)
library(igraphdata)
library(statnet)

First, I downloaded the complete Harry Potter Support Networks dataset zipfile by Goele Bossaert and Nadine Meidert.

Note: the following commentary was offered by Professor Sniders:

Goele Bossaert and Nadine Meidert have coded the support ties between 64 characters in the well-known books about Harry Potter. They analyzed this by Siena; their findings were published in Goele Bossaert and Nadine Meidert (2013). ‘We are only as strong as we are united, as weak as we are divided’. A dynamic analysis of the peer support networks in the Harry Potter books. Open Journal of Applied Sciences, Vol. 3 No. 2, pp. 174-185. DOI: http://dx.doi.org/10.4236/ojapps.2013.32024

Bossart and Meidert described the coding process in their article (p. 176) as follows:

“Contact between the 64 Hogwarts students was coded as peer support when one of the four types of peer support, described in Tardy’s model, were found: 1) Student A supports student B emotionally, e.g., in Book 1: Harry, Ron and Hermione assure Neville that he is definitely a Gryffindor when he doubts he is not brave enough to be part of the house; 2) Student A gives students B instrumental help; e.g., in Book 1: Fred and George Weasley help Harry Potter to get his trunk into the compartment of the Hogwarts Express; 3) Student A gives student B certain information to help student B, e.g., in Book 1: Hermione Granger helps Harry Potter with his homework and; 4) Student A praises student B, e.g., in book 5: Terry Boot praises Hermione Granger, for doing a Protean Charm, which is advanced magic.”

They made the data available for general use. It must be noted that their paper shows that the data are quite heterogeneous over time. This must be dealt with by specifying a time-heterogeneous model, or by analyzing only a small number of (consecutive) waves.

Load data:

#Set my wd to my own path to the data
setwd("~/DACCS R/Networks/HP Data/bossaert_meidert_harrypotter")

# Read data
book1 <- as.matrix(read.table("hpbook1.txt"))
book2 <- as.matrix(read.table("hpbook2.txt"))
book3 <- as.matrix(read.table("hpbook3.txt"))
book4 <- as.matrix(read.table("hpbook4.txt"))
book5 <- as.matrix(read.table("hpbook5.txt"))
book6 <- as.matrix(read.table("hpbook6.txt"))
hp.attributes <- as.matrix(read.table("hpattributes.txt", header=TRUE))
hp.name <- as.matrix(read.table("hpnames.txt", header=TRUE))

Next, I need to create network objects.

#book1.ig <- graph.adjacency(book1, directed=T)
book1.ig <- graph.adjacency(book1)
book2.ig <- graph.adjacency(book2)
book3.ig <- graph.adjacency(book3)
book4.ig <- graph.adjacency(book4)
book5.ig <- graph.adjacency(book5)
book6.ig <- graph.adjacency(book6)

book1.stat <- as.network.matrix(book1)
book2.stat <- as.network.matrix(book2)
book3.stat <- as.network.matrix(book3)
book4.stat <- as.network.matrix(book4)
book5.stat <- as.network.matrix(book5)
book6.stat <- as.network.matrix(book6)

For the purposes of this analysis, I will be focusing only on Book 4, The Goblet of Fire. First, we’ll look at some basic network information:

vcount(book4.ig)

[1] 64

ecount(book4.ig)

[1] 49

print(book4.stat)

 Network attributes:
  vertices = 64 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 34 
    missing edges= 0 
    non-missing edges= 34 

 Vertex attribute names: 
    vertex.names 

No edge attributes

So basic information: this network has 64 verticies. BUT - the edge count returned by igraph vs statnet is different? Why is this? Is this a function of this being a directed network?

Now we’ll look at metrics in a different way:

is_bipartite(book4.ig)

[1] FALSE

is_directed(book4.ig)

[1] TRUE

is_weighted(book4.ig)

[1] FALSE

Now we’ll take a look at vertex and edge attributes.

#display vertex attributes for igraph object
vertex_attr_names(book4.ig)

[1] "name"

#display edge attributes for igraph object
edge_attr_names(book4.ig)

character(0)

Now, I also have these available to me in two dataframes: hp.attributes and hp.name. I think part of this project will be figuring out how to access it!

Now I’m going to try and understand something about the components of my network:

names(igraph::components(book4.ig))

[1] "membership" "csize"      "no"

igraph::components(book4.ig)$no    #Number of components

[1] 53

igraph::components(book4.ig)$csize #Size of each component

 [1]  1  1  1  1  1  1 12  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[23]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[45]  1  1  1  1  1  1  1  1  1

So from here we can see that we have 53 components, one of which has 12 members, all the rest of which, for this particular book, have 1.

Let’s look at the isolates:

#retrieve isolates
isolates(book4.stat)

 [1]  1  2  3  4  5  6 10 11 12 14 15 16 17 18 22 23 24 27 28 29 30 31
[23] 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 53 54 55
[45] 57 58 59 60 61 62 63 64

#subset vertex.names attribute to get names of isolates
as.vector(book4.stat%v%'vertex.names')[c(isolates(book4.stat))]

 [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V10" "V11" "V12" "V14" "V15"
[12] "V16" "V17" "V18" "V22" "V23" "V24" "V27" "V28" "V29" "V30" "V31"
[23] "V32" "V33" "V34" "V35" "V36" "V37" "V38" "V39" "V40" "V41" "V42"
[34] "V43" "V45" "V46" "V47" "V48" "V49" "V50" "V51" "V53" "V54" "V55"
[45] "V57" "V58" "V59" "V60" "V61" "V62" "V63" "V64"

So this tells us that the components that are NOT isolates are: 7, 8, 9, 13, 19, 20, 21, 25, 26, 44, 52, 56

Now, here’s the issue - in our network objects, we don’t have the actual names listed; they’re in a separate dataframe. How might we pull that in?

We have book4 as an adjacency matrix and we have the names as a separate dataframe. Can we combine them and make an edgelist? I found the documentation for igraph, so I’m going to try this:

edge4 <- graph.adjacency(book4)
book4_edge <- get.edgelist(edge4)

When I look at book4_edge, however, I have some confusion, because for every variable that is not an isolate, there is a directed edge with itself, and this seems a bit wonky to me.

plot(edge4)

And for some of these, the only reason these components are not isolates is because of the self-edge.

When I look back up at my network description, it says LOOPS are false. So shall I rerun this without loops? Given the context of the connection (support not just interaction), that seems like it would make sense.

simple4 <- igraph::simplify(book4.ig, remove.loops = TRUE)
is_directed(simple4)

[1] TRUE

plot(simple4)

simpl4_edge <- get.edgelist(simple4)

Let’s look at components of our simple graph now:

names(igraph::components(simple4))

[1] "membership" "csize"      "no"

igraph::components(simple4)$no    #Number of components

[1] 53

igraph::components(simple4)$csize #Size of each component

 [1]  1  1  1  1  1  1 12  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[23]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[45]  1  1  1  1  1  1  1  1  1

Huh. So removing the loops didn’t change that.

I wonder if there’s a way to plot just the component with connections, and not the isolates?

Okay, let’s see if we can create an edgelist of just the components who are not isolates. (I did find a person doing this but it looks like the function in igraph has been removed)

# Which vertices are not isolates?

Isolated = which(degree(book4)==0)
Isolated

 [1]  1  2  3  4  5  6 10 11 12 14 15 16 17 18 22 23 24 27 28 29 30 31
[23] 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 53 54 55
[45] 57 58 59 60 61 62 63 64

#simple4_sub = delete.vertices(book4.stat, Isolated)
#plot(simple4_sub)

OK, I am going to start over here with Book 4, and see if I can remove the loops in the creation of the igraph object!

book4a.ig <- graph.adjacency(book4, diag = 0)
plot(book4a.ig)

simple4ege <- get.edgelist(book4a.ig)
simple4a.ig <- graph.edgelist(simple4ege)
plot(simple4a.ig)

OKAY this looks more interesting!! Now, the question is, how do we get the Nodes to read student name instead of V7, V9 etc.

V(simple4a.ig)$name

 [1] "V7"  "V8"  "V25" "V9"  "V13" "V19" "V20" "V21" "V26" "V56" "V44"
[12] "V52"

Okay, so I’m going to try and replace the column names:

colnames(work4)

 [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22"
[23] "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33"
[34] "V34" "V35" "V36" "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44"
[45] "V45" "V46" "V47" "V48" "V49" "V50" "V51" "V52" "V53" "V54" "V55"
[56] "V56" "V57" "V58" "V59" "V60" "V61" "V62" "V63" "V64"

as.vector(names)

# A tibble: 1 × 64
  ` 1`     ` 2`  ` 3`  ` 4`  ` 5`  ` 6`  ` 7`  ` 8`  ` 9`  `10`  `11` 
  <chr>    <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Adrian … Alic… Ange… Anth… Blai… C. W… Cedr… Cho … Coli… Corm… Dean…
# … with 53 more variables: `12` <chr>, `13` <chr>, `14` <chr>,
#   `15` <chr>, `16` <chr>, `17` <chr>, `18` <chr>, `19` <chr>,
#   `20` <chr>, `21` <chr>, `22` <chr>, `23` <chr>, `24` <chr>,
#   `25` <chr>, `26` <chr>, `27` <chr>, `28` <chr>, `29` <chr>,
#   `30` <chr>, `31` <chr>, `32` <chr>, `33` <chr>, `34` <chr>,
#   `35` <chr>, `36` <chr>, `37` <chr>, `38` <chr>, `39` <chr>,
#   `40` <chr>, `41` <chr>, `42` <chr>, `43` <chr>, `44` <chr>, …

colnames(work4) <- names
colnames(work4)

 [1] "Adrian Pucey"           "Alicia Spinnet"        
 [3] "Angelina Johnson"       "Anthony Goldstein"     
 [5] "Blaise Zabini"          "C. Warrington"         
 [7] "Cedric Diggory"         "Cho Chang"             
 [9] "Colin Creevey"          "Cormac McLaggen"       
[11] "Dean Thomas"            "Demelza Robins"        
[13] "Dennis Creevey"         "Draco Malfoy"          
[15] "Eddie Carmichael"       "Eleanor Branstone"     
[17] "Ernie Macmillan"        "Euan Abercrombie"      
[19] "Fred Weasley"           "George Weasley"        
[21] "Ginny Weasley"          "Graham Pritchard"      
[23] "Gregory Goyle"          "Hannah Abbott"         
[25] "Harry James Potter"     "Hermione Granger"      
[27] "Jimmy Peakes"           "Justin Finch-Fletchley"
[29] "Katie Bell"             "Kevin Whitby"          
[31] "Lavender Brown"         "Leanne"                
[33] "Lee Jordan"             "Lucian Bole"           
[35] "Luna Lovegood"          "Malcolm Baddock"       
[37] "Mandy Brocklehurst"     "Marcus Belby"          
[39] "Marcus Flint"           "Michael Corner"        
[41] "Miles Bletchley"        "Millicent Bulstrode"   
[43] "Natalie McDonald"       "Neville Longbottom"    
[45] "Oliver Wood"            "Orla Quirke"           
[47] "Owen Cauldwell"         "Padma Patil"           
[49] "Pansy Parkinson"        "Parvati Patil"         
[51] "Penelope Clearwater"    "Percy Weasley"         
[53] "Peregrine Derrick"      "Roger Davies"          
[55] "Romilda Vane"           "Ronald Weasley"        
[57] "Rose Zeller"            "Seamus Finnigan"       
[59] "Stewart Ackerley"       "Susan Bones"           
[61] "Terry Boot"             "Theodore Nott"         
[63] "Vincent Crabbe"         "Zacharias Smith"

HOLY CRAP I THINK I DID IT

Now to try and plot only the non-isolates of this network.

#work4 is our working dataframe - convert to a matrix

work4 <- as.matrix(work4)

#convert to graph object
work4.ig <- graph.adjacency(work4, diag = 0)
plot(work4.ig)

#now we're going to get only the connected part of the graph
work4edge <- get.edgelist(work4.ig)
work4_simple.ig <- graph.edgelist(work4edge)
plot(work4_simple.ig)

YES !!! It only took me 5 hours to do this HAHA!! But dang, that is an ugly plot.

Now I’m going to fool around a bit with the plotting to see if I can make it prettier.

library("igraph") 
library("network") 
library("sna")
library("ggraph")
library("visNetwork")
library("threejs")
library("networkD3")
library("ndtv")

#layout = 'linear', circular = TRUE
#, start_cap = label_rect(node1.name), end_cap = label_rect(node2.name) end_cap = circle(3, 'mm')

ggraph(work4_simple.ig, layout = 'graphopt') +
  geom_edge_link(aes(start_cap = label_rect(node1.name), end_cap = label_rect(node2.name)), arrow = arrow(length = unit(4, 'mm'))) + 
  scale_edge_alpha('Edge direction', guide = 'edge_direction') + 
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()

Okay, I need to let this go for a moment. It’s making me nuts though, how ugly this is.

Apparently I am unable to let it go.

ggraph(work4_simple.ig,  layout = 'lgl') +
  geom_edge_arc(color="gray", strength=0.3, arrow = arrow(length = unit(4, 'mm'))) +            
  geom_node_point() +     
  geom_node_text(aes(label = name), size=3, repel=T) +
  theme_void() +
  labs(title = "Student Support Network in Harry Potter and the Goblet of Fire")

That’s a tiny bit prettier, I can move on now. I don’t love it.

I’m going try one more thing - a different layout:

ggraph(work4_simple.ig,  layout = 'kk') +
  geom_edge_arc(color="gray", strength=0.3, arrow = arrow(length = unit(4, 'mm'))) +            
  geom_node_point() +     
  geom_node_text(aes(label = name), size=3, repel=T) +
  theme_void() +
  labs(title = "Student Support Network in Harry Potter and the Goblet of Fire")

Enh, that doesn’t change anything, does it?

Okay, now back to working my way through the tutorials.

Network Structures

First we’ll go back to our overall view of the network and run a dyad census - this is information we have in a slightly different format from the look at isolates.

book4.stat <- as.network.matrix(book4)
print(book4.stat)

 Network attributes:
  vertices = 64 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 34 
    missing edges= 0 
    non-missing edges= 34 

 Vertex attribute names: 
    vertex.names 

No edge attributes

sna::dyad.census(book4.stat)

     Mut Asym Null
[1,]  12   10 1994

Because this is a directed network, this dyad information reconciles with our edge information. (12 x 2) + 10 =34

I’m going to rerun a look at the components here as well:

igraph::components(book4.ig)$no    #Number of components

[1] 53

#retrieve isolates
isolates(book4.stat)

 [1]  1  2  3  4  5  6 10 11 12 14 15 16 17 18 22 23 24 27 28 29 30 31
[23] 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 53 54 55
[45] 57 58 59 60 61 62 63 64

Now we’ll run a triad census:

sna::triad.census(book4.stat)

       003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 40378 556 668    0    5    5   21   11    0    0   4    1    4
     120C 210 300
[1,]    1   0  10

I’m not entirely sure how to interpret these numbers.

Transitivity or Global Clustering

Now we’ll take a look at transitivity:

#get network transitivity: igraph
transitivity(book4.ig)

[1] 0.5106383

Our total possible number of triads is 41,664 (or 64 choose 3), and clearly the majority of these are empty (96.9% of them in fact). Per our homework: Global transitivity “…is a network-level statistic that captures the proportion of connected triads in the network that are complete (all three possible links present), and varies between 0 and 1, with 0 meaning none of the connected triads are transitive and 1 meaning all connected triads are transitive.” In iGraph, the direction of the connections is ignored. What this tells us is that just over half of our connected triads are complete.

Because this is a directed network, we can also get the hierarchical network transitivity score:

#get hierarchical network transitivity:statnet
gtrans(book4.stat)

[1] 0.6016949

Because my vertices aren’t named, I’m going to use my work4 matrix instead, so I can look at some local transitivity. Because the majority of my components in my network are isolates, I do expect these all to be low.

#Retrieve a list of the three vertices we are interested in
V(work4.ig)[c("Harry James Potter","Hermione Granger", "Ronald Weasley")]

+ 3/64 vertices, named, from c9694a2:
[1] Harry James Potter Hermione Granger   Ronald Weasley

#check ego network transitivity: igraph
transitivity(work4.ig, type="local", vids=V(work4.ig)[c("Harry James Potter","Hermione Granger", "Ronald Weasley")])

[1] 0.2444444 0.8000000 0.6000000

It is interesting to me that Harry’s local transitivity (a technical description of how many of your friends are also friends with each other) score is so much lower than Ron and Hermione’s, but I think that can be explained by the type of network this is, which is not just a friendship network but a support network, meaning that the people who are offering Harry some form of support, are not necessarily offering that support to each other, whereas that seems more likely for Ron and Hermione.

Now we’ll compare global and average local transitivity:

#get global clustering cofficient: igraph
transitivity(work4.ig, type="global")

[1] 0.5106383

#get average local clustering coefficient: igraph
transitivity(work4.ig, type="average")

[1] 0.840404

Our average local transitivity is much higher than our global transitivity.

Now I’m going to calculate the distance between Harry and Ginny in this network (keep in mind, they don’t start dating until Book 6). Our edges are already unweighted, so we do not need to specify that:

#Calculate distances between two nodes
distances(work4.ig,"Harry James Potter","Ginny Weasley")

                   Ginny Weasley
Harry James Potter             2

At this stage of the series, there is no direct support connection between Harry and Ginny.

Now we’ll start looking at measures of degree in this network.

I think for these exercises, I’ll use the book4a.ig object, which has self-references removed. I’m not sure why that’s in there, to be honest.

book4a.ig <- graph.adjacency(book4, diag = 0)

First we’ll get network density from igraph and compare it to our statnet results in the original data:

graph.density(book4.ig)

[1] 0.01215278

graph.density(book4a.ig)

[1] 0.00843254

network.density(book4.stat)

[1] 0.00843254

So when we compare the original objects, we get a different result because of the loops present in the original data. When we run the function on the data with the loops removed, the results are identical.

Also, these numbers look very low to me, which makes sense because most of network is isolated. If we run the numbers on our simplified network, we get the following:

graph.density(work4_simple.ig)

[1] 0.2575758

When we remove the isolates, the network is obviously substantially denser.

Vertex Degree

#Calculate average network degree: igraph WITH LOOPS
igraph::degree(book4.ig)

 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 
  0   0   0   0   0   0   5   4   5   0   2   0   5   2   0   0   0 
V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 
  0  11  10   3   0   0   0  17  11   0   0   0   0   0   0   0   0 
V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 
  0   0   0   0   0   0   0   0   0   5   0   0   0   0   0   0   0 
V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64 
  4   0   2   0  12   0   0   0   0   0   0   0   0

#Calculate average network degree: igraph WITHOUT LOOPS
igraph::degree(book4.ig, loops=FALSE)

 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 
  0   0   0   0   0   0   3   2   3   0   0   0   3   0   0   0   0 
V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 
  0   9   8   1   0   0   0  15   9   0   0   0   0   0   0   0   0 
V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 
  0   0   0   0   0   0   0   0   0   3   0   0   0   0   0   0   0 
V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64 
  2   0   0   0  10   0   0   0   0   0   0   0   0

#Calculate average network degree: statnet
sna::degree(book4.stat)

 [1]  0  0  0  0  0  0  3  2  3  0  0  0  3  0  0  0  0  0  9  8  1  0
[23]  0  0 15  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  3
[45]  0  0  0  0  0  0  0  2  0  0  0 10  0  0  0  0  0  0  0  0

I’m going to rerun this with the work4 object so we can have vertex names! For the sake of brevity, I’m only running the analysis without loops.

#Calculate average network degree: igraph WITH LOOPS
#igraph::degree(work4.ig)

#Calculate average network degree: igraph WITHOUT LOOPS
igraph::degree(work4.ig, loops=FALSE)

          Adrian Pucey         Alicia Spinnet       Angelina Johnson 
                     0                      0                      0 
     Anthony Goldstein          Blaise Zabini          C. Warrington 
                     0                      0                      0 
        Cedric Diggory              Cho Chang          Colin Creevey 
                     3                      2                      3 
       Cormac McLaggen            Dean Thomas         Demelza Robins 
                     0                      0                      0 
        Dennis Creevey           Draco Malfoy       Eddie Carmichael 
                     3                      0                      0 
     Eleanor Branstone        Ernie Macmillan       Euan Abercrombie 
                     0                      0                      0 
          Fred Weasley         George Weasley          Ginny Weasley 
                     9                      8                      1 
      Graham Pritchard          Gregory Goyle          Hannah Abbott 
                     0                      0                      0 
    Harry James Potter       Hermione Granger           Jimmy Peakes 
                    15                      9                      0 
Justin Finch-Fletchley             Katie Bell           Kevin Whitby 
                     0                      0                      0 
        Lavender Brown                 Leanne             Lee Jordan 
                     0                      0                      0 
           Lucian Bole          Luna Lovegood        Malcolm Baddock 
                     0                      0                      0 
    Mandy Brocklehurst           Marcus Belby           Marcus Flint 
                     0                      0                      0 
        Michael Corner        Miles Bletchley    Millicent Bulstrode 
                     0                      0                      0 
      Natalie McDonald     Neville Longbottom            Oliver Wood 
                     0                      3                      0 
           Orla Quirke         Owen Cauldwell            Padma Patil 
                     0                      0                      0 
       Pansy Parkinson          Parvati Patil    Penelope Clearwater 
                     0                      0                      0 
         Percy Weasley      Peregrine Derrick           Roger Davies 
                     2                      0                      0 
          Romilda Vane         Ronald Weasley            Rose Zeller 
                     0                     10                      0 
       Seamus Finnigan       Stewart Ackerley            Susan Bones 
                     0                      0                      0 
            Terry Boot          Theodore Nott         Vincent Crabbe 
                     0                      0                      0 
       Zacharias Smith 
                     0

Now I’m going to create the dataset of Vertex names and degrees using the work4.ig object:

# create a dataset of vertex names and degree: igraph
work4.nodes<-data.frame(name=V(work4.ig)$name, degree=igraph::degree(work4.ig))

Now we’ll calculate in- and out-degree of our nodes, as this is a directed network, and attach those numbers to the dataframe we just created.

#igraph version:
work4.nodes <- work4.nodes %>%
    mutate(indegree=igraph::degree(work4.ig, mode="in", loops=FALSE),
           outdegree=igraph::degree(work4.ig, mode="out", loops=FALSE))

Now we’ll look at the summary statistics:

#get summary statistics for node attributes
summary(work4.nodes)

     name               degree          indegree     
 Length:64          Min.   : 0.000   Min.   :0.0000  
 Class :character   1st Qu.: 0.000   1st Qu.:0.0000  
 Mode  :character   Median : 0.000   Median :0.0000  
                    Mean   : 1.062   Mean   :0.5312  
                    3rd Qu.: 0.000   3rd Qu.:0.0000  
                    Max.   :15.000   Max.   :9.0000  
   outdegree     
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.5312  
 3rd Qu.:0.0000  
 Max.   :6.0000

#create a histogram of HP Support Indegree
hist(work4.nodes$indegree, main="Harry Potter: GoF In-degree Distribution", xlab="Instances of Support")

hist(work4.nodes$outdegree, main="Harry Potter: GoF Out-degree Distribution", xlab="Instances of Support")

Should the mean for in and out-degree be the same?

Again, because so many of our nodes are isolates, obviously our data are skewed.

work4.stat <- as.network.matrix(work4)

#get network centralization score: igraph
centr_degree(work4.ig, loops = FALSE, mode="in")$centralization

[1] 0.1365583

centr_degree(work4.ig, loops = FALSE, mode="out")$centralization

[1] 0.08818342

Highest support connections (In/Out/Total)

#sort the vector of nodes in the work4 network
arrange(work4.nodes, desc(indegree))%>%slice(1:5)

                                 name degree indegree outdegree
Harry James Potter Harry James Potter     15        9         6
Ronald Weasley         Ronald Weasley     10        5         5
Fred Weasley             Fred Weasley      9        4         5
George Weasley         George Weasley      8        4         4
Hermione Granger     Hermione Granger      9        4         5

arrange(work4.nodes, desc(outdegree))%>%slice(1:5)

                                 name degree indegree outdegree
Harry James Potter Harry James Potter     15        9         6
Fred Weasley             Fred Weasley      9        4         5
Hermione Granger     Hermione Granger      9        4         5
Ronald Weasley         Ronald Weasley     10        5         5
George Weasley         George Weasley      8        4         4

arrange(work4.nodes, desc(degree))%>%slice(1:5)

                                 name degree indegree outdegree
Harry James Potter Harry James Potter     15        9         6
Ronald Weasley         Ronald Weasley     10        5         5
Fred Weasley             Fred Weasley      9        4         5
Hermione Granger     Hermione Granger      9        4         5
George Weasley         George Weasley      8        4         4

Unsurprisingly. Harry tops the list on all three measures.

Status

Now we can calculate the eigenvector centrality and attach it to our nodes df:

#igraph version:
#work4.nodes <- work4.nodes %>%
#    mutate(eigen = igraph::centr_eigen(work4.ig, directed = T))

eigen <- centr_eigen(work4.ig, directed = T)
eigen$vector

 [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [7] 0.24603669 0.06053405 0.00000000 0.00000000 0.00000000 0.00000000
[13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[19] 0.93946595 0.93946595 0.23114309 0.00000000 0.00000000 0.00000000
[25] 1.00000000 0.93946595 0.00000000 0.00000000 0.00000000 0.00000000
[31] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[37] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[43] 0.00000000 0.70832286 0.00000000 0.00000000 0.00000000 0.00000000
[49] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[55] 0.00000000 0.93946595 0.00000000 0.00000000 0.00000000 0.00000000
[61] 0.00000000 0.00000000 0.00000000 0.00000000

eigen$centralization

[1] 0.920573

work4.eigen<-cbind(eigen$vector,V(work4.ig)$name)

#top 5
class(work4.eigen)

[1] "matrix" "array"

Attempt to figure out top 5 of eigenvector centrality

work4.eigenDF <- as.data.frame(work4.eigen)
class(work4.eigenDF)

[1] "data.frame"

arrange(work4.eigenDF, desc(V1))%>%slice(1:5)

                 V1                 V2
1                 1 Harry James Potter
2 0.939465949524624       Fred Weasley
3 0.939465949524624     George Weasley
4 0.939465949524624   Hermione Granger
5 0.939465949524624     Ronald Weasley

Let’s add this to our nodes df

work4.nodes <- work4.nodes %>%
    mutate(eigenCentrality = work4.eigenDF$V1)

Bonacich Power Centrality

#work4.BPC<-power_centrality(work4.ig)
#head(work4.BPC)

This fails, with the error: “Error in .solve.dgC(a, as(b,”denseMatrix“), tol = tol, sparse = sparse) : cs_lu(A) failed: near-singular A (or out of memory)” so I’m going to try and run it on work4_simple.ig

#work4.BPC<-power_centrality(work4_simple.ig,rescale=T)
#head(work4.BPC)

I’m not sure why this is failing, so I’m going to move on, and maybe come back to this.

Now I’m going to try and calculate our derived and reflected centrality and attach that to our work4.nodes df [Note: not sure this is appropriate for this type of network?? Let’s see what happens!]

Note from Week 4 slides:

Eigenvector Centrality taps into two different types of centrality
- A few ties to central actors (Bridge)
- Many ties to peripheral actors (Hub)
Disaggregate Eigenvector Centrality of r(i,j)
- Reflected Centrality: The centrality that i receives from j that is attributable to i’s contribution to j’s score
- Derived Centrality – The remainder of the centrality that i receives from j

#Based on code from NetStatusSolutions.rmd

work4_adjacency<-as.matrix(as_adjacency_matrix(work4.ig))

#work4_adjacency
work4_adjacency2 <- work4_adjacency %*% work4_adjacency

#Calculate the proportion of reflected centrality.
rc<-diag(as.matrix(work4_adjacency2))/rowSums(as.matrix(work4_adjacency2))
rc<-ifelse(is.nan(rc),0,rc)
#rc

#Calculate the proportion of derived centrality.

dc<-1-diag(as.matrix(work4_adjacency2))/rowSums(as.matrix(work4_adjacency2))
  dc<-ifelse(is.nan(dc),1,dc)
#dc

Let’s add these to our nodes matrix:

rc <- as.data.frame(rc)
dc <- as.data.frame(dc)

work4.nodes <- work4.nodes %>%
    mutate(reflectedCentrality = rc)

work4.nodes <- work4.nodes %>%
    mutate(derivedCentrality = dc)

So, my question is, given that so many of my nodes are isolates - they neither offered nor received any sort of support in this book - is reflected and derived centrality an appropriate measure? Or is it only meaningful for those nodes that actually have ties to other nodes? I’m going with yes, but I’d love feedback on this. (Honestly, I think I’m still a bit confused on what reflected vs derived centrality is)

Brokerage

#load necessary libraries

library(corrr)
library(igraph)
library(statnet)
library(tidyverse)

For the purposes of this exercise, I’m going to create an adjacency matrix of only those nodes that have ties to each other (exclude the isolates):

#First attempt - guessing I actually already did this, oh well!

old4 <- work4_adjacency
old4.ig <- graph.adjacency(old4)
class(old4.ig)

[1] "igraph"

new4.ig <- igraph::delete.vertices(old4.ig, which(igraph::degree(old4.ig)==0))
plot(new4.ig)

Calculate closeness centrality

The closeness centrality of a node is defined as the sum of the geodesic distances between that node and all other nodes in a network.

#calculate closeness centrality: igraph
closeness4 <- igraph::closeness(new4.ig, mode = "all")  #made the choice to use undirected even though this is a directed graph
closeness4 <- as.data.frame(closeness4)
arrange(closeness4, desc(closeness4))

                   closeness4
Harry James Potter 0.08333333
Ronald Weasley     0.06250000
Fred Weasley       0.05882353
Hermione Granger   0.05882353
George Weasley     0.05555556
Neville Longbottom 0.05000000
Cedric Diggory     0.04761905
Cho Chang          0.04761905
Colin Creevey      0.04761905
Dennis Creevey     0.04761905
Percy Weasley      0.04761905
Ginny Weasley      0.03703704

close <- igraph::closeness(old4.ig, mode = "all")
work4.nodes <- work4.nodes %>% mutate(closeness = close)

Honestly, I’m not really sure how to interpret these results. I am assuming this means that Harry Potter is the closest to all other nodes in the network, which makes sense.

Network Closeness Centralization

#calculate closeness centralization index: igraph
igraph::centr_clo(new4.ig, mode = "all")$centralization

[1] 0.7472467

#had to make sure you specify mode = all, meaning treat it as undirected

Note: I tried to run this on the network object with isolates not removed and it returned NaN. I do not know why this is.

#calculate closeness centralization index: igraph
igraph::centr_clo(work4.ig, mode = "all")$centralization

[1] NaN

Calculate Betweenness Centrality

Betweenness is defined as the number of geodesics on which a node sits. Remember, a geodesic is (if any) the shortest path(s) between any two nodes.

#calculate betweenness centrality: igraph
igraph::betweenness(new4.ig, directed=TRUE)

    Cedric Diggory          Cho Chang      Colin Creevey 
         8.0000000          0.0000000          0.0000000 
    Dennis Creevey       Fred Weasley     George Weasley 
         0.0000000          9.0000000          0.0000000 
     Ginny Weasley Harry James Potter   Hermione Granger 
         0.0000000         42.1666667          0.6666667 
Neville Longbottom      Percy Weasley     Ronald Weasley 
         0.0000000          0.0000000          3.1666667

between <- igraph::betweenness(old4.ig, directed=TRUE)

work4.nodes <- work4.nodes %>% mutate(between = between)

Okay… I really don’t know what to make of this. Does this mean that those nodes with a betweenness of 0 are pendants? Let’s take a look at the graph again and see if it makes sense:

ggraph(work4_simple.ig,  layout = 'kk') +
  geom_edge_arc(color="gray", strength=0.3, arrow = arrow(length = unit(4, 'mm'))) +            
  geom_node_point() +     
  geom_node_text(aes(label = name), size=3, repel=T) +
  theme_void() +
  labs(title = "Student Support Network in Harry Potter and the Goblet of Fire")

Oh wait! Geodesic is the SHORTEST path. So, while, for example, Cho is on a path betwen Cedric and Harry,it’s not the SHORTEST path. I wonder why Fred gets a 9 and George gets a 0?

Betweenness Centralization

#calculate betweenness centralization index: igraph
centr_betw(new4.ig,directed=F)$centralization

[1] 0.6214876

Calculate Network Constraint (Burt)

Constraint is a measure of the redundancy of a node’s connections. It is bound between 0 and 1, with 0 being a complete lack, and 1 being complete redundancy. The constraint() function is only available in igraph, and automatically uses the available weight vector.

constraint(new4.ig)

    Cedric Diggory          Cho Chang      Colin Creevey 
         0.8371605          1.0155556          0.7832099 
    Dennis Creevey       Fred Weasley     George Weasley 
         0.7832099          0.5210288          0.6270679 
     Ginny Weasley Harry James Potter   Hermione Granger 
         1.0000000          0.3173394          0.5873297 
Neville Longbottom      Percy Weasley     Ronald Weasley 
         0.6264472          0.6811111          0.5378099

I am particularly struck by the fact that Cho’s constraint score is over 1.

Gould-Fernandez Brokerage

#First, I need to create a statnet object.

old4_adj <- as.matrix(graph.adjacency(old4))
new4_adj <- as.matrix(as_adjacency_matrix(new4.ig))

old4.stat <- as.network.matrix(old4_adj)
new4.stat <- as.network.matrix(new4_adj)

print(new4.stat)

 Network attributes:
  vertices = 12 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 34 
    missing edges= 0 
    non-missing edges= 34 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Now I’m going to try and create brokerage scores:

brokerage(new4.stat, cl = work4.nodes$name)

#for the sake of brevity I'm going to try and hide the results in the final document, but this DOES run! huzzah!

#return matrix of standardized brokerage scores
brokerage(new4.stat, cl = work4.nodes$name)$z.nli

                   w_I w_O b_IO b_OI        b_O          t
Cedric Diggory     NaN NaN  NaN  NaN -11.819463 -1.0136651
Cho Chang          NaN NaN  NaN  NaN -11.881217 -1.2430947
Colin Creevey      NaN NaN  NaN  NaN -11.881217 -1.2430947
Dennis Creevey     NaN NaN  NaN  NaN -11.881217 -1.2430947
Fred Weasley       NaN NaN  NaN  NaN -11.634200 -0.3253764
George Weasley     NaN NaN  NaN  NaN -11.881217 -1.2430947
Ginny Weasley      NaN NaN  NaN  NaN -11.881217 -1.2430947
Harry James Potter NaN NaN  NaN  NaN  -9.781574  6.5575109
Hermione Granger   NaN NaN  NaN  NaN -11.757709 -0.7842355
Neville Longbottom NaN NaN  NaN  NaN -11.881217 -1.2430947
Percy Weasley      NaN NaN  NaN  NaN -11.881217 -1.2430947
Ronald Weasley     NaN NaN  NaN  NaN -11.510692  0.1334828

Okay, now I really don’t know what’s going on. Why is this returning NaN? Are the only types of brokerage here Liason? How is this calculated? I HAVE QUESTIONS!

Now I’m going to calculate on the entire network and add to the Nodes df

#return matrix of standardized brokerage scores
brokerage(old4.stat, cl = work4.nodes$name)$z.nli

                       w_I w_O b_IO b_OI        b_O          t
Adrian Pucey           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Alicia Spinnet         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Angelina Johnson       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Anthony Goldstein      NaN NaN  NaN  NaN -0.3700275 -0.3700275
Blaise Zabini          NaN NaN  NaN  NaN -0.3700275 -0.3700275
C. Warrington          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Cedric Diggory         NaN NaN  NaN  NaN  0.9735500  0.9735500
Cho Chang              NaN NaN  NaN  NaN -0.3700275 -0.3700275
Colin Creevey          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Cormac McLaggen        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Dean Thomas            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Demelza Robins         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Dennis Creevey         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Draco Malfoy           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Eddie Carmichael       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Eleanor Branstone      NaN NaN  NaN  NaN -0.3700275 -0.3700275
Ernie Macmillan        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Euan Abercrombie       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Fred Weasley           NaN NaN  NaN  NaN  5.0042826  5.0042826
George Weasley         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Ginny Weasley          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Graham Pritchard       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Gregory Goyle          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Hannah Abbott          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Harry James Potter     NaN NaN  NaN  NaN 45.3116087 45.3116087
Hermione Granger       NaN NaN  NaN  NaN  2.3171276  2.3171276
Jimmy Peakes           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Justin Finch-Fletchley NaN NaN  NaN  NaN -0.3700275 -0.3700275
Katie Bell             NaN NaN  NaN  NaN -0.3700275 -0.3700275
Kevin Whitby           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Lavender Brown         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Leanne                 NaN NaN  NaN  NaN -0.3700275 -0.3700275
Lee Jordan             NaN NaN  NaN  NaN -0.3700275 -0.3700275
Lucian Bole            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Luna Lovegood          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Malcolm Baddock        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Mandy Brocklehurst     NaN NaN  NaN  NaN -0.3700275 -0.3700275
Marcus Belby           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Marcus Flint           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Michael Corner         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Miles Bletchley        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Millicent Bulstrode    NaN NaN  NaN  NaN -0.3700275 -0.3700275
Natalie McDonald       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Neville Longbottom     NaN NaN  NaN  NaN -0.3700275 -0.3700275
Oliver Wood            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Orla Quirke            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Owen Cauldwell         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Padma Patil            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Pansy Parkinson        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Parvati Patil          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Penelope Clearwater    NaN NaN  NaN  NaN -0.3700275 -0.3700275
Percy Weasley          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Peregrine Derrick      NaN NaN  NaN  NaN -0.3700275 -0.3700275
Roger Davies           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Romilda Vane           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Ronald Weasley         NaN NaN  NaN  NaN  7.6914377  7.6914377
Rose Zeller            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Seamus Finnigan        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Stewart Ackerley       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Susan Bones            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Terry Boot             NaN NaN  NaN  NaN -0.3700275 -0.3700275
Theodore Nott          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Vincent Crabbe         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Zacharias Smith        NaN NaN  NaN  NaN -0.3700275 -0.3700275

So these numbers are different from our smaller network, and for all of them, there is no difference between the liaison brokerage score and the total. I’m quite confused by this!

I’m going to add only the total brokerage score to the nodes df

#return matrix of standardized brokerage scores
brokerage(old4.stat, cl = work4.nodes$name)$z.nli

                       w_I w_O b_IO b_OI        b_O          t
Adrian Pucey           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Alicia Spinnet         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Angelina Johnson       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Anthony Goldstein      NaN NaN  NaN  NaN -0.3700275 -0.3700275
Blaise Zabini          NaN NaN  NaN  NaN -0.3700275 -0.3700275
C. Warrington          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Cedric Diggory         NaN NaN  NaN  NaN  0.9735500  0.9735500
Cho Chang              NaN NaN  NaN  NaN -0.3700275 -0.3700275
Colin Creevey          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Cormac McLaggen        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Dean Thomas            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Demelza Robins         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Dennis Creevey         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Draco Malfoy           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Eddie Carmichael       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Eleanor Branstone      NaN NaN  NaN  NaN -0.3700275 -0.3700275
Ernie Macmillan        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Euan Abercrombie       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Fred Weasley           NaN NaN  NaN  NaN  5.0042826  5.0042826
George Weasley         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Ginny Weasley          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Graham Pritchard       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Gregory Goyle          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Hannah Abbott          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Harry James Potter     NaN NaN  NaN  NaN 45.3116087 45.3116087
Hermione Granger       NaN NaN  NaN  NaN  2.3171276  2.3171276
Jimmy Peakes           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Justin Finch-Fletchley NaN NaN  NaN  NaN -0.3700275 -0.3700275
Katie Bell             NaN NaN  NaN  NaN -0.3700275 -0.3700275
Kevin Whitby           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Lavender Brown         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Leanne                 NaN NaN  NaN  NaN -0.3700275 -0.3700275
Lee Jordan             NaN NaN  NaN  NaN -0.3700275 -0.3700275
Lucian Bole            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Luna Lovegood          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Malcolm Baddock        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Mandy Brocklehurst     NaN NaN  NaN  NaN -0.3700275 -0.3700275
Marcus Belby           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Marcus Flint           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Michael Corner         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Miles Bletchley        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Millicent Bulstrode    NaN NaN  NaN  NaN -0.3700275 -0.3700275
Natalie McDonald       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Neville Longbottom     NaN NaN  NaN  NaN -0.3700275 -0.3700275
Oliver Wood            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Orla Quirke            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Owen Cauldwell         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Padma Patil            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Pansy Parkinson        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Parvati Patil          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Penelope Clearwater    NaN NaN  NaN  NaN -0.3700275 -0.3700275
Percy Weasley          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Peregrine Derrick      NaN NaN  NaN  NaN -0.3700275 -0.3700275
Roger Davies           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Romilda Vane           NaN NaN  NaN  NaN -0.3700275 -0.3700275
Ronald Weasley         NaN NaN  NaN  NaN  7.6914377  7.6914377
Rose Zeller            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Seamus Finnigan        NaN NaN  NaN  NaN -0.3700275 -0.3700275
Stewart Ackerley       NaN NaN  NaN  NaN -0.3700275 -0.3700275
Susan Bones            NaN NaN  NaN  NaN -0.3700275 -0.3700275
Terry Boot             NaN NaN  NaN  NaN -0.3700275 -0.3700275
Theodore Nott          NaN NaN  NaN  NaN -0.3700275 -0.3700275
Vincent Crabbe         NaN NaN  NaN  NaN -0.3700275 -0.3700275
Zacharias Smith        NaN NaN  NaN  NaN -0.3700275 -0.3700275

#add GF brokerage scores to nodes dataframe
temp<-data.frame(brokerage(old4.stat, cl = work4.nodes$name)$z.nli)
work4.nodes <- work4.nodes %>%
  mutate(broker.tot = temp$t)

Correlations

#correlations <- work4.nodes %>% 
#  select(as.numeric(degree),as.numeric(work4.nodes$eigenCentrality), as.numeric(between), as.numeric(broker.tot)) %>%
#  correlate() 
#  %>% rearrange()

#fashion(correlations)

#visualize correlations
#rplot(correlations)

Okay, this is just failing all over the place and I’m not really sure why? I’m going to go back to my small group without isolates and see if I can run these calculations again.

Note to self: use new4.ig, new4.stat

#Create nodes df for new and add all degree
#Confirm degree is numeric

new4.nodes<-data.frame(name=V(new4.ig)$name, degree=igraph::degree(new4.ig))

#add in degree and out degree
new4.nodes <- new4.nodes %>%
    mutate(indegree=igraph::degree(new4.ig, mode="in", loops=FALSE),
           outdegree=igraph::degree(new4.ig, mode="out", loops=FALSE))

#add eigenCentrality

newEigen <- centr_eigen(new4.ig, directed = T)
newEigen$vector

 [1] 2.460367e-01 6.053405e-02 1.054024e-17 0.000000e+00 9.394659e-01
 [6] 9.394659e-01 2.311431e-01 1.000000e+00 9.394659e-01 7.083229e-01
[11] 2.280940e-17 9.394659e-01

newEigen$centralization

[1] 0.5451

new4.eigen<-cbind(newEigen$vector,V(new4.ig)$name)
new4.eigenDF <- data_frame(new4.eigen)

#add to our nodes
new4.nodes <- new4.nodes %>%
    mutate(eigenCentrality = new4.eigen)

Calculate derived and reflected centrality and add to nodes df

#Matrix to 2nd power for two step derived/reflected centrality measurement.
new4_adjacency2<-new4_adj %*% new4_adj

#Calculate the proportion of reflected centrality.

new_rc<-diag(as.matrix(new4_adjacency2))/rowSums(as.matrix(new4_adjacency2))
new_rc<-ifelse(is.nan(new_rc),0,new_rc)
head(new_rc)

Cedric Diggory      Cho Chang  Colin Creevey Dennis Creevey 
     0.1428571      0.0000000      0.1250000      0.1250000 
  Fred Weasley George Weasley 
     0.2000000      0.1904762

#Calculate the proportion of derived centrality.

new_dc<-1-diag(as.matrix(new4_adjacency2))/rowSums(as.matrix(new4_adjacency2))
new_dc<-ifelse(is.nan(new_dc),1,new_dc)
head(new_dc)

Cedric Diggory      Cho Chang  Colin Creevey Dennis Creevey 
     0.8571429      1.0000000      0.8750000      0.8750000 
  Fred Weasley George Weasley 
     0.8000000      0.8095238

#Add to nodes

new4.nodes <- new4.nodes %>%
  mutate(derCentrality = new_dc,
         reflectedCentrality = new_rc)

Can we run the Bonacich Power measure on this?

#igraph::power_centrality(new4.ig)

Nope! There’s something wrong how our data is structured? I guess?

Error text: “Error in .local(a, b, …) : cs_lu(A) failed: near-singular A (or out of memory)” - Google is not helping.

Now we’ll add in our closesness, betweeness and brokerage scores:

#calculate closeness centrality: igraph
closeIG <- igraph::closeness(new4.ig)
head(igraph::closeness(new4.ig))

Cedric Diggory      Cho Chang  Colin Creevey Dennis Creevey 
    0.06666667     0.06250000     0.05000000     0.05000000 
  Fred Weasley George Weasley 
    0.08333333     0.07692308

#calculate closeness centrality: statnet
head(sna::closeness(new4.stat))

[1] 0 0 0 0 0 0

closeIG_df <- data_frame(closeIG)

new4.nodes <- new4.nodes %>%
  mutate(closeness = closeIG_df$closeIG)

So, this is a directed graph, which per our homework is the default for statnet, but we’re getting zeros on this eval.

#calculate closeness centrality: statnet
head(sna::closeness(new4.stat, gmode="graph"))

[1] 0.5238095 0.5238095 0.5238095 0.5238095 0.6470588 0.6111111

if we tell it that it’s an undirected graph, then we get a measurement. I do not know what to make of this, quite honestly,so I’m going to use my igraph results.

I have no idea why Ginny and Neville’s scores are NaN?

#Betweeness Centrality

head(igraph::betweenness(new4.ig, directed=TRUE, weights=NA))

Cedric Diggory      Cho Chang  Colin Creevey Dennis Creevey 
             8              0              0              0 
  Fred Weasley George Weasley 
             9              0

betweenIG <- igraph::betweenness(new4.ig, directed=TRUE, weights=NA)
betweenIG_df <- data_frame(betweenIG)

new4.nodes <- new4.nodes %>%
  mutate(between = betweenIG_df$betweenIG)

Adding constraint

constraintIG <- constraint(new4.ig)
constraintIG_df <- data_frame(constraintIG)
new4.nodes <- new4.nodes %>%
  mutate(constraint = constraintIG_df$constraintIG)

Brokerage:

#return matrix of standardized brokerage scores
head(brokerage(new4.stat, cl = new4.nodes$name)$z.nli)

               w_I w_O b_IO b_OI        b_O          t
Cedric Diggory NaN NaN  NaN  NaN -1.0136651 -1.0136651
Cho Chang      NaN NaN  NaN  NaN -1.2430947 -1.2430947
Colin Creevey  NaN NaN  NaN  NaN -1.2430947 -1.2430947
Dennis Creevey NaN NaN  NaN  NaN -1.2430947 -1.2430947
Fred Weasley   NaN NaN  NaN  NaN -0.3253764 -0.3253764
George Weasley NaN NaN  NaN  NaN -1.2430947 -1.2430947

Quick Reminder of the types of brokerage:

Liaison: all three actors occupy different groups
Representative: one member of a subgroup has role (either assumed or assigned) of communicating/negotiating with outsiders
Gatekeeper: actor screens or gathers resources and distributes them to members of own group
Itinerant/Cosmopolitan: principals belong to same group while intermediary belongs to a different group
Coordinator: all actors belong to the same group

b_O: Liaison role; the broker mediates contact between two individuals from different groups, neither of which is the group to which he or she belongs. Two-path structure: A -> B -> C

In this evaluation, liaison and total brokerage are the same, so we’ll add it to the nodes df as liaison/total

brokerage <- brokerage(new4.stat, cl = new4.nodes$name)$z.nli
brokerage_df <- as.data.frame(brokerage)

new4.nodes <- new4.nodes %>%
  mutate(liaison_total = brokerage_df$b_O)

eigenCentraility has returned as a character, which will not work for statistical evaluations (I’m assuming)

as.numeric(new4.nodes$eigenCentrality)

 [1] 2.460367e-01 6.053405e-02 1.054024e-17 0.000000e+00 9.394659e-01
 [6] 9.394659e-01 2.311431e-01 1.000000e+00 9.394659e-01 7.083229e-01
[11] 2.280940e-17 9.394659e-01           NA           NA           NA
[16]           NA           NA           NA           NA           NA
[21]           NA           NA           NA           NA

Okay I’m not sure why it won’t return as a number? I think I’m going to move on and just see if I can run some correlations. Note: I’m leaving out closeness because it has NaNs and I am not sure how to handle those.

correlations <- new4.nodes %>% 
  select(degree,indegree,outdegree, derCentrality, reflectedCentrality, between, constraint, liaison_total) %>%
  correlate() %>%
  rearrange()

fashion(correlations)

                 term degree indegree outdegree reflectedCentrality
1              degree             .95       .93                 .85
2            indegree    .95                .78                 .71
3           outdegree    .93      .78                           .91
4 reflectedCentrality    .85      .71       .91                    
5       liaison_total    .77      .84       .60                 .51
6             between    .72      .78       .56                 .51
7          constraint   -.87     -.84      -.81                -.70
8       derCentrality   -.85     -.71      -.91               -1.00
  liaison_total between constraint derCentrality
1           .77     .72       -.87          -.85
2           .84     .78       -.84          -.71
3           .60     .56       -.81          -.91
4           .51     .51       -.70         -1.00
5                   .97       -.67          -.51
6           .97               -.63          -.51
7          -.67    -.63                      .70
8          -.51    -.51        .70

#visualize correlations
rplot(correlations)

I’m not 100% sure how to interpret this?

I’m going to stop here and publish this, and then move on to the next two tutorials!

Comment on this article Share:

Network Analysis Exploration - Harry Potter Version