Catching Up in Networks Using the Harry Potter Network Dataset
First, I downloaded the complete Harry Potter Support Networks dataset zipfile by Goele Bossaert and Nadine Meidert.
Note: the following commentary was offered by Professor Sniders:
Goele Bossaert and Nadine Meidert have coded the support ties between 64 characters in the well-known books about Harry Potter. They analyzed this by Siena; their findings were published in Goele Bossaert and Nadine Meidert (2013). ‘We are only as strong as we are united, as weak as we are divided’. A dynamic analysis of the peer support networks in the Harry Potter books. Open Journal of Applied Sciences, Vol. 3 No. 2, pp. 174-185. DOI: http://dx.doi.org/10.4236/ojapps.2013.32024
Bossart and Meidert described the coding process in their article (p. 176) as follows:
“Contact between the 64 Hogwarts students was coded as peer support when one of the four types of peer support, described in Tardy’s model, were found: 1) Student A supports student B emotionally, e.g., in Book 1: Harry, Ron and Hermione assure Neville that he is definitely a Gryffindor when he doubts he is not brave enough to be part of the house; 2) Student A gives students B instrumental help; e.g., in Book 1: Fred and George Weasley help Harry Potter to get his trunk into the compartment of the Hogwarts Express; 3) Student A gives student B certain information to help student B, e.g., in Book 1: Hermione Granger helps Harry Potter with his homework and; 4) Student A praises student B, e.g., in book 5: Terry Boot praises Hermione Granger, for doing a Protean Charm, which is advanced magic.”
They made the data available for general use. It must be noted that their paper shows that the data are quite heterogeneous over time. This must be dealt with by specifying a time-heterogeneous model, or by analyzing only a small number of (consecutive) waves.
Load data:
#Set my wd to my own path to the data
setwd("~/DACCS R/Networks/HP Data/bossaert_meidert_harrypotter")
# Read data
book1 <- as.matrix(read.table("hpbook1.txt"))
book2 <- as.matrix(read.table("hpbook2.txt"))
book3 <- as.matrix(read.table("hpbook3.txt"))
book4 <- as.matrix(read.table("hpbook4.txt"))
book5 <- as.matrix(read.table("hpbook5.txt"))
book6 <- as.matrix(read.table("hpbook6.txt"))
hp.attributes <- as.matrix(read.table("hpattributes.txt", header=TRUE))
hp.name <- as.matrix(read.table("hpnames.txt", header=TRUE))
Next, I need to create network objects.
#book1.ig <- graph.adjacency(book1, directed=T)
book1.ig <- graph.adjacency(book1)
book2.ig <- graph.adjacency(book2)
book3.ig <- graph.adjacency(book3)
book4.ig <- graph.adjacency(book4)
book5.ig <- graph.adjacency(book5)
book6.ig <- graph.adjacency(book6)
book1.stat <- as.network.matrix(book1)
book2.stat <- as.network.matrix(book2)
book3.stat <- as.network.matrix(book3)
book4.stat <- as.network.matrix(book4)
book5.stat <- as.network.matrix(book5)
book6.stat <- as.network.matrix(book6)
For the purposes of this analysis, I will be focusing only on Book 4, The Goblet of Fire. First, we’ll look at some basic network information:
vcount(book4.ig)
[1] 64
ecount(book4.ig)
[1] 49
print(book4.stat)
Network attributes:
vertices = 64
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 34
missing edges= 0
non-missing edges= 34
Vertex attribute names:
vertex.names
No edge attributes
So basic information: this network has 64 verticies. BUT - the edge count returned by igraph vs statnet is different? Why is this? Is this a function of this being a directed network?
Now we’ll look at metrics in a different way:
Now we’ll take a look at vertex and edge attributes.
#display vertex attributes for igraph object
vertex_attr_names(book4.ig)
[1] "name"
#display edge attributes for igraph object
edge_attr_names(book4.ig)
character(0)
Now, I also have these available to me in two dataframes: hp.attributes and hp.name. I think part of this project will be figuring out how to access it!
Now I’m going to try and understand something about the components of my network:
names(igraph::components(book4.ig))
[1] "membership" "csize" "no"
igraph::components(book4.ig)$no #Number of components
[1] 53
igraph::components(book4.ig)$csize #Size of each component
[1] 1 1 1 1 1 1 12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[23] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[45] 1 1 1 1 1 1 1 1 1
So from here we can see that we have 53 components, one of which has 12 members, all the rest of which, for this particular book, have 1.
Let’s look at the isolates:
#retrieve isolates
isolates(book4.stat)
[1] 1 2 3 4 5 6 10 11 12 14 15 16 17 18 22 23 24 27 28 29 30 31
[23] 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 53 54 55
[45] 57 58 59 60 61 62 63 64
#subset vertex.names attribute to get names of isolates
as.vector(book4.stat%v%'vertex.names')[c(isolates(book4.stat))]
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V10" "V11" "V12" "V14" "V15"
[12] "V16" "V17" "V18" "V22" "V23" "V24" "V27" "V28" "V29" "V30" "V31"
[23] "V32" "V33" "V34" "V35" "V36" "V37" "V38" "V39" "V40" "V41" "V42"
[34] "V43" "V45" "V46" "V47" "V48" "V49" "V50" "V51" "V53" "V54" "V55"
[45] "V57" "V58" "V59" "V60" "V61" "V62" "V63" "V64"
So this tells us that the components that are NOT isolates are: 7, 8, 9, 13, 19, 20, 21, 25, 26, 44, 52, 56
Now, here’s the issue - in our network objects, we don’t have the actual names listed; they’re in a separate dataframe. How might we pull that in?
We have book4 as an adjacency matrix and we have the names as a separate dataframe. Can we combine them and make an edgelist? I found the documentation for igraph, so I’m going to try this:
edge4 <- graph.adjacency(book4)
book4_edge <- get.edgelist(edge4)
When I look at book4_edge, however, I have some confusion, because for every variable that is not an isolate, there is a directed edge with itself, and this seems a bit wonky to me.
plot(edge4)
And for some of these, the only reason these components are not isolates is because of the self-edge.
When I look back up at my network description, it says LOOPS are false. So shall I rerun this without loops? Given the context of the connection (support not just interaction), that seems like it would make sense.
simple4 <- igraph::simplify(book4.ig, remove.loops = TRUE)
is_directed(simple4)
[1] TRUE
plot(simple4)
simpl4_edge <- get.edgelist(simple4)
Let’s look at components of our simple graph now:
names(igraph::components(simple4))
[1] "membership" "csize" "no"
igraph::components(simple4)$no #Number of components
[1] 53
igraph::components(simple4)$csize #Size of each component
[1] 1 1 1 1 1 1 12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[23] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[45] 1 1 1 1 1 1 1 1 1
Huh. So removing the loops didn’t change that.
I wonder if there’s a way to plot just the component with connections, and not the isolates?
Okay, let’s see if we can create an edgelist of just the components who are not isolates. (I did find a person doing this but it looks like the function in igraph has been removed)
[1] 1 2 3 4 5 6 10 11 12 14 15 16 17 18 22 23 24 27 28 29 30 31
[23] 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 53 54 55
[45] 57 58 59 60 61 62 63 64
#simple4_sub = delete.vertices(book4.stat, Isolated)
#plot(simple4_sub)
OK, I am going to start over here with Book 4, and see if I can remove the loops in the creation of the igraph object!
book4a.ig <- graph.adjacency(book4, diag = 0)
plot(book4a.ig)
simple4ege <- get.edgelist(book4a.ig)
simple4a.ig <- graph.edgelist(simple4ege)
plot(simple4a.ig)
OKAY this looks more interesting!! Now, the question is, how do we get the Nodes to read student name instead of V7, V9 etc.
V(simple4a.ig)$name
[1] "V7" "V8" "V25" "V9" "V13" "V19" "V20" "V21" "V26" "V56" "V44"
[12] "V52"
Okay, so I’m going to try and replace the column names:
colnames(work4)
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20" "V21" "V22"
[23] "V23" "V24" "V25" "V26" "V27" "V28" "V29" "V30" "V31" "V32" "V33"
[34] "V34" "V35" "V36" "V37" "V38" "V39" "V40" "V41" "V42" "V43" "V44"
[45] "V45" "V46" "V47" "V48" "V49" "V50" "V51" "V52" "V53" "V54" "V55"
[56] "V56" "V57" "V58" "V59" "V60" "V61" "V62" "V63" "V64"
as.vector(names)
# A tibble: 1 × 64
` 1` ` 2` ` 3` ` 4` ` 5` ` 6` ` 7` ` 8` ` 9` `10` `11`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Adrian … Alic… Ange… Anth… Blai… C. W… Cedr… Cho … Coli… Corm… Dean…
# … with 53 more variables: `12` <chr>, `13` <chr>, `14` <chr>,
# `15` <chr>, `16` <chr>, `17` <chr>, `18` <chr>, `19` <chr>,
# `20` <chr>, `21` <chr>, `22` <chr>, `23` <chr>, `24` <chr>,
# `25` <chr>, `26` <chr>, `27` <chr>, `28` <chr>, `29` <chr>,
# `30` <chr>, `31` <chr>, `32` <chr>, `33` <chr>, `34` <chr>,
# `35` <chr>, `36` <chr>, `37` <chr>, `38` <chr>, `39` <chr>,
# `40` <chr>, `41` <chr>, `42` <chr>, `43` <chr>, `44` <chr>, …
[1] "Adrian Pucey" "Alicia Spinnet"
[3] "Angelina Johnson" "Anthony Goldstein"
[5] "Blaise Zabini" "C. Warrington"
[7] "Cedric Diggory" "Cho Chang"
[9] "Colin Creevey" "Cormac McLaggen"
[11] "Dean Thomas" "Demelza Robins"
[13] "Dennis Creevey" "Draco Malfoy"
[15] "Eddie Carmichael" "Eleanor Branstone"
[17] "Ernie Macmillan" "Euan Abercrombie"
[19] "Fred Weasley" "George Weasley"
[21] "Ginny Weasley" "Graham Pritchard"
[23] "Gregory Goyle" "Hannah Abbott"
[25] "Harry James Potter" "Hermione Granger"
[27] "Jimmy Peakes" "Justin Finch-Fletchley"
[29] "Katie Bell" "Kevin Whitby"
[31] "Lavender Brown" "Leanne"
[33] "Lee Jordan" "Lucian Bole"
[35] "Luna Lovegood" "Malcolm Baddock"
[37] "Mandy Brocklehurst" "Marcus Belby"
[39] "Marcus Flint" "Michael Corner"
[41] "Miles Bletchley" "Millicent Bulstrode"
[43] "Natalie McDonald" "Neville Longbottom"
[45] "Oliver Wood" "Orla Quirke"
[47] "Owen Cauldwell" "Padma Patil"
[49] "Pansy Parkinson" "Parvati Patil"
[51] "Penelope Clearwater" "Percy Weasley"
[53] "Peregrine Derrick" "Roger Davies"
[55] "Romilda Vane" "Ronald Weasley"
[57] "Rose Zeller" "Seamus Finnigan"
[59] "Stewart Ackerley" "Susan Bones"
[61] "Terry Boot" "Theodore Nott"
[63] "Vincent Crabbe" "Zacharias Smith"
HOLY CRAP I THINK I DID IT
Now to try and plot only the non-isolates of this network.
#work4 is our working dataframe - convert to a matrix
work4 <- as.matrix(work4)
#convert to graph object
work4.ig <- graph.adjacency(work4, diag = 0)
plot(work4.ig)
#now we're going to get only the connected part of the graph
work4edge <- get.edgelist(work4.ig)
work4_simple.ig <- graph.edgelist(work4edge)
plot(work4_simple.ig)
YES !!! It only took me 5 hours to do this HAHA!! But dang, that is an ugly plot.
Now I’m going to fool around a bit with the plotting to see if I can make it prettier.
#layout = 'linear', circular = TRUE
#, start_cap = label_rect(node1.name), end_cap = label_rect(node2.name) end_cap = circle(3, 'mm')
ggraph(work4_simple.ig, layout = 'graphopt') +
geom_edge_link(aes(start_cap = label_rect(node1.name), end_cap = label_rect(node2.name)), arrow = arrow(length = unit(4, 'mm'))) +
scale_edge_alpha('Edge direction', guide = 'edge_direction') +
geom_node_point() +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()
Okay, I need to let this go for a moment. It’s making me nuts though, how ugly this is.
Apparently I am unable to let it go.
ggraph(work4_simple.ig, layout = 'lgl') +
geom_edge_arc(color="gray", strength=0.3, arrow = arrow(length = unit(4, 'mm'))) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, repel=T) +
theme_void() +
labs(title = "Student Support Network in Harry Potter and the Goblet of Fire")
That’s a tiny bit prettier, I can move on now. I don’t love it.
I’m going try one more thing - a different layout:
ggraph(work4_simple.ig, layout = 'kk') +
geom_edge_arc(color="gray", strength=0.3, arrow = arrow(length = unit(4, 'mm'))) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, repel=T) +
theme_void() +
labs(title = "Student Support Network in Harry Potter and the Goblet of Fire")
Enh, that doesn’t change anything, does it?
Okay, now back to working my way through the tutorials.
First we’ll go back to our overall view of the network and run a dyad census - this is information we have in a slightly different format from the look at isolates.
book4.stat <- as.network.matrix(book4)
print(book4.stat)
Network attributes:
vertices = 64
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 34
missing edges= 0
non-missing edges= 34
Vertex attribute names:
vertex.names
No edge attributes
sna::dyad.census(book4.stat)
Mut Asym Null
[1,] 12 10 1994
Because this is a directed network, this dyad information reconciles with our edge information. (12 x 2) + 10 =34
I’m going to rerun a look at the components here as well:
igraph::components(book4.ig)$no #Number of components
[1] 53
#retrieve isolates
isolates(book4.stat)
[1] 1 2 3 4 5 6 10 11 12 14 15 16 17 18 22 23 24 27 28 29 30 31
[23] 32 33 34 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 51 53 54 55
[45] 57 58 59 60 61 62 63 64
Now we’ll run a triad census:
sna::triad.census(book4.stat)
003 012 102 021D 021U 021C 111D 111U 030T 030C 201 120D 120U
[1,] 40378 556 668 0 5 5 21 11 0 0 4 1 4
120C 210 300
[1,] 1 0 10
I’m not entirely sure how to interpret these numbers.
Now we’ll take a look at transitivity:
#get network transitivity: igraph
transitivity(book4.ig)
[1] 0.5106383
Our total possible number of triads is 41,664 (or 64 choose 3), and clearly the majority of these are empty (96.9% of them in fact). Per our homework: Global transitivity “…is a network-level statistic that captures the proportion of connected triads in the network that are complete (all three possible links present), and varies between 0 and 1, with 0 meaning none of the connected triads are transitive and 1 meaning all connected triads are transitive.” In iGraph, the direction of the connections is ignored. What this tells us is that just over half of our connected triads are complete.
Because this is a directed network, we can also get the hierarchical network transitivity score:
#get hierarchical network transitivity:statnet
gtrans(book4.stat)
[1] 0.6016949
Because my vertices aren’t named, I’m going to use my work4 matrix instead, so I can look at some local transitivity. Because the majority of my components in my network are isolates, I do expect these all to be low.
#Retrieve a list of the three vertices we are interested in
V(work4.ig)[c("Harry James Potter","Hermione Granger", "Ronald Weasley")]
+ 3/64 vertices, named, from c9694a2:
[1] Harry James Potter Hermione Granger Ronald Weasley
#check ego network transitivity: igraph
transitivity(work4.ig, type="local", vids=V(work4.ig)[c("Harry James Potter","Hermione Granger", "Ronald Weasley")])
[1] 0.2444444 0.8000000 0.6000000
It is interesting to me that Harry’s local transitivity (a technical description of how many of your friends are also friends with each other) score is so much lower than Ron and Hermione’s, but I think that can be explained by the type of network this is, which is not just a friendship network but a support network, meaning that the people who are offering Harry some form of support, are not necessarily offering that support to each other, whereas that seems more likely for Ron and Hermione.
Now we’ll compare global and average local transitivity:
#get global clustering cofficient: igraph
transitivity(work4.ig, type="global")
[1] 0.5106383
#get average local clustering coefficient: igraph
transitivity(work4.ig, type="average")
[1] 0.840404
Our average local transitivity is much higher than our global transitivity.
Now I’m going to calculate the distance between Harry and Ginny in this network (keep in mind, they don’t start dating until Book 6). Our edges are already unweighted, so we do not need to specify that:
#Calculate distances between two nodes
distances(work4.ig,"Harry James Potter","Ginny Weasley")
Ginny Weasley
Harry James Potter 2
At this stage of the series, there is no direct support connection between Harry and Ginny.
I think for these exercises, I’ll use the book4a.ig object, which has self-references removed. I’m not sure why that’s in there, to be honest.
book4a.ig <- graph.adjacency(book4, diag = 0)
First we’ll get network density from igraph and compare it to our statnet results in the original data:
graph.density(book4.ig)
[1] 0.01215278
graph.density(book4a.ig)
[1] 0.00843254
network.density(book4.stat)
[1] 0.00843254
So when we compare the original objects, we get a different result because of the loops present in the original data. When we run the function on the data with the loops removed, the results are identical.
Also, these numbers look very low to me, which makes sense because most of network is isolated. If we run the numbers on our simplified network, we get the following:
graph.density(work4_simple.ig)
[1] 0.2575758
When we remove the isolates, the network is obviously substantially denser.
#Calculate average network degree: igraph WITH LOOPS
igraph::degree(book4.ig)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
0 0 0 0 0 0 5 4 5 0 2 0 5 2 0 0 0
V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
0 11 10 3 0 0 0 17 11 0 0 0 0 0 0 0 0
V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51
0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0
V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64
4 0 2 0 12 0 0 0 0 0 0 0 0
#Calculate average network degree: igraph WITHOUT LOOPS
igraph::degree(book4.ig, loops=FALSE)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
0 0 0 0 0 0 3 2 3 0 0 0 3 0 0 0 0
V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34
0 9 8 1 0 0 0 15 9 0 0 0 0 0 0 0 0
V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51
0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0
V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64
2 0 0 0 10 0 0 0 0 0 0 0 0
#Calculate average network degree: statnet
sna::degree(book4.stat)
[1] 0 0 0 0 0 0 3 2 3 0 0 0 3 0 0 0 0 0 9 8 1 0
[23] 0 0 15 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
[45] 0 0 0 0 0 0 0 2 0 0 0 10 0 0 0 0 0 0 0 0
I’m going to rerun this with the work4 object so we can have vertex names! For the sake of brevity, I’m only running the analysis without loops.
#Calculate average network degree: igraph WITH LOOPS
#igraph::degree(work4.ig)
#Calculate average network degree: igraph WITHOUT LOOPS
igraph::degree(work4.ig, loops=FALSE)
Adrian Pucey Alicia Spinnet Angelina Johnson
0 0 0
Anthony Goldstein Blaise Zabini C. Warrington
0 0 0
Cedric Diggory Cho Chang Colin Creevey
3 2 3
Cormac McLaggen Dean Thomas Demelza Robins
0 0 0
Dennis Creevey Draco Malfoy Eddie Carmichael
3 0 0
Eleanor Branstone Ernie Macmillan Euan Abercrombie
0 0 0
Fred Weasley George Weasley Ginny Weasley
9 8 1
Graham Pritchard Gregory Goyle Hannah Abbott
0 0 0
Harry James Potter Hermione Granger Jimmy Peakes
15 9 0
Justin Finch-Fletchley Katie Bell Kevin Whitby
0 0 0
Lavender Brown Leanne Lee Jordan
0 0 0
Lucian Bole Luna Lovegood Malcolm Baddock
0 0 0
Mandy Brocklehurst Marcus Belby Marcus Flint
0 0 0
Michael Corner Miles Bletchley Millicent Bulstrode
0 0 0
Natalie McDonald Neville Longbottom Oliver Wood
0 3 0
Orla Quirke Owen Cauldwell Padma Patil
0 0 0
Pansy Parkinson Parvati Patil Penelope Clearwater
0 0 0
Percy Weasley Peregrine Derrick Roger Davies
2 0 0
Romilda Vane Ronald Weasley Rose Zeller
0 10 0
Seamus Finnigan Stewart Ackerley Susan Bones
0 0 0
Terry Boot Theodore Nott Vincent Crabbe
0 0 0
Zacharias Smith
0
Now I’m going to create the dataset of Vertex names and degrees using the work4.ig object:
# create a dataset of vertex names and degree: igraph
work4.nodes<-data.frame(name=V(work4.ig)$name, degree=igraph::degree(work4.ig))
Now we’ll calculate in- and out-degree of our nodes, as this is a directed network, and attach those numbers to the dataframe we just created.
Now we’ll look at the summary statistics:
#get summary statistics for node attributes
summary(work4.nodes)
name degree indegree
Length:64 Min. : 0.000 Min. :0.0000
Class :character 1st Qu.: 0.000 1st Qu.:0.0000
Mode :character Median : 0.000 Median :0.0000
Mean : 1.062 Mean :0.5312
3rd Qu.: 0.000 3rd Qu.:0.0000
Max. :15.000 Max. :9.0000
outdegree
Min. :0.0000
1st Qu.:0.0000
Median :0.0000
Mean :0.5312
3rd Qu.:0.0000
Max. :6.0000
#create a histogram of HP Support Indegree
hist(work4.nodes$indegree, main="Harry Potter: GoF In-degree Distribution", xlab="Instances of Support")
hist(work4.nodes$outdegree, main="Harry Potter: GoF Out-degree Distribution", xlab="Instances of Support")
Should the mean for in and out-degree be the same?
Again, because so many of our nodes are isolates, obviously our data are skewed.
work4.stat <- as.network.matrix(work4)
#get network centralization score: igraph
centr_degree(work4.ig, loops = FALSE, mode="in")$centralization
[1] 0.1365583
centr_degree(work4.ig, loops = FALSE, mode="out")$centralization
[1] 0.08818342
Highest support connections (In/Out/Total)
name degree indegree outdegree
Harry James Potter Harry James Potter 15 9 6
Ronald Weasley Ronald Weasley 10 5 5
Fred Weasley Fred Weasley 9 4 5
George Weasley George Weasley 8 4 4
Hermione Granger Hermione Granger 9 4 5
name degree indegree outdegree
Harry James Potter Harry James Potter 15 9 6
Fred Weasley Fred Weasley 9 4 5
Hermione Granger Hermione Granger 9 4 5
Ronald Weasley Ronald Weasley 10 5 5
George Weasley George Weasley 8 4 4
name degree indegree outdegree
Harry James Potter Harry James Potter 15 9 6
Ronald Weasley Ronald Weasley 10 5 5
Fred Weasley Fred Weasley 9 4 5
Hermione Granger Hermione Granger 9 4 5
George Weasley George Weasley 8 4 4
Unsurprisingly. Harry tops the list on all three measures.
Now we can calculate the eigenvector centrality and attach it to our nodes df:
#igraph version:
#work4.nodes <- work4.nodes %>%
# mutate(eigen = igraph::centr_eigen(work4.ig, directed = T))
eigen <- centr_eigen(work4.ig, directed = T)
eigen$vector
[1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[7] 0.24603669 0.06053405 0.00000000 0.00000000 0.00000000 0.00000000
[13] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[19] 0.93946595 0.93946595 0.23114309 0.00000000 0.00000000 0.00000000
[25] 1.00000000 0.93946595 0.00000000 0.00000000 0.00000000 0.00000000
[31] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[37] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[43] 0.00000000 0.70832286 0.00000000 0.00000000 0.00000000 0.00000000
[49] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
[55] 0.00000000 0.93946595 0.00000000 0.00000000 0.00000000 0.00000000
[61] 0.00000000 0.00000000 0.00000000 0.00000000
eigen$centralization
[1] 0.920573
[1] "matrix" "array"
Attempt to figure out top 5 of eigenvector centrality
work4.eigenDF <- as.data.frame(work4.eigen)
class(work4.eigenDF)
[1] "data.frame"
V1 V2
1 1 Harry James Potter
2 0.939465949524624 Fred Weasley
3 0.939465949524624 George Weasley
4 0.939465949524624 Hermione Granger
5 0.939465949524624 Ronald Weasley
Let’s add this to our nodes df
#work4.BPC<-power_centrality(work4.ig)
#head(work4.BPC)
This fails, with the error: “Error in .solve.dgC(a, as(b,”denseMatrix“), tol = tol, sparse = sparse) : cs_lu(A) failed: near-singular A (or out of memory)” so I’m going to try and run it on work4_simple.ig
#work4.BPC<-power_centrality(work4_simple.ig,rescale=T)
#head(work4.BPC)
I’m not sure why this is failing, so I’m going to move on, and maybe come back to this.
Now I’m going to try and calculate our derived and reflected centrality and attach that to our work4.nodes df [Note: not sure this is appropriate for this type of network?? Let’s see what happens!]
Note from Week 4 slides:
#Based on code from NetStatusSolutions.rmd
work4_adjacency<-as.matrix(as_adjacency_matrix(work4.ig))
#work4_adjacency
work4_adjacency2 <- work4_adjacency %*% work4_adjacency
#Calculate the proportion of reflected centrality.
rc<-diag(as.matrix(work4_adjacency2))/rowSums(as.matrix(work4_adjacency2))
rc<-ifelse(is.nan(rc),0,rc)
#rc
#Calculate the proportion of derived centrality.
dc<-1-diag(as.matrix(work4_adjacency2))/rowSums(as.matrix(work4_adjacency2))
dc<-ifelse(is.nan(dc),1,dc)
#dc
Let’s add these to our nodes matrix:
rc <- as.data.frame(rc)
dc <- as.data.frame(dc)
work4.nodes <- work4.nodes %>%
mutate(reflectedCentrality = rc)
work4.nodes <- work4.nodes %>%
mutate(derivedCentrality = dc)
So, my question is, given that so many of my nodes are isolates - they neither offered nor received any sort of support in this book - is reflected and derived centrality an appropriate measure? Or is it only meaningful for those nodes that actually have ties to other nodes? I’m going with yes, but I’d love feedback on this. (Honestly, I think I’m still a bit confused on what reflected vs derived centrality is)
For the purposes of this exercise, I’m going to create an adjacency matrix of only those nodes that have ties to each other (exclude the isolates):
#First attempt - guessing I actually already did this, oh well!
old4 <- work4_adjacency
old4.ig <- graph.adjacency(old4)
class(old4.ig)
[1] "igraph"
new4.ig <- igraph::delete.vertices(old4.ig, which(igraph::degree(old4.ig)==0))
plot(new4.ig)
The closeness centrality of a node is defined as the sum of the geodesic distances between that node and all other nodes in a network.
#calculate closeness centrality: igraph
closeness4 <- igraph::closeness(new4.ig, mode = "all") #made the choice to use undirected even though this is a directed graph
closeness4 <- as.data.frame(closeness4)
arrange(closeness4, desc(closeness4))
closeness4
Harry James Potter 0.08333333
Ronald Weasley 0.06250000
Fred Weasley 0.05882353
Hermione Granger 0.05882353
George Weasley 0.05555556
Neville Longbottom 0.05000000
Cedric Diggory 0.04761905
Cho Chang 0.04761905
Colin Creevey 0.04761905
Dennis Creevey 0.04761905
Percy Weasley 0.04761905
Ginny Weasley 0.03703704
Honestly, I’m not really sure how to interpret these results. I am assuming this means that Harry Potter is the closest to all other nodes in the network, which makes sense.
Network Closeness Centralization
#calculate closeness centralization index: igraph
igraph::centr_clo(new4.ig, mode = "all")$centralization
[1] 0.7472467
#had to make sure you specify mode = all, meaning treat it as undirected
Note: I tried to run this on the network object with isolates not removed and it returned NaN. I do not know why this is.
#calculate closeness centralization index: igraph
igraph::centr_clo(work4.ig, mode = "all")$centralization
[1] NaN
Betweenness is defined as the number of geodesics on which a node sits. Remember, a geodesic is (if any) the shortest path(s) between any two nodes.
#calculate betweenness centrality: igraph
igraph::betweenness(new4.ig, directed=TRUE)
Cedric Diggory Cho Chang Colin Creevey
8.0000000 0.0000000 0.0000000
Dennis Creevey Fred Weasley George Weasley
0.0000000 9.0000000 0.0000000
Ginny Weasley Harry James Potter Hermione Granger
0.0000000 42.1666667 0.6666667
Neville Longbottom Percy Weasley Ronald Weasley
0.0000000 0.0000000 3.1666667
between <- igraph::betweenness(old4.ig, directed=TRUE)
work4.nodes <- work4.nodes %>% mutate(between = between)
Okay… I really don’t know what to make of this. Does this mean that those nodes with a betweenness of 0 are pendants? Let’s take a look at the graph again and see if it makes sense:
ggraph(work4_simple.ig, layout = 'kk') +
geom_edge_arc(color="gray", strength=0.3, arrow = arrow(length = unit(4, 'mm'))) +
geom_node_point() +
geom_node_text(aes(label = name), size=3, repel=T) +
theme_void() +
labs(title = "Student Support Network in Harry Potter and the Goblet of Fire")
Oh wait! Geodesic is the SHORTEST path. So, while, for example, Cho is on a path betwen Cedric and Harry,it’s not the SHORTEST path. I wonder why Fred gets a 9 and George gets a 0?
#calculate betweenness centralization index: igraph
centr_betw(new4.ig,directed=F)$centralization
[1] 0.6214876
Constraint is a measure of the redundancy of a node’s connections. It is bound between 0 and 1, with 0 being a complete lack, and 1 being complete redundancy. The constraint()
function is only available in igraph, and automatically uses the available weight vector.
constraint(new4.ig)
Cedric Diggory Cho Chang Colin Creevey
0.8371605 1.0155556 0.7832099
Dennis Creevey Fred Weasley George Weasley
0.7832099 0.5210288 0.6270679
Ginny Weasley Harry James Potter Hermione Granger
1.0000000 0.3173394 0.5873297
Neville Longbottom Percy Weasley Ronald Weasley
0.6264472 0.6811111 0.5378099
I am particularly struck by the fact that Cho’s constraint score is over 1.
#First, I need to create a statnet object.
old4_adj <- as.matrix(graph.adjacency(old4))
new4_adj <- as.matrix(as_adjacency_matrix(new4.ig))
old4.stat <- as.network.matrix(old4_adj)
new4.stat <- as.network.matrix(new4_adj)
print(new4.stat)
Network attributes:
vertices = 12
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 34
missing edges= 0
non-missing edges= 34
Vertex attribute names:
vertex.names
No edge attributes
Now I’m going to try and create brokerage scores:
brokerage(new4.stat, cl = work4.nodes$name)
#for the sake of brevity I'm going to try and hide the results in the final document, but this DOES run! huzzah!
#return matrix of standardized brokerage scores
brokerage(new4.stat, cl = work4.nodes$name)$z.nli
w_I w_O b_IO b_OI b_O t
Cedric Diggory NaN NaN NaN NaN -11.819463 -1.0136651
Cho Chang NaN NaN NaN NaN -11.881217 -1.2430947
Colin Creevey NaN NaN NaN NaN -11.881217 -1.2430947
Dennis Creevey NaN NaN NaN NaN -11.881217 -1.2430947
Fred Weasley NaN NaN NaN NaN -11.634200 -0.3253764
George Weasley NaN NaN NaN NaN -11.881217 -1.2430947
Ginny Weasley NaN NaN NaN NaN -11.881217 -1.2430947
Harry James Potter NaN NaN NaN NaN -9.781574 6.5575109
Hermione Granger NaN NaN NaN NaN -11.757709 -0.7842355
Neville Longbottom NaN NaN NaN NaN -11.881217 -1.2430947
Percy Weasley NaN NaN NaN NaN -11.881217 -1.2430947
Ronald Weasley NaN NaN NaN NaN -11.510692 0.1334828
Okay, now I really don’t know what’s going on. Why is this returning NaN? Are the only types of brokerage here Liason? How is this calculated? I HAVE QUESTIONS!
Now I’m going to calculate on the entire network and add to the Nodes df
#return matrix of standardized brokerage scores
brokerage(old4.stat, cl = work4.nodes$name)$z.nli
w_I w_O b_IO b_OI b_O t
Adrian Pucey NaN NaN NaN NaN -0.3700275 -0.3700275
Alicia Spinnet NaN NaN NaN NaN -0.3700275 -0.3700275
Angelina Johnson NaN NaN NaN NaN -0.3700275 -0.3700275
Anthony Goldstein NaN NaN NaN NaN -0.3700275 -0.3700275
Blaise Zabini NaN NaN NaN NaN -0.3700275 -0.3700275
C. Warrington NaN NaN NaN NaN -0.3700275 -0.3700275
Cedric Diggory NaN NaN NaN NaN 0.9735500 0.9735500
Cho Chang NaN NaN NaN NaN -0.3700275 -0.3700275
Colin Creevey NaN NaN NaN NaN -0.3700275 -0.3700275
Cormac McLaggen NaN NaN NaN NaN -0.3700275 -0.3700275
Dean Thomas NaN NaN NaN NaN -0.3700275 -0.3700275
Demelza Robins NaN NaN NaN NaN -0.3700275 -0.3700275
Dennis Creevey NaN NaN NaN NaN -0.3700275 -0.3700275
Draco Malfoy NaN NaN NaN NaN -0.3700275 -0.3700275
Eddie Carmichael NaN NaN NaN NaN -0.3700275 -0.3700275
Eleanor Branstone NaN NaN NaN NaN -0.3700275 -0.3700275
Ernie Macmillan NaN NaN NaN NaN -0.3700275 -0.3700275
Euan Abercrombie NaN NaN NaN NaN -0.3700275 -0.3700275
Fred Weasley NaN NaN NaN NaN 5.0042826 5.0042826
George Weasley NaN NaN NaN NaN -0.3700275 -0.3700275
Ginny Weasley NaN NaN NaN NaN -0.3700275 -0.3700275
Graham Pritchard NaN NaN NaN NaN -0.3700275 -0.3700275
Gregory Goyle NaN NaN NaN NaN -0.3700275 -0.3700275
Hannah Abbott NaN NaN NaN NaN -0.3700275 -0.3700275
Harry James Potter NaN NaN NaN NaN 45.3116087 45.3116087
Hermione Granger NaN NaN NaN NaN 2.3171276 2.3171276
Jimmy Peakes NaN NaN NaN NaN -0.3700275 -0.3700275
Justin Finch-Fletchley NaN NaN NaN NaN -0.3700275 -0.3700275
Katie Bell NaN NaN NaN NaN -0.3700275 -0.3700275
Kevin Whitby NaN NaN NaN NaN -0.3700275 -0.3700275
Lavender Brown NaN NaN NaN NaN -0.3700275 -0.3700275
Leanne NaN NaN NaN NaN -0.3700275 -0.3700275
Lee Jordan NaN NaN NaN NaN -0.3700275 -0.3700275
Lucian Bole NaN NaN NaN NaN -0.3700275 -0.3700275
Luna Lovegood NaN NaN NaN NaN -0.3700275 -0.3700275
Malcolm Baddock NaN NaN NaN NaN -0.3700275 -0.3700275
Mandy Brocklehurst NaN NaN NaN NaN -0.3700275 -0.3700275
Marcus Belby NaN NaN NaN NaN -0.3700275 -0.3700275
Marcus Flint NaN NaN NaN NaN -0.3700275 -0.3700275
Michael Corner NaN NaN NaN NaN -0.3700275 -0.3700275
Miles Bletchley NaN NaN NaN NaN -0.3700275 -0.3700275
Millicent Bulstrode NaN NaN NaN NaN -0.3700275 -0.3700275
Natalie McDonald NaN NaN NaN NaN -0.3700275 -0.3700275
Neville Longbottom NaN NaN NaN NaN -0.3700275 -0.3700275
Oliver Wood NaN NaN NaN NaN -0.3700275 -0.3700275
Orla Quirke NaN NaN NaN NaN -0.3700275 -0.3700275
Owen Cauldwell NaN NaN NaN NaN -0.3700275 -0.3700275
Padma Patil NaN NaN NaN NaN -0.3700275 -0.3700275
Pansy Parkinson NaN NaN NaN NaN -0.3700275 -0.3700275
Parvati Patil NaN NaN NaN NaN -0.3700275 -0.3700275
Penelope Clearwater NaN NaN NaN NaN -0.3700275 -0.3700275
Percy Weasley NaN NaN NaN NaN -0.3700275 -0.3700275
Peregrine Derrick NaN NaN NaN NaN -0.3700275 -0.3700275
Roger Davies NaN NaN NaN NaN -0.3700275 -0.3700275
Romilda Vane NaN NaN NaN NaN -0.3700275 -0.3700275
Ronald Weasley NaN NaN NaN NaN 7.6914377 7.6914377
Rose Zeller NaN NaN NaN NaN -0.3700275 -0.3700275
Seamus Finnigan NaN NaN NaN NaN -0.3700275 -0.3700275
Stewart Ackerley NaN NaN NaN NaN -0.3700275 -0.3700275
Susan Bones NaN NaN NaN NaN -0.3700275 -0.3700275
Terry Boot NaN NaN NaN NaN -0.3700275 -0.3700275
Theodore Nott NaN NaN NaN NaN -0.3700275 -0.3700275
Vincent Crabbe NaN NaN NaN NaN -0.3700275 -0.3700275
Zacharias Smith NaN NaN NaN NaN -0.3700275 -0.3700275
So these numbers are different from our smaller network, and for all of them, there is no difference between the liaison brokerage score and the total. I’m quite confused by this!
I’m going to add only the total brokerage score to the nodes df
#return matrix of standardized brokerage scores
brokerage(old4.stat, cl = work4.nodes$name)$z.nli
w_I w_O b_IO b_OI b_O t
Adrian Pucey NaN NaN NaN NaN -0.3700275 -0.3700275
Alicia Spinnet NaN NaN NaN NaN -0.3700275 -0.3700275
Angelina Johnson NaN NaN NaN NaN -0.3700275 -0.3700275
Anthony Goldstein NaN NaN NaN NaN -0.3700275 -0.3700275
Blaise Zabini NaN NaN NaN NaN -0.3700275 -0.3700275
C. Warrington NaN NaN NaN NaN -0.3700275 -0.3700275
Cedric Diggory NaN NaN NaN NaN 0.9735500 0.9735500
Cho Chang NaN NaN NaN NaN -0.3700275 -0.3700275
Colin Creevey NaN NaN NaN NaN -0.3700275 -0.3700275
Cormac McLaggen NaN NaN NaN NaN -0.3700275 -0.3700275
Dean Thomas NaN NaN NaN NaN -0.3700275 -0.3700275
Demelza Robins NaN NaN NaN NaN -0.3700275 -0.3700275
Dennis Creevey NaN NaN NaN NaN -0.3700275 -0.3700275
Draco Malfoy NaN NaN NaN NaN -0.3700275 -0.3700275
Eddie Carmichael NaN NaN NaN NaN -0.3700275 -0.3700275
Eleanor Branstone NaN NaN NaN NaN -0.3700275 -0.3700275
Ernie Macmillan NaN NaN NaN NaN -0.3700275 -0.3700275
Euan Abercrombie NaN NaN NaN NaN -0.3700275 -0.3700275
Fred Weasley NaN NaN NaN NaN 5.0042826 5.0042826
George Weasley NaN NaN NaN NaN -0.3700275 -0.3700275
Ginny Weasley NaN NaN NaN NaN -0.3700275 -0.3700275
Graham Pritchard NaN NaN NaN NaN -0.3700275 -0.3700275
Gregory Goyle NaN NaN NaN NaN -0.3700275 -0.3700275
Hannah Abbott NaN NaN NaN NaN -0.3700275 -0.3700275
Harry James Potter NaN NaN NaN NaN 45.3116087 45.3116087
Hermione Granger NaN NaN NaN NaN 2.3171276 2.3171276
Jimmy Peakes NaN NaN NaN NaN -0.3700275 -0.3700275
Justin Finch-Fletchley NaN NaN NaN NaN -0.3700275 -0.3700275
Katie Bell NaN NaN NaN NaN -0.3700275 -0.3700275
Kevin Whitby NaN NaN NaN NaN -0.3700275 -0.3700275
Lavender Brown NaN NaN NaN NaN -0.3700275 -0.3700275
Leanne NaN NaN NaN NaN -0.3700275 -0.3700275
Lee Jordan NaN NaN NaN NaN -0.3700275 -0.3700275
Lucian Bole NaN NaN NaN NaN -0.3700275 -0.3700275
Luna Lovegood NaN NaN NaN NaN -0.3700275 -0.3700275
Malcolm Baddock NaN NaN NaN NaN -0.3700275 -0.3700275
Mandy Brocklehurst NaN NaN NaN NaN -0.3700275 -0.3700275
Marcus Belby NaN NaN NaN NaN -0.3700275 -0.3700275
Marcus Flint NaN NaN NaN NaN -0.3700275 -0.3700275
Michael Corner NaN NaN NaN NaN -0.3700275 -0.3700275
Miles Bletchley NaN NaN NaN NaN -0.3700275 -0.3700275
Millicent Bulstrode NaN NaN NaN NaN -0.3700275 -0.3700275
Natalie McDonald NaN NaN NaN NaN -0.3700275 -0.3700275
Neville Longbottom NaN NaN NaN NaN -0.3700275 -0.3700275
Oliver Wood NaN NaN NaN NaN -0.3700275 -0.3700275
Orla Quirke NaN NaN NaN NaN -0.3700275 -0.3700275
Owen Cauldwell NaN NaN NaN NaN -0.3700275 -0.3700275
Padma Patil NaN NaN NaN NaN -0.3700275 -0.3700275
Pansy Parkinson NaN NaN NaN NaN -0.3700275 -0.3700275
Parvati Patil NaN NaN NaN NaN -0.3700275 -0.3700275
Penelope Clearwater NaN NaN NaN NaN -0.3700275 -0.3700275
Percy Weasley NaN NaN NaN NaN -0.3700275 -0.3700275
Peregrine Derrick NaN NaN NaN NaN -0.3700275 -0.3700275
Roger Davies NaN NaN NaN NaN -0.3700275 -0.3700275
Romilda Vane NaN NaN NaN NaN -0.3700275 -0.3700275
Ronald Weasley NaN NaN NaN NaN 7.6914377 7.6914377
Rose Zeller NaN NaN NaN NaN -0.3700275 -0.3700275
Seamus Finnigan NaN NaN NaN NaN -0.3700275 -0.3700275
Stewart Ackerley NaN NaN NaN NaN -0.3700275 -0.3700275
Susan Bones NaN NaN NaN NaN -0.3700275 -0.3700275
Terry Boot NaN NaN NaN NaN -0.3700275 -0.3700275
Theodore Nott NaN NaN NaN NaN -0.3700275 -0.3700275
Vincent Crabbe NaN NaN NaN NaN -0.3700275 -0.3700275
Zacharias Smith NaN NaN NaN NaN -0.3700275 -0.3700275
#add GF brokerage scores to nodes dataframe
temp<-data.frame(brokerage(old4.stat, cl = work4.nodes$name)$z.nli)
work4.nodes <- work4.nodes %>%
mutate(broker.tot = temp$t)
#correlations <- work4.nodes %>%
# select(as.numeric(degree),as.numeric(work4.nodes$eigenCentrality), as.numeric(between), as.numeric(broker.tot)) %>%
# correlate()
# %>% rearrange()
#fashion(correlations)
#visualize correlations
#rplot(correlations)
Okay, this is just failing all over the place and I’m not really sure why? I’m going to go back to my small group without isolates and see if I can run these calculations again.
Note to self: use new4.ig, new4.stat
#Create nodes df for new and add all degree
#Confirm degree is numeric
new4.nodes<-data.frame(name=V(new4.ig)$name, degree=igraph::degree(new4.ig))
#add in degree and out degree
new4.nodes <- new4.nodes %>%
mutate(indegree=igraph::degree(new4.ig, mode="in", loops=FALSE),
outdegree=igraph::degree(new4.ig, mode="out", loops=FALSE))
#add eigenCentrality
newEigen <- centr_eigen(new4.ig, directed = T)
newEigen$vector
[1] 2.460367e-01 6.053405e-02 1.054024e-17 0.000000e+00 9.394659e-01
[6] 9.394659e-01 2.311431e-01 1.000000e+00 9.394659e-01 7.083229e-01
[11] 2.280940e-17 9.394659e-01
newEigen$centralization
[1] 0.5451
new4.eigen<-cbind(newEigen$vector,V(new4.ig)$name)
new4.eigenDF <- data_frame(new4.eigen)
#add to our nodes
new4.nodes <- new4.nodes %>%
mutate(eigenCentrality = new4.eigen)
Calculate derived and reflected centrality and add to nodes df
#Matrix to 2nd power for two step derived/reflected centrality measurement.
new4_adjacency2<-new4_adj %*% new4_adj
#Calculate the proportion of reflected centrality.
new_rc<-diag(as.matrix(new4_adjacency2))/rowSums(as.matrix(new4_adjacency2))
new_rc<-ifelse(is.nan(new_rc),0,new_rc)
head(new_rc)
Cedric Diggory Cho Chang Colin Creevey Dennis Creevey
0.1428571 0.0000000 0.1250000 0.1250000
Fred Weasley George Weasley
0.2000000 0.1904762
#Calculate the proportion of derived centrality.
new_dc<-1-diag(as.matrix(new4_adjacency2))/rowSums(as.matrix(new4_adjacency2))
new_dc<-ifelse(is.nan(new_dc),1,new_dc)
head(new_dc)
Cedric Diggory Cho Chang Colin Creevey Dennis Creevey
0.8571429 1.0000000 0.8750000 0.8750000
Fred Weasley George Weasley
0.8000000 0.8095238
Can we run the Bonacich Power measure on this?
#igraph::power_centrality(new4.ig)
Nope! There’s something wrong how our data is structured? I guess?
Error text: “Error in .local(a, b, …) : cs_lu(A) failed: near-singular A (or out of memory)” - Google is not helping.
Now we’ll add in our closesness, betweeness and brokerage scores:
#calculate closeness centrality: igraph
closeIG <- igraph::closeness(new4.ig)
head(igraph::closeness(new4.ig))
Cedric Diggory Cho Chang Colin Creevey Dennis Creevey
0.06666667 0.06250000 0.05000000 0.05000000
Fred Weasley George Weasley
0.08333333 0.07692308
[1] 0 0 0 0 0 0
closeIG_df <- data_frame(closeIG)
new4.nodes <- new4.nodes %>%
mutate(closeness = closeIG_df$closeIG)
So, this is a directed graph, which per our homework is the default for statnet, but we’re getting zeros on this eval.
[1] 0.5238095 0.5238095 0.5238095 0.5238095 0.6470588 0.6111111
if we tell it that it’s an undirected graph, then we get a measurement. I do not know what to make of this, quite honestly,so I’m going to use my igraph results.
I have no idea why Ginny and Neville’s scores are NaN?
#Betweeness Centrality
head(igraph::betweenness(new4.ig, directed=TRUE, weights=NA))
Cedric Diggory Cho Chang Colin Creevey Dennis Creevey
8 0 0 0
Fred Weasley George Weasley
9 0
betweenIG <- igraph::betweenness(new4.ig, directed=TRUE, weights=NA)
betweenIG_df <- data_frame(betweenIG)
new4.nodes <- new4.nodes %>%
mutate(between = betweenIG_df$betweenIG)
Adding constraint
constraintIG <- constraint(new4.ig)
constraintIG_df <- data_frame(constraintIG)
new4.nodes <- new4.nodes %>%
mutate(constraint = constraintIG_df$constraintIG)
Brokerage:
#return matrix of standardized brokerage scores
head(brokerage(new4.stat, cl = new4.nodes$name)$z.nli)
w_I w_O b_IO b_OI b_O t
Cedric Diggory NaN NaN NaN NaN -1.0136651 -1.0136651
Cho Chang NaN NaN NaN NaN -1.2430947 -1.2430947
Colin Creevey NaN NaN NaN NaN -1.2430947 -1.2430947
Dennis Creevey NaN NaN NaN NaN -1.2430947 -1.2430947
Fred Weasley NaN NaN NaN NaN -0.3253764 -0.3253764
George Weasley NaN NaN NaN NaN -1.2430947 -1.2430947
Quick Reminder of the types of brokerage:
b_O: Liaison role; the broker mediates contact between two individuals from different groups, neither of which is the group to which he or she belongs. Two-path structure: A -> B -> C
In this evaluation, liaison and total brokerage are the same, so we’ll add it to the nodes df as liaison/total
brokerage <- brokerage(new4.stat, cl = new4.nodes$name)$z.nli
brokerage_df <- as.data.frame(brokerage)
new4.nodes <- new4.nodes %>%
mutate(liaison_total = brokerage_df$b_O)
eigenCentraility has returned as a character, which will not work for statistical evaluations (I’m assuming)
as.numeric(new4.nodes$eigenCentrality)
[1] 2.460367e-01 6.053405e-02 1.054024e-17 0.000000e+00 9.394659e-01
[6] 9.394659e-01 2.311431e-01 1.000000e+00 9.394659e-01 7.083229e-01
[11] 2.280940e-17 9.394659e-01 NA NA NA
[16] NA NA NA NA NA
[21] NA NA NA NA
Okay I’m not sure why it won’t return as a number? I think I’m going to move on and just see if I can run some correlations. Note: I’m leaving out closeness because it has NaNs and I am not sure how to handle those.
correlations <- new4.nodes %>%
select(degree,indegree,outdegree, derCentrality, reflectedCentrality, between, constraint, liaison_total) %>%
correlate() %>%
rearrange()
fashion(correlations)
term degree indegree outdegree reflectedCentrality
1 degree .95 .93 .85
2 indegree .95 .78 .71
3 outdegree .93 .78 .91
4 reflectedCentrality .85 .71 .91
5 liaison_total .77 .84 .60 .51
6 between .72 .78 .56 .51
7 constraint -.87 -.84 -.81 -.70
8 derCentrality -.85 -.71 -.91 -1.00
liaison_total between constraint derCentrality
1 .77 .72 -.87 -.85
2 .84 .78 -.84 -.71
3 .60 .56 -.81 -.91
4 .51 .51 -.70 -1.00
5 .97 -.67 -.51
6 .97 -.63 -.51
7 -.67 -.63 .70
8 -.51 -.51 .70
#visualize correlations
rplot(correlations)
I’m not 100% sure how to interpret this?
I’m going to stop here and publish this, and then move on to the next two tutorials!
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ph.D. (2022, May 19). Data Analytics and Computational Social Science: Network Analysis Exploration - Harry Potter Version. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlbateshaus888903/
BibTeX citation
@misc{ph.d.2022network, author = {Ph.D., Lissie Bates-Haus,}, title = {Data Analytics and Computational Social Science: Network Analysis Exploration - Harry Potter Version}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlbateshaus888903/}, year = {2022} }