A Closer Look at Enron's Emails
ls()
[1] "network_edgelist" "network_igraph" "network_statnet"
vcount(network_igraph)
[1] 184
ecount(network_igraph)
[1] 125409
print(network_statnet)
Network attributes:
vertices = 184
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 3010
missing edges= 0
non-missing edges= 3010
Vertex attribute names:
vertex.names
Edge attribute names not shown
#print(network_igraph)
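The igraph and statnet objects report different edge counts. Both have 184 nodes, but the igraph object has 125,409 edges while the statnet object has only 3,010. The statnet summary shows multiple = FALSE, so repeated emails between the same sender and recipient may have been collapsed into a single edge there.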
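One way to see how two edge counts for the same data could differ this much is a toy edge list in which the same sender emails the same recipient more than once. This is a minimal base R sketch on made-up data: the `toy_edges` data frame and its values are hypothetical and not taken from the Enron files. It assumes the statnet object was built with repeated sender-recipient pairs collapsed, which the `multiple = FALSE` flag suggests but does not prove.

```r
# Hypothetical toy edge list: one row per email, NOT the real Enron data.
toy_edges <- data.frame(
  from = c("a", "a", "a", "b", "b", "c"),
  to   = c("b", "b", "c", "a", "a", "a"),
  stringsAsFactors = FALSE
)

# Every email counts as its own edge (how a multigraph would count them).
n_all <- nrow(toy_edges)

# Each ordered sender-recipient pair counts once (a simple directed graph).
n_unique <- nrow(unique(toy_edges[, c("from", "to")]))

cat("all edges:", n_all, " unique pairs:", n_unique, "\n")
```

If the same logic holds for the real data, 125,409 would be the number of email-recipient records and 3,010 the number of distinct directed pairs.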
is_bipartite(network_igraph)
[1] FALSE
is_directed(network_igraph)
[1] TRUE
is_weighted(network_igraph)
[1] FALSE
The igraph object is a single-mode network that is directed and unweighted.
vertex_attr_names(network_igraph)
[1] "Email" "Name" "Note"
network::list.vertex.attributes(network_statnet)
[1] "na" "vertex.names"
edge_attr_names(network_igraph)
[1] "Time" "Reciptype" "Topic" "LDC_topic"
network::list.edge.attributes(network_statnet)
[1] "LDC_topic" "LDC_topic_desc" "LDC_topic_name"
[4] "na" "Reciptype" "Time"
[7] "Topic"
igraph vertex attributes: Email, Name, Note
igraph edge attributes: Time, Reciptype, Topic, LDC_topic
statnet vertex attributes: na, vertex.names
statnet edge attributes: LDC_topic, LDC_topic_desc, LDC_topic_name, na, Reciptype, Time, Topic
[1] "Albert Meyers" "Thomas Martin" "Andrea Ring"
[4] "Andrew Lewis" "Andy Zipper" "Jeffrey Shankman"
[1] "albert.meyers" "a..martin" "andrea.ring" "andrew.lewis"
[5] "andy.zipper" "a..shankman"
[1] "Employee, Specialist" "Vice President"
[3] "NA" "Director"
[5] "Vice President, Enron Online" "President, Enron Global Mkts"
(network_igraph)$Carrier %>% head()
NULL
head(network_statnet %v% "vertex.names")
[1] 1 2 3 4 5 6
head(network_statnet %e% "Time")
[1] "1979-12-31 21:00:00" "1979-12-31 21:00:00" "1979-12-31 21:00:00"
[4] "1979-12-31 21:00:00" "1979-12-31 21:00:00" "1979-12-31 21:00:00"
head(network_statnet %e% "LDC_topic")
[1] "-1" "-1" "-1" "-1" "-1" "-1"
Length Class Mode
125409 character character
summary(network_statnet %e% "Distance")
Length Class Mode
0 NULL NULL
igraph::dyad.census(network_igraph)
$mut
[1] 30600
$asym
[1] 64208
$null
[1] -77972
sna::dyad.census(network_statnet)
Mut Asym Null
[1,] 913 1184 14739
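The igraph dyad census reports a null count of -77,972, which cannot be right because a count of dyads cannot be negative. With 184 nodes there are only 16,836 possible dyads, yet igraph reports 94,808 mutual and asymmetric dyads combined, so it appears to be counting repeated edges rather than distinct pairs. The statnet counts sum to exactly 16,836.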
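The arithmetic behind the negative null count can be checked directly. This is a small base R sketch that uses only the numbers printed above; the variable names are mine. The reading that igraph is tallying parallel edges is an inference from the arithmetic, not something the output states.

```r
n <- 184                        # vertices, from vcount() above
possible_dyads <- choose(n, 2)  # unordered pairs of distinct vertices

# statnet (sna) counts reported above: mutual + asymmetric + null
statnet_total <- 913 + 1184 + 14739

# igraph counts reported above
igraph_mut  <- 30600
igraph_asym <- 64208

# Null count implied if mut and asym are subtracted from all possible dyads
igraph_null_implied <- possible_dyads - igraph_mut - igraph_asym

cat("possible dyads:     ", possible_dyads, "\n")        # 16836
cat("statnet total:      ", statnet_total, "\n")         # 16836
cat("implied igraph null:", igraph_null_implied, "\n")   # -77972
```

The implied value reproduces the -77,972 that igraph printed, which is consistent with the mutual and asymmetric counts being inflated by repeated edges.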
igraph::triad.census(network_igraph)
[1] 700234 19530 249694 8409 2695 5176 7060 13227 1180
[10] 59 6781 1023 1137 786 2782 1611
sna::triad.census(network_statnet)
003 012 102 021D 021U 021C 111D 111U 030T 030C 201
[1,] 700234 150250 118974 8409 2695 5176 7060 13227 1180 59 6781
120D 120U 120C 210 300
[1,] 1023 1137 786 2782 1611
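The two triad censuses can be compared in the same way. This base R sketch copies the counts printed above into vectors (the vector names are mine) and assumes igraph returns its counts in the same order as the sna column labels, from 003 through 300.

```r
# Counts copied from the igraph and sna output above.
igraph_triads <- c(700234, 19530, 249694, 8409, 2695, 5176, 7060, 13227,
                   1180, 59, 6781, 1023, 1137, 786, 2782, 1611)
sna_triads    <- c(700234, 150250, 118974, 8409, 2695, 5176, 7060, 13227,
                   1180, 59, 6781, 1023, 1137, 786, 2782, 1611)

possible_triads <- choose(184, 3)   # all sets of three distinct vertices

cat("possible triads:", possible_triads, "\n")      # 1021384
cat("igraph total:   ", sum(igraph_triads), "\n")   # 1021384
cat("sna total:      ", sum(sna_triads), "\n")      # 1021384

# Positions where the two censuses disagree
print(which(igraph_triads != sna_triads))           # 2 3
```

Both censuses account for every possible triad, and they disagree only in the second and third positions (012 and 102 under the assumed ordering), whose combined total is the same in both packages.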
transitivity(network_igraph)
[1] 0.3725138
gtrans(network_statnet)
[1] 0.3580924
transitivity(network_igraph, type = "global")
[1] 0.3725138
transitivity(network_igraph, type = "average")
[1] 0.5055302
transitivity(network_igraph, type = "local") %>% head()
[1] 0.0023288309 0.0013788877 0.0008393993 0.0031740105 0.0007847921
[6] 0.0017129438
The global transitivity values from igraph (0.3725) and statnet (0.3581) are close.
The global transitivity is 0.3725, while the average local transitivity is higher at 0.5055. This suggests that actors with fewer connections tend to have higher local transitivity. One plausible explanation is departmental clustering: people know many colleagues within their own department but few people in other departments.
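To illustrate why the average of local transitivities can exceed the global value, here is a minimal base R sketch on a made-up five-vertex undirected graph (a hub attached to one triangle and two pendant vertices). The graph and all names are hypothetical, and the formulas are the standard clustering-coefficient definitions for a simple undirected graph, not necessarily what igraph computes on this multigraph.

```r
# Hypothetical toy graph: vertex 1 is a hub; 1-2-3 form a triangle.
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(1, 4), c(1, 5))
A[edges] <- 1            # fill one direction
A[edges[, 2:1]] <- 1     # mirror to make the matrix symmetric

k <- rowSums(A)                  # degree of each vertex
closed <- diag(A %*% A %*% A)    # closed 3-walks = 2 * triangles at each vertex

# Local transitivity: triangles through v over pairs of neighbours of v.
# Vertices with degree < 2 give 0/0 = NaN and are dropped from the average.
local <- closed / (k * (k - 1))

global  <- sum(closed) / sum(k * (k - 1))  # closed triples / connected triples
average <- mean(local, na.rm = TRUE)

print(round(local, 3))
cat("global:", global, " average local:", round(average, 3), "\n")
```

Here the hub has a local value of 1/6 while the two low-degree triangle members score 1, so the average (about 0.722) is well above the global value (0.375), which is dominated by the hub's many open triples.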
[1] "Albert Meyers" "Thomas Martin" "Andrea Ring"
[4] "Andrew Lewis" "Andy Zipper" "Jeffrey Shankman"
Local_transivity <- transitivity(network_igraph, type = "local")
transitivity_tibble <- tibble(Names = Names, Local_transivity = Local_transivity)
transitivity_tibble %>% arrange(desc(Local_transivity))
# A tibble: 184 x 2
Names Local_transivity
<chr> <dbl>
1 Thomas Martin 0.0571
2 Joe Quenet 0.0179
3 Mark Haedicke 0.0159
4 Kim Ward 0.0157
5 Peter Keavey 0.0146
6 Monika Causholli 0.0134
7 David Delainey 0.00917
8 Susan Pereira 0.00909
9 Larry Campbell 0.00810
10 NA 0.00641
# ... with 174 more rows
gtrans(network_statnet)
[1] 0.3580924
For some reason I am unable to pull local transitivity for specific vertices using the method from HW 1 (vids = V()). I’m not sure these local transitivities are correct. Sorted in descending order, the largest local transitivity belongs to Thomas Martin at about 0.057.
#distances(network_igraph, "Thomas Martin","Andrea Ring")
average.path.length(network_igraph)
[1] 2.390464
average.path.length(network_igraph, directed = F)
[1] 2.085787
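I took these vertex names from the graph itself, so I’m a bit confused why the commented-out distances() call does not accept them. A likely cause is that igraph looks up character vertex IDs through the lowercase name attribute, whereas this graph stores the names in Name.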
igraph::components(network_igraph)
$membership
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[33] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[65] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[97] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1
[129] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[161] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
$csize
[1] 182 1 1
$no
[1] 3
#Isolates
isolates(network_statnet)
[1] 72 118
[1] 72 118
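For some reason the statnet vertex.names attribute holds only numbers (1, 2, 3, ...) rather than the actual names. I wonder if the statnet object was set up incorrectly.
There are two isolates, vertices 72 and 118. They match the two single-vertex components (2 and 3) in the igraph membership vector.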
graph.density(network_igraph)
[1] 3.72443
network.density(network_statnet)
[1] 0.08939178
graph.density(network_igraph, loops = TRUE)
[1] 3.704188
gden(network_statnet, diag = FALSE)
[1] 0.08939178
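The igraph density is above 1 (about 3.72), while the statnet density is about 0.089, or roughly 9%. A density above 1 is only possible when several edges between the same pair of nodes are each counted, so the two values are not measuring the same thing.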
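Both densities can be reproduced by hand from the node and edge counts reported earlier. This base R sketch assumes density is simply edges divided by the number of possible directed ties, with or without self-loops; the variable names are mine.

```r
n <- 184
igraph_edges  <- 125409   # ecount(network_igraph)
statnet_edges <- 3010     # total edges in print(network_statnet)

possible_no_loops   <- n * (n - 1)  # ordered pairs of distinct vertices
possible_with_loops <- n * n        # ordered pairs, self-loops allowed

d_igraph        <- igraph_edges / possible_no_loops
d_igraph_loops  <- igraph_edges / possible_with_loops
d_statnet       <- statnet_edges / possible_no_loops

cat(d_igraph, d_igraph_loops, d_statnet, "\n")
# approx 3.72443  3.704188  0.08939178
```

The three results match the printed output, so the gap between the packages comes entirely from the numerator: 125,409 edges against 3,010.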
[1] 114 428 391 104 957 381
[1] 10 32 21 9 59 30
The degrees computed from the igraph object are far larger than those computed from the statnet object.
network_degree <- data.frame(
  Name = V(network_igraph)$Name,
  degree = igraph::degree(network_igraph, loops = FALSE)
)
network_degree %>% arrange(desc(degree)) %>% slice(1:10)
Name degree
1 Jeff Dasovich 13967
2 James Steffes 9404
3 Tana Jones 9307
4 Richard Shapiro 8994
5 NA 6591
6 Steven Kean 6384
7 John Lavorato 6177
8 Michael Grigsby 5860
9 Mark Taylor 5693
10 Louise Kitchen 5362
Jeff Dasovich has the highest degree by a wide margin, at 13,967. This suggests he is very high up in the company, or at least central to its email traffic.
[1] 4 21 10 6 30 17
[1] 6 11 11 3 29 13
[1] 78 334 224 88 614 210
[1] 36 92 167 16 325 169
Degree_network <- data.frame(
  Name = V(network_igraph)$Name,
  total_degrees = igraph::degree(network_igraph, loops = FALSE),
  in_degree = igraph::degree(network_igraph, mode = "in", loops = FALSE),
  out_degree = igraph::degree(network_igraph, mode = "out", loops = FALSE)
) %>%
  arrange(desc(total_degrees))
Degree_network %>% slice(1:10)
Name total_degrees in_degree out_degree
1 Jeff Dasovich 13967 2612 11355
2 James Steffes 9404 4988 4416
3 Tana Jones 9307 2268 7039
4 Richard Shapiro 8994 6893 2101
5 NA 6591 2698 3893
6 Steven Kean 6384 2676 3708
7 John Lavorato 6177 3352 2825
8 Michael Grigsby 5860 1097 4763
9 Mark Taylor 5693 3694 1999
10 Louise Kitchen 5362 2087 3275
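This is what I would expect from someone high up in the company: Jeff Dasovich has mostly out-degree connections (11,355) and comparatively few in-degree connections (2,612). The pattern does not hold for everyone in the top ten, though. Richard Shapiro, for example, receives far more (6,893) than he sends (2,101).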
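For readers who want to see what in-degree and out-degree count, here is a minimal base R sketch on a made-up edge list. The `toy_edges` data frame and the single-letter names are hypothetical; each row stands for one email from `from` to `to`, so repeated pairs would add to the degree just as they do in the igraph object.

```r
# Hypothetical toy edge list: one row per email, NOT the real Enron data.
toy_edges <- data.frame(
  from = c("a", "a", "a", "b", "c"),
  to   = c("b", "c", "d", "a", "a"),
  stringsAsFactors = FALSE
)

people <- sort(unique(c(toy_edges$from, toy_edges$to)))

# Out-degree: emails sent. In-degree: emails received.
# factor(..., levels = people) keeps people with zero counts in the table.
out_degree <- as.vector(table(factor(toy_edges$from, levels = people)))
in_degree  <- as.vector(table(factor(toy_edges$to,   levels = people)))

degrees <- data.frame(
  name = people,
  in_degree = in_degree,
  out_degree = out_degree,
  total = in_degree + out_degree
)
print(degrees[order(-degrees$total), ])
```

In this toy data "a" sends three emails and receives two, giving it the highest total degree, while "d" only receives.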
summary(Degree_network)
Name total_degrees in_degree
Length:184 Min. : 0.0 Min. : 0.0
Class :character 1st Qu.: 212.8 1st Qu.: 150.5
Mode :character Median : 512.5 Median : 314.0
Mean : 1184.0 Mean : 592.0
3rd Qu.: 1401.2 3rd Qu.: 655.2
Max. :13967.0 Max. :6893.0
out_degree
Min. : 0.00
1st Qu.: 30.75
Median : 159.00
Mean : 591.99
3rd Qu.: 600.50
Max. :11355.00
hist(Degree_network$total_degrees, main = "Enron Degree Distribution", xlab = "Degree")
hist(Degree_network$out_degree, main ="Enron Out-Degree Distribution", xlab = "Degree")
hist(Degree_network$in_degree, main = "Enron In-Degree Distribution", xlab = "Degree")
Most people in the company have a limited number of connections, while a select few have very many. The summary shows the same skew: the mean total degree (1,184) is more than twice the median (512.5).
#centralization(network_statnet, degree, cmode= "indegree")
#centralization(network_statnet, degree, cmode = "outdegree")
centr_degree(network_igraph, loops = FALSE, mode = "in")$centralization
[1] 34.61991
centr_degree(network_igraph, loops = FALSE, mode = "out")$centralization
[1] 59.13566
Out-degree centralization (59.1) is higher than in-degree centralization (34.6), so sending is concentrated in a few people more strongly than receiving is.
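Both centralization scores are far above 1, which a normalized score on a simple graph could not be. The base R sketch below reproduces them under two assumptions that are mine, not stated in the output: that the score is Freeman's sum of (max degree - degree) divided by (n - 1)^2, the theoretical maximum for a simple directed graph without loops, and that the degree total is about 108,926, inferred from the reported mean out-degree of 591.99 across 184 nodes.

```r
n <- 184
max_in  <- 6893    # Max. in_degree from summary(Degree_network)
max_out <- 11355   # Max. out_degree from summary(Degree_network)

# ASSUMPTION: total degree inferred from the rounded mean of 591.99.
sum_deg <- round(591.99 * n)

# ASSUMPTION: theoretical maximum for a simple directed graph, no loops.
tmax <- (n - 1)^2

# sum(max - d_i) over all vertices equals n * max - sum(d_i)
cent_in  <- (n * max_in  - sum_deg) / tmax
cent_out <- (n * max_out - sum_deg) / tmax

cat("in:", cent_in, " out:", cent_out, "\n")
# approx 34.6199  59.1357
```

The results agree with the printed values to about four decimal places, which suggests the scores are inflated because the denominator assumes at most one edge per pair while the degrees count every email.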
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Sullivan (2022, Feb. 17). Data Analytics and Computational Social Science: Networks Hw 2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompjsulliv34864273/
BibTeX citation
@misc{sullivan2022networks,
  author = {Sullivan, Peter},
  title = {Data Analytics and Computational Social Science: Networks Hw 2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompjsulliv34864273/},
  year = {2022}
}