Networks Hw 2

A closer look at Enron's Emails

Peter Sullivan
2022-02-09

Looking at Nodes and Edges:

ls()
[1] "network_edgelist" "network_igraph"   "network_statnet" 
vcount(network_igraph)
[1] 184
ecount(network_igraph)
[1] 125409
print(network_statnet)
 Network attributes:
  vertices = 184 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 3010 
    missing edges= 0 
    non-missing edges= 3010 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names not shown 
#print(network_igraph)

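The igraph and statnet objects report the same 184 nodes but very different edge counts: 125,409 edges in igraph versus 3,010 in statnet. A directed graph on 184 nodes has at most 184 × 183 = 33,672 distinct sender-recipient pairs, so the igraph object must contain repeated edges between the same pair (apparently one edge per email). The statnet object reports multiple = FALSE and keeps a single tie per pair.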
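
The gap can be checked with a little arithmetic, using only the counts printed above. The sketch below also shows, on a small made-up igraph object (toy edges, not the Enron data), how simplify() collapses parallel edges; the object names toy and toy_simple are hypothetical.

```r
library(igraph)

n_nodes <- 184
max_simple_edges <- n_nodes * (n_nodes - 1)   # ordered pairs, no self-loops
max_simple_edges                              # 33672
125409 > max_simple_edges                     # TRUE: the graph cannot be simple

# Toy multigraph (hypothetical edges): A->B appears three times
toy <- graph_from_data_frame(
  data.frame(from = c("A", "A", "A", "B", "C"),
             to   = c("B", "B", "B", "C", "A")),
  directed = TRUE
)
ecount(toy)          # 5 edges, counting repeats
any_multiple(toy)    # TRUE

# Collapse parallel edges and drop self-loops
toy_simple <- simplify(toy, remove.multiple = TRUE, remove.loops = TRUE)
ecount(toy_simple)   # 3 distinct sender-recipient pairs
```

Running the same simplify() call on network_igraph would be one way to make its statistics comparable with the statnet object, assuming the edge attributes are not needed afterwards.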


Weighted, Directed, Single Mode Network?

is_bipartite(network_igraph)
[1] FALSE
is_directed(network_igraph)
[1] TRUE
is_weighted(network_igraph)
[1] FALSE

According to the igraph object, this is a single-mode network that is directed and unweighted.


Looking at Vertex and Edge Attributes:

vertex_attr_names(network_igraph)
[1] "Email" "Name"  "Note" 
network::list.vertex.attributes(network_statnet)
[1] "na"           "vertex.names"
edge_attr_names(network_igraph)
[1] "Time"      "Reciptype" "Topic"     "LDC_topic"
network::list.edge.attributes(network_statnet)
[1] "LDC_topic"      "LDC_topic_desc" "LDC_topic_name"
[4] "na"             "Reciptype"      "Time"          
[7] "Topic"         

igraph vertex attribute names: Email, Name, Note

igraph edge attribute names: Time, Reciptype, Topic, LDC_topic

statnet vertex attribute names: na, vertex.names

statnet edge attribute names: LDC_topic, LDC_topic_desc, LDC_topic_name, na, Reciptype, Time, Topic


Accessing Attribute DATA:

V(network_igraph)$Name %>% head()
[1] "Albert Meyers"    "Thomas Martin"    "Andrea Ring"     
[4] "Andrew Lewis"     "Andy Zipper"      "Jeffrey Shankman"
V(network_igraph)$Email %>% head()
[1] "albert.meyers" "a..martin"     "andrea.ring"   "andrew.lewis" 
[5] "andy.zipper"   "a..shankman"  
V(network_igraph)$Note %>% head()
[1] "Employee, Specialist"         "Vice President"              
[3] "NA"                           "Director"                    
[5] "Vice President, Enron Online" "President, Enron Global Mkts"
# "Carrier" is not an attribute of this network, so this returns NULL
(network_igraph)$Carrier %>% head()
NULL
head(network_statnet %v% "vertex.names")
[1] 1 2 3 4 5 6
head(network_statnet %e% "Time")
[1] "1979-12-31 21:00:00" "1979-12-31 21:00:00" "1979-12-31 21:00:00"
[4] "1979-12-31 21:00:00" "1979-12-31 21:00:00" "1979-12-31 21:00:00"
head(network_statnet %e% "LDC_topic")
[1] "-1" "-1" "-1" "-1" "-1" "-1"


Summarizing Attribute DATA

summary(E(network_igraph)$Time)
   Length     Class      Mode 
   125409 character character 
# "Distance" is not an edge attribute of this network, so the summary is empty
summary(network_statnet %e% "Distance")
Length  Class   Mode 
     0   NULL   NULL 


Dyad Census

igraph::dyad.census(network_igraph)
$mut
[1] 30600

$asym
[1] 64208

$null
[1] -77972
sna::dyad.census(network_statnet)
     Mut Asym  Null
[1,] 913 1184 14739

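igraph reports a null-dyad count of -77,972, which cannot be right because a count cannot be negative. With 184 nodes there are only 184 × 183 / 2 = 16,836 dyads, yet igraph's mutual and asymmetric counts already sum to 94,808, and the null count is what remains after subtracting them (16,836 - 94,808 = -77,972). The repeated edges in the igraph object appear to be counted more than once. The sna counts on the statnet object are consistent: 913 + 1,184 + 14,739 = 16,836.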
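
The negative null count can be reproduced from the printed numbers alone. This is a small arithmetic check, not a call into either package; the variable names are made up for illustration.

```r
n_nodes <- 184
total_dyads <- choose(n_nodes, 2)         # 16836 unordered pairs of nodes

# Counts reported by igraph::dyad.census() above
igraph_mut  <- 30600
igraph_asym <- 64208
total_dyads - igraph_mut - igraph_asym    # -77972, the "null" value igraph printed

# Counts reported by sna::dyad.census() above
sna_counts <- c(mut = 913, asym = 1184, null = 14739)
sum(sna_counts) == total_dyads            # TRUE

# Each mutual dyad holds two directed ties and each asymmetric dyad one,
# which recovers the statnet edge count
2 * sna_counts[["mut"]] + sna_counts[["asym"]]   # 3010
```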


Triad Census

igraph::triad.census(network_igraph)
 [1] 700234  19530 249694   8409   2695   5176   7060  13227   1180
[10]     59   6781   1023   1137    786   2782   1611
sna::triad.census(network_statnet)
        003    012    102 021D 021U 021C 111D  111U 030T 030C  201
[1,] 700234 150250 118974 8409 2695 5176 7060 13227 1180   59 6781
     120D 120U 120C  210  300
[1,] 1023 1137  786 2782 1611
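
A quick sanity check on the two censuses above: every set of three nodes falls into exactly one triad type, so the counts should sum to choose(184, 3). The sketch below copies the printed numbers into vectors (names are hypothetical) and also lists which triad types differ between the two packages.

```r
triad_types <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
                 "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

# Values copied from the igraph and sna output above
igraph_triads <- c(700234, 19530, 249694, 8409, 2695, 5176, 7060, 13227,
                   1180, 59, 6781, 1023, 1137, 786, 2782, 1611)
sna_triads    <- c(700234, 150250, 118974, 8409, 2695, 5176, 7060, 13227,
                   1180, 59, 6781, 1023, 1137, 786, 2782, 1611)
names(igraph_triads) <- triad_types
names(sna_triads)    <- triad_types

sum(sna_triads) == choose(184, 3)       # TRUE
sum(igraph_triads) == choose(184, 3)    # TRUE

# Triad types on which the two packages disagree
names(which(igraph_triads != sna_triads))   # "012" "102"
```

Both totals are valid, and the packages disagree only on the two triad types with a single connected pair (012 and 102). That is at least consistent with the repeated edges in the igraph object affecting how one-way and mutual pairs are told apart.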


Transitivity

transitivity(network_igraph)
[1] 0.3725138
gtrans(network_statnet)
[1] 0.3580924
transitivity(network_igraph, type = "global")
[1] 0.3725138
transitivity(network_igraph, type = "average")
[1] 0.5055302
transitivity(network_igraph, type = "local") %>% head()
[1] 0.0023288309 0.0013788877 0.0008393993 0.0031740105 0.0007847921
[6] 0.0017129438

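The global transitivity values from igraph (0.3725) and statnet (0.3581) are close. A small gap is expected, since igraph's transitivity() ignores edge direction while gtrans() works on the directed ties.

The global transitivity is 0.3725, while the average local transitivity is higher at 0.5055. Global transitivity counts every connected triple equally, so actors with many connections dominate it, whereas the average gives each actor equal weight. A higher average therefore suggests that actors with fewer connections tend to sit in more tightly clustered neighborhoods. One plausible reading is departmental structure: people know many colleagues within their own department but few in other departments.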
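
The difference between the global and the average measure is easiest to see on a tiny example. The sketch below uses a hypothetical four-node undirected graph (a triangle with one pendant node), not the Enron data.

```r
library(igraph)

# Hypothetical toy graph: triangle 1-2-3 plus a pendant node 4 attached to 3
toy <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# Global: closed triples / connected triples = 3 / 5
global_t <- transitivity(toy, type = "global")

# Local: 1, 1, 1/3 for nodes 1-3; NaN for node 4, which has a single neighbour
local_t <- transitivity(toy, type = "local")

# Averaging the defined local values gives each node equal weight
mean_local <- mean(local_t, na.rm = TRUE)

global_t      # 0.6
mean_local    # about 0.78
```

Node 3 has the most connections and the lowest local value, so it pulls the global figure down more than it pulls the average down. This is the same direction of gap as in the Enron output.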


Local Transitivity

Names <- V(network_igraph)$Name
Names %>% head()
[1] "Albert Meyers"    "Thomas Martin"    "Andrea Ring"     
[4] "Andrew Lewis"     "Andy Zipper"      "Jeffrey Shankman"
Local_transivity <- transitivity(network_igraph, type = "local")

transitivity_tibble <- tibble(Names = Names, Local_transivity = Local_transivity)

transitivity_tibble %>% arrange(desc(Local_transivity))
# A tibble: 184 x 2
   Names            Local_transivity
   <chr>                       <dbl>
 1 Thomas Martin             0.0571 
 2 Joe Quenet                0.0179 
 3 Mark Haedicke             0.0159 
 4 Kim Ward                  0.0157 
 5 Peter Keavey              0.0146 
 6 Monika Causholli          0.0134 
 7 David Delainey            0.00917
 8 Susan Pereira             0.00909
 9 Larry Campbell            0.00810
10 NA                        0.00641
# ... with 174 more rows
gtrans(network_statnet)
[1] 0.3580924

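I was unable to restrict local transitivity to specific actors using the approach from HW 1 (vids = V()). A likely cause is that this igraph object has no name vertex attribute (only Email, Name, and Note), so vertices cannot be looked up by a character name. I am also not sure these local transitivity values are correct: they are far below the average local transitivity of 0.5055 reported earlier, which suggests the repeated edges are distorting the calculation. Sorted in descending order, the largest value belongs to Thomas Martin at about 0.057.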
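
One way to select actors without a name attribute is to match on the Name attribute and pass the resulting positions as vids. This is a minimal sketch on a hypothetical toy graph; the labels and the object names toy, wanted, and ids are made up.

```r
library(igraph)

# Hypothetical toy graph with a capitalised "Name" attribute, as in the Enron object
toy <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)
V(toy)$Name <- c("Ann", "Bob", "Cy", "Dee")

# Find vertex positions by matching on the attribute, then pass them as vids
wanted <- c("Ann", "Cy")
ids <- match(wanted, V(toy)$Name)
local_by_name <- setNames(transitivity(toy, type = "local", vids = ids), wanted)
local_by_name   # Ann = 1, Cy = 1/3
```

On the Enron object the same pattern would read match(c("Thomas Martin", "Andrea Ring"), V(network_igraph)$Name). That assumes the names of interest are unique in Name.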


Distances in the Network

#distances(network_igraph, "Thomas Martin","Andrea Ring")



average.path.length(network_igraph)
[1] 2.390464
average.path.length(network_igraph, directed = F)
[1] 2.085787

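I took these names from the Name vertex attribute, so I was confused that the commented-out distances() call fails. igraph resolves character vertex ids against a vertex attribute called name (lowercase), and this graph only has Email, Name, and Note. The average path length is 2.39 when edge direction is respected and 2.09 when it is ignored.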
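
A possible fix for the name lookup is to copy the labels into the special name attribute before calling distances(). The sketch below shows the idea on a hypothetical toy directed graph, not on the Enron data.

```r
library(igraph)

# Hypothetical toy directed graph; labels are stored in "Name", not "name"
toy <- make_graph(c(1, 2, 2, 3, 3, 4), directed = TRUE)
V(toy)$Name <- c("Ann", "Bob", "Cy", "Dee")

# Copy the labels into the "name" attribute that igraph uses for lookups
V(toy)$name <- V(toy)$Name

# Character ids now resolve; mode = "out" follows edge direction
d <- distances(toy, v = "Ann", to = "Dee", mode = "out")
d[1, 1]   # 3 steps: Ann -> Bob -> Cy -> Dee
```

For network_igraph the Email attribute may be a safer key than Name, since the Name output contains at least one NA entry. Whether either attribute is unique is an assumption that has not been checked here.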


Identifying Isolates

igraph::components(network_igraph)
$membership
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [33] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [65] 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [97] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1
[129] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[161] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

$csize
[1] 182   1   1

$no
[1] 3
#Isolates
isolates(network_statnet)
[1]  72 118
as.vector(network_statnet %v% "vertex.names")[c(isolates(network_statnet))]
[1]  72 118

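The statnet vertex.names attribute holds only the integer ids 1 to 184, not people's names, so the isolates are reported as 72 and 118. The actual names are stored in the igraph object's Name attribute.

There are two isolates, nodes 72 and 118. This matches the igraph components() output, where the same two nodes form the single-node components 2 and 3.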
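
To turn isolate positions into readable labels, the positions can be used to index a label attribute. The sketch below does this on a hypothetical five-node toy graph using igraph only.

```r
library(igraph)

# Hypothetical toy graph: 5 nodes, where nodes 3 and 5 have no edges
toy <- make_graph(c(1, 2, 2, 4), n = 5, directed = TRUE)
V(toy)$Name <- c("Ann", "Bob", "Cy", "Dee", "Eve")

# Isolates are the vertices with total degree zero
isolate_ids <- which(igraph::degree(toy, mode = "all") == 0)
isolate_ids                 # 3 5
V(toy)$Name[isolate_ids]    # "Cy" "Eve"
```

For the Enron data the analogous line would be V(network_igraph)$Name[isolates(network_statnet)]. That assumes both objects list their vertices in the same order, which the write-up does not verify.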

Density

graph.density(network_igraph)
[1] 3.72443
network.density(network_statnet)
[1] 0.08939178
graph.density(network_igraph, loops = TRUE)
[1] 3.704188
gden(network_statnet, diag = FALSE)
[1] 0.08939178

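The igraph density is 3.72, well above the maximum of 1 for a simple graph, while the statnet density is about 0.089 (9%). Both use the same denominator of 184 × 183 = 33,672 possible directed ties; the difference is the numerator, 125,409 edges versus 3,010 ties. Only the statnet figure can be read as the proportion of possible ties that are present.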
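
Both densities can be reproduced by hand from the node and edge counts printed earlier. This is plain arithmetic; the variable names are made up for illustration.

```r
n_nodes <- 184
possible_ties <- n_nodes * (n_nodes - 1)     # 33672 ordered pairs, no self-loops

igraph_density  <- 125409 / possible_ties    # every edge in the igraph object
statnet_density <- 3010 / possible_ties      # one tie per sender-recipient pair

round(igraph_density, 5)    # 3.72443
round(statnet_density, 5)   # 0.08939
```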


Vertex Degrees

igraph::degree(network_igraph) %>% head()
[1] 114 428 391 104 957 381
sna::degree(network_statnet) %>% head()
[1] 10 32 21  9 59 30

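Degrees from the igraph object are far larger than those from the statnet object (for example, 957 versus 59 for the fifth node). igraph counts every edge, so repeated emails between the same pair each add to the degree, whereas statnet counts each tie once.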


network_degree <- data.frame(Name = V(network_igraph)$Name,
                             degree = igraph::degree(network_igraph, loops =FALSE))
network_degree %>% arrange(desc(degree)) %>% slice(1:10)
              Name degree
1    Jeff Dasovich  13967
2    James Steffes   9404
3       Tana Jones   9307
4  Richard Shapiro   8994
5               NA   6591
6      Steven Kean   6384
7    John Lavorato   6177
8  Michael Grigsby   5860
9      Mark Taylor   5693
10  Louise Kitchen   5362

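Jeff Dasovich has by far the highest degree, about 14,000. Because repeated emails are separate edges here, this measures email volume rather than the number of distinct contacts. It suggests a central communication role, though volume alone does not establish that he was high up in the company.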


Degree in Directed Networks

sna::degree(network_statnet, cmode = "indegree")%>% head()
[1]  4 21 10  6 30 17
sna::degree(network_statnet, cmode = "outdegree") %>% head()
[1]  6 11 11  3 29 13
igraph::degree(network_igraph, mode ="in", loops = FALSE) %>% head()
[1]  78 334 224  88 614 210
igraph::degree(network_igraph, mode ="out", loops = FALSE)%>%head()
[1]  36  92 167  16 325 169
Degree_network <- data.frame(Name = V(network_igraph)$Name,
           total_degrees = igraph::degree(network_igraph, loops = FALSE),
           in_degree = igraph::degree(network_igraph, mode ="in", loops = FALSE),
           out_degree = igraph::degree(network_igraph, mode ="out", loops = FALSE) ) %>% arrange(desc(total_degrees))

Degree_network %>% slice(1:10)
              Name total_degrees in_degree out_degree
1    Jeff Dasovich         13967      2612      11355
2    James Steffes          9404      4988       4416
3       Tana Jones          9307      2268       7039
4  Richard Shapiro          8994      6893       2101
5               NA          6591      2698       3893
6      Steven Kean          6384      2676       3708
7    John Lavorato          6177      3352       2825
8  Michael Grigsby          5860      1097       4763
9      Mark Taylor          5693      3694       1999
10  Louise Kitchen          5362      2087       3275

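Dasovich's profile is dominated by out-degree (11,355 sent versus 2,612 received), which fits someone who broadcasts to many people. The pattern is not universal among the top ten, though: Richard Shapiro (6,893 in versus 2,101 out) and Mark Taylor (3,694 in versus 1,999 out) receive far more than they send.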


Summary Statistics

summary(Degree_network)
     Name           total_degrees       in_degree     
 Length:184         Min.   :    0.0   Min.   :   0.0  
 Class :character   1st Qu.:  212.8   1st Qu.: 150.5  
 Mode  :character   Median :  512.5   Median : 314.0  
                    Mean   : 1184.0   Mean   : 592.0  
                    3rd Qu.: 1401.2   3rd Qu.: 655.2  
                    Max.   :13967.0   Max.   :6893.0  
   out_degree      
 Min.   :    0.00  
 1st Qu.:   30.75  
 Median :  159.00  
 Mean   :  591.99  
 3rd Qu.:  600.50  
 Max.   :11355.00  


Degree Distribution

hist(Degree_network$total_degrees, main = "Enron Degree Distribution", xlab = "Degree")
hist(Degree_network$out_degree, main ="Enron Out-Degree Distribution", xlab = "Degree")
hist(Degree_network$in_degree, main = "Enron In-Degree Distribution", xlab = "Degree")

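Most people in the company have a relatively low degree, while there are a select few with very many connections. The median total degree is 512.5, the mean is 1,184, and the maximum is 13,967, so the distribution is strongly right-skewed.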

Network Degree Centralization

#centralization(network_statnet, degree, cmode= "indegree")
#centralization(network_statnet, degree, cmode = "outdegree")


centr_degree(network_igraph, loops = FALSE, mode = "in")$centralization
[1] 34.61991
centr_degree(network_igraph, loops = FALSE, mode = "out")$centralization
[1] 59.13566

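Out-degree centralization (59.1) is higher than in-degree centralization (34.6), so sending is concentrated in fewer people than receiving. Both values are far above 1 because centr_degree() normalizes by the maximum for a simple graph, and the repeated edges inflate the degrees.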
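
To illustrate why degree centralization can exceed 1 on a graph with repeated edges, the sketch below computes it on a hypothetical in-star and then on the same star with every edge duplicated ten times. The manual formula assumes the normalizing constant for a directed graph without loops is (n - 1)^2; the objects star and multi are made up.

```r
library(igraph)

# Hypothetical toy graph: an in-star, where four nodes each send to node 1
star <- make_star(5, mode = "in")
in_deg <- igraph::degree(star, mode = "in", loops = FALSE)

# Freeman centralization: summed gap to the maximum degree,
# divided by the same quantity for the most centralized simple graph
manual_star <- sum(max(in_deg) - in_deg) / (vcount(star) - 1)^2
manual_star                                                       # 1
centr_degree(star, mode = "in", loops = FALSE)$centralization

# Same star with every edge repeated 10 times
multi <- make_graph(rep(as.vector(t(as_edgelist(star))), 10), directed = TRUE)
multi_deg <- igraph::degree(multi, mode = "in", loops = FALSE)
manual_multi <- sum(max(multi_deg) - multi_deg) / (vcount(multi) - 1)^2
manual_multi                                                      # greater than 1
centr_degree(multi, mode = "in", loops = FALSE)$centralization
```

The denominator stays fixed while repeated edges raise the numerator, which is why the Enron values come out in the tens rather than between 0 and 1.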

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Sullivan (2022, Feb. 17). Data Analytics and Computational Social Science: Networks Hw 2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompjsulliv34864273/

BibTeX citation

@misc{sullivan2022networks,
  author = {Sullivan, Peter},
  title = {Data Analytics and Computational Social Science: Networks Hw 2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscompjsulliv34864273/},
  year = {2022}
}