Homework 3

Week 3 Assignment: Degree and Centrality.

Yifan Li (Department of Sociology, UMass Amherst)
2022-02-09
[1] FALSE
[1] TRUE
[1] TRUE
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1870    1971    1990    1982    2003    2014 

The original dataset is the trade dataset version 4 from the Correlates of War Project. The original format is an edgelist. The nodes are countries, and the ties are the trading relations between countries from 1870 to 2014. The network is directed and weighted.

The original dataset is too dense. To identify the structure of the network more clearly, I create a subset that keeps only ties with imports of at least 100 million dollars.
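The code behind this step is not shown, so here is only a minimal sketch of how such a threshold could be applied to an edge list in base R. The data frame and its column names (`importer`, `exporter`, `year`, `imports`) are hypothetical, and the values are made up for illustration.

```r
# Toy edge list standing in for the trade data (hypothetical column names).
# Import values are assumed to be in millions of US dollars.
trade <- data.frame(
  importer = c("USA", "USA", "MEX", "GTM"),
  exporter = c("MEX", "GTM", "USA", "USA"),
  year     = c(2000, 2000, 2000, 2000),
  imports  = c(135000, 45, 98000, -9),
  stringsAsFactors = FALSE
)

# Keep only ties whose import value is at least 100 (million).
trade_sub <- trade[trade$imports >= 100, ]

# Countries that remain are those on either end of a surviving tie.
nodes_sub <- union(trade_sub$importer, trade_sub$exporter)
```

A country drops out of the subset only when none of its ties, in any year, reaches the threshold.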

Let’s look at some basic descriptive facts.

[1] 207
[1] 1773656
[1] 17136.77
[1] 41.59411
[1] 189
[1] 117168
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     -9      -9       0     143       2  472525 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   100.0    178.4    373.7   2139.9   1133.2 472525.2 

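The original dataset has 207 nodes, i.e. 207 countries involved in the trading network. There are 1773656 edges. This is a huge number because each edge records the trade between one pair of countries in one year. On average, one country imports 143 million dollars of goods from another country each year. The minimum of -9 appears to be a missing-value code rather than a real trade value, so this mean is probably biased downward.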

In the subset, we have only 189 nodes, meaning 18 countries never had an import or export flow of at least 100 million dollars between 1870 and 2014. The number of edges decreases to 117168, only about 6.6% of the whole dataset. This implies a skewed distribution of international trade. The much sparser network gives us a better chance to identify the characteristics of the international trading network.

(I will keep comparing the subset with the original one.)

Let’s classify all dyads and triads in the network:

$mut
[1] 886828

$asym
[1] 0

$null
[1] -865507
$mut
[1] 45072

$asym
[1] 27024

$null
[1] -54330
 [1]    1884       0   73511       0       0       0       0       0
 [9]       0       0   73236       0       0       0       0 1308304
 [1] 576073  42173 270669   3983   9872  10708  48167  26529   1316
[10]    228  54882   5108   2127   3675  23094  28810
[1] 0.5201966
[1] 0.2824978
[1] 0.8923257

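In the original network, we cannot find a single asymmetric dyad. This makes it look like an undirected network, which constrains the findings we can get. The mutual count (886828) is exactly half the number of edges and far exceeds the 21321 possible dyads among 207 nodes, which is why the null count is negative: the census apparently counts every yearly edge separately instead of counting each pair of countries once.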
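The negative null counts in the output can be reproduced by simple arithmetic, and a dyad census over distinct country pairs can be sketched in base R. This is an illustration under stated assumptions, not the code used for the output above; the toy edge list and its column names are hypothetical.

```r
# Arithmetic check on the printed census for the original network.
possible_dyads <- choose(207, 2)              # 21321 unordered pairs
null_printed   <- possible_dyads - 886828 - 0 # mutual and asymmetric counts

# Toy multi-year edge list: A->B in two years, B->A once, A->C once.
el <- data.frame(from = c("A", "A", "B", "A"),
                 to   = c("B", "B", "A", "C"),
                 stringsAsFactors = FALSE)

# Collapse repeated yearly edges so each ordered pair appears once.
pairs <- unique(el)

# An ordered pair is reciprocated if the reverse pair also exists.
key          <- paste(pairs$from, pairs$to)
rev_key      <- paste(pairs$to, pairs$from)
reciprocated <- rev_key %in% key

mut   <- sum(reciprocated) / 2   # each mutual dyad is seen from both sides
asym  <- sum(!reciprocated)
nodes <- union(el$from, el$to)
null  <- choose(length(nodes), 2) - mut - asym
```

After collapsing, the three counts are non-negative and sum to the number of possible dyads.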

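In the subset, we identify 45072 mutual dyads and 27024 asymmetric dyads, showing signs of trade imbalance. The triad census also shows a large proportion (52%) of empty triads. Another 28% have only one connected dyad. About 89% of the triads fall into the first eight census types, each of which contains at least one null dyad. Assuming the standard ordering of the 16 triad types, type 201 (two mutual dyads and one null dyad) is also not a triangle, which brings the share of triads that are not triangles to about 94%.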
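The proportions quoted for the subset can be recomputed from the printed triad census. The sketch below assumes the 16 counts follow the standard MAN ordering (003, 012, 102, ..., 300); the labels are an assumption added here and are not part of the original output.

```r
# Triad census of the subset, copied from the output above.
triads <- c(576073, 42173, 270669, 3983, 9872, 10708, 48167, 26529,
            1316, 228, 54882, 5108, 2127, 3675, 23094, 28810)

# Assumed labels: standard MAN ordering of the 16 triad types.
names(triads) <- c("003", "012", "102", "021D", "021U", "021C", "111D",
                   "111U", "030T", "030C", "201", "120D", "120U", "120C",
                   "210", "300")

total <- sum(triads)                       # equals choose(189, 3)

p_empty     <- unname(triads["003"]) / total
p_one_dyad  <- sum(triads[c("012", "102")]) / total
p_first8    <- sum(triads[1:8]) / total

# Triangles are the types with no null dyad.
closed      <- c("030T", "030C", "120D", "120U", "120C", "210", "300")
p_not_tri   <- 1 - sum(triads[closed]) / total

round(c(p_empty, p_one_dyad, p_first8, p_not_tri), 4)
```

The first three values match the printed 0.5202, 0.2825 and 0.8923. The last one is larger because it also counts type 201.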

To get a better sense of the clustering pattern, let’s calculate the clustering coefficients.

[1] 0.9816825
[1] 0.5560647
[1] 0.9825593
[1] 0.8148533
[1] 1.051686
[1] 1.781887
[1] 1
[1] 1
[1] 207
[1] 189

For the original network, the global and average local clustering coefficients are both near 0.98, which tells us almost nothing about the characteristics of the network. In the subset, the global coefficient is 0.56 and the average local one is 0.81. The local one is much larger, which shows that nodes are more clustered at the local level. In other words, the trade partners of a country with fewer trading relations tend to trade more with each other.
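A small undirected toy graph can show why the average local coefficient may exceed the global one. This base R sketch is an illustration only, not the computation behind the output above: one hub is tied to four low-degree nodes, which form two closed pairs.

```r
# Toy graph: node 1 is a hub tied to nodes 2-5; 2-3 and 4-5 are also tied.
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(1, 5), c(2, 3), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1            # make the adjacency matrix symmetric

deg <- rowSums(A)

# diag(A^3) counts closed walks of length 3; halving gives the number
# of triangles passing through each node.
tri_at     <- diag(A %*% A %*% A) / 2
triples_at <- choose(deg, 2)    # pairs of neighbours of each node

local_cc  <- tri_at / triples_at
avg_local <- mean(local_cc)
global_cc <- sum(tri_at) / sum(triples_at)

round(c(avg_local = avg_local, global = global_cc), 3)
```

The low-degree nodes each have a local coefficient of 1, while the hub has only 1/3. The global coefficient weights nodes by their number of neighbour pairs, so the hub pulls it down to 0.6, while the unweighted average stays near 0.87.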

The average shortest path for the original network is 1.05, meaning almost every pair of countries has a direct trading relationship. How closely the global market is connected! Meanwhile, in the subset the average shortest path is 1.78. Under the threshold, a country needs on average 0.78 steps beyond a direct tie to reach another. Still very close, isn’t it?

In both datasets, we can identify only one huge component. Global market!

[1] 41.39317
[1] 3.280087

In the original network, the density is 41.39, which is really dense. A density above 1 is possible here because each pair of countries can have one edge per year. Not surprisingly, with the threshold we only get a density of 3.28, a much sparser picture.
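The two density values can be reproduced from the node and edge counts reported earlier. The formula below is inferred from the arithmetic, not taken from the original code: dividing by n squared (that is, counting self-loops as possible ties) matches the printed values.

```r
e_full <- 1773656; n_full <- 207   # original network
e_sub  <- 117168;  n_sub  <- 189   # thresholded subset

# Directed density with self-loops allowed in the denominator.
dens_full <- e_full / n_full^2
dens_sub  <- e_sub  / n_sub^2

# For comparison: the same ratio without self-loops.
dens_full_noloops <- e_full / (n_full * (n_full - 1))

round(c(dens_full, dens_sub, dens_full_noloops), 5)
```

The first two results agree with the printed 41.39317 and 3.280087. The third gives 41.59411, which matches one of the unlabelled values in the descriptive output near the top.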

     name               degree         indegree       outdegree    
 Length:207         Min.   :  142   Min.   :   71   Min.   :   71  
 Class :character   1st Qu.:12284   1st Qu.: 6142   1st Qu.: 6142  
 Mode  :character   Median :17716   Median : 8858   Median : 8858  
                    Mean   :17137   Mean   : 8568   Mean   : 8568  
                    3rd Qu.:23300   3rd Qu.:11650   3rd Qu.:11650  
                    Max.   :27168   Max.   :13584   Max.   :13584  
     name               degree         indegree        outdegree     
 Length:189         Min.   :    1   Min.   :   0.0   Min.   :   0.0  
 Class :character   1st Qu.:  129   1st Qu.:  40.0   1st Qu.:  72.0  
 Mode  :character   Median :  463   Median : 210.0   Median : 276.0  
                    Mean   : 1240   Mean   : 619.9   Mean   : 619.9  
                    3rd Qu.: 1708   3rd Qu.: 850.0   3rd Qu.: 855.0  
                    Max.   :10305   Max.   :5366.0   Max.   :4939.0  

On average, each country has 17137 edges, i.e. yearly trade ties, from 1870 to 2014. With the threshold, the average drops to 1240. To learn more, let’s plot some histograms.
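These means follow directly from the edge and node counts, since every directed edge adds one to some country's outdegree and one to some country's indegree. A quick check in base R:

```r
e_full <- 1773656; n_full <- 207   # original network
e_sub  <- 117168;  n_sub  <- 189   # thresholded subset

# Mean total degree is 2 * edges / nodes.
mean_deg_full <- 2 * e_full / n_full
mean_deg_sub  <- 2 * e_sub  / n_sub

# Mean indegree equals mean outdegree: edges / nodes.
mean_in_full <- e_full / n_full
mean_in_sub  <- e_sub  / n_sub

round(c(mean_deg_full, mean_deg_sub, mean_in_full, mean_in_sub), 1)
```

The results agree with the summary tables above (17137, 1240, 8568 and 619.9 after rounding).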

The original network has a degree distribution closer to a normal distribution, while the one with the threshold is highly right-skewed. This is further evidence of the imbalance of economies and exchange among countries.

[1] 24.46583
[1] 24.46583
[1] 25.3793
[1] 23.09594

Surprisingly, we find that the original network and the subset have almost the same centralization. The world market revolves around a few giants whether you look at small exchanges or huge trade flows.

                      name degree indegree outdegree
1 United States of America  27168    13584     13584
2                   Mexico  27168    13584     13584
3                Guatemala  27168    13584     13584
4                 Colombia  27168    13584     13584
5                Venezuela  27168    13584     13584
                      name degree indegree outdegree
1 United States of America  27168    13584     13584
2                   Mexico  27168    13584     13584
3                Guatemala  27168    13584     13584
4                 Colombia  27168    13584     13584
5                Venezuela  27168    13584     13584
                      name degree indegree outdegree
1 United States of America  27168    13584     13584
2                   Mexico  27168    13584     13584
3                Guatemala  27168    13584     13584
4                 Colombia  27168    13584     13584
5                Venezuela  27168    13584     13584
                      name degree indegree outdegree
1 United States of America  10305     5366      4939
2           United Kingdom   8516     4254      4262
3                   France   7651     3995      3656
4                    Japan   6982     3881      3101
5                    Italy   6829     3438      3391
                      name degree indegree outdegree
1 United States of America  10305     5366      4939
2           United Kingdom   8516     4254      4262
3                   France   7651     3995      3656
4                    Italy   6829     3438      3391
5                    Japan   6982     3881      3101
                      name degree indegree outdegree
1 United States of America  10305     5366      4939
2           United Kingdom   8516     4254      4262
3                   France   7651     3995      3656
4                    Japan   6982     3881      3101
5                    Italy   6829     3438      3391

In the original network, the US, Mexico, Guatemala, Colombia, and Venezuela have the highest degree, i.e. the most trading relations, which goes against common sense. This is probably because the publisher managed to get more data on North and South America for the early years. Note also that all five are tied at exactly the same values, so the list may simply show the first of several tied countries. We could restrict our data to recent decades next time.
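All five countries listed for the original network share the same degree, indegree and outdegree, so it helps to see how a top-N listing behaves with ties. The sketch below uses a made-up data frame with hypothetical values. In base R, `order()` leaves tied rows in their original order, so the first rows of the data win.

```r
# Toy degree table with a three-way tie at the maximum (made-up values).
deg <- data.frame(
  name   = c("USA", "MEX", "GTM", "FRA"),
  degree = c(10, 10, 10, 7),
  stringsAsFactors = FALSE
)

# Top 2 by degree: ties are kept in row order, so GTM is cut off
# even though it has the same degree as USA and MEX.
top2 <- head(deg[order(-deg$degree), ], 2)

# Counting how many nodes share the maximum makes the tie visible.
n_tied <- sum(deg$degree == max(deg$degree))
```

Reporting `n_tied` alongside the top-N table shows whether the listed countries really stand out or just come first in the data.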

In the subset, the US, the UK, France, Japan and Italy make up the top 5, which is not surprising. If we restrict our data to recent decades, maybe we can expect China to show up.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Li (2022, Feb. 17). Data Analytics and Computational Social Science: Homework 3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsyli210813githubiosocialnetworkanalysishw3/

BibTeX citation

@misc{li2022homework,
  author = {Li, Yifan},
  title = {Data Analytics and Computational Social Science: Homework 3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsyli210813githubiosocialnetworkanalysishw3/},
  year = {2022}
}