Week 3 Assignment: Degree and Centrality.
[1] FALSE
[1] TRUE
[1] TRUE
Min. 1st Qu. Median Mean 3rd Qu. Max.
1870 1971 1990 1982 2003 2014
The original dataset is the trade dataset version 4 from the Correlates of War Project. The original format is an edgelist. The nodes are countries, and the ties are the trading relations between countries from 1870 to 2014. The network is directed and weighted.
The original dataset is too dense. To identify the structure of the network more clearly, I create a subset that keeps only ties with imports greater than or equal to 100 million US dollars.
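Below is a minimal Python sketch of the thresholding step. It runs on a hypothetical toy edgelist rather than the real data. The column layout (from, to, year, value in millions of US dollars) and the use of -9 as a missing-value code are assumptions for illustration.

```python
# Hypothetical toy edgelist, not the real COW data:
# (from_country, to_country, year, value in millions of US dollars)
edges = [
    ("USA", "UK", 1990, 5000.0),
    ("UK", "USA", 1990, 4500.0),
    ("USA", "Chad", 1990, 2.5),
    ("Chad", "USA", 1990, -9.0),   # -9 used here as a missing-value code
    ("France", "UK", 1990, 120.0),
]

THRESHOLD = 100.0  # keep flows of at least 100 million US dollars


def threshold_edges(edges, threshold):
    """Keep only edges whose value is greater than or equal to threshold."""
    return [e for e in edges if e[3] >= threshold]


def node_set(edges):
    """Countries appearing at either end of at least one edge."""
    return {e[0] for e in edges} | {e[1] for e in edges}


subset = threshold_edges(edges, THRESHOLD)
print(len(node_set(edges)), len(edges))    # 4 countries, 5 edges
print(len(node_set(subset)), len(subset))  # 3 countries, 3 edges
```

If the graph is built from the filtered edges alone, countries with no remaining ties (Chad here) drop out. This is one way the subset can end up with fewer nodes than the original. A threshold of 100 also discards negative missing-value codes automatically.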
Let’s look at some basic descriptive facts.
[1] 207
[1] 1773656
[1] 17136.77
[1] 41.59411
[1] 189
[1] 117168
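Trade values, original network (millions of US dollars):
Min. 1st Qu. Median Mean 3rd Qu. Max.
-9 -9 0 143 2 472525
Trade values, subset (millions of US dollars):
Min. 1st Qu. Median Mean 3rd Qu. Max.
100.0 178.4 373.7 2139.9 1133.2 472525.2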
The original dataset has 207 nodes, i.e. 207 countries involved in the trading network, and 1773656 edges. The edge count is so large because each edge represents the trade flow from one country to another in a single year. On average, a country imports about 142 million US dollars of goods from a given partner each year. This mean also averages in the dataset's -9 missing-value codes, so it is only approximate.
In the subset, we have only 189 nodes, meaning that 18 countries never had a single import or export flow of at least 100 million from 1870 to 2014. The number of edges decreases to 117168, only about 6.6% of the whole dataset. This suggests that the distribution of international trade is highly skewed: most recorded flows fall below the threshold. The much sparser network gives us a better chance to identify the characteristics of the international trade network.
(I will keep comparing the subset with the original one.)
Let’s classify all dyads and triads in the network:
$mut
[1] 886828
$asym
[1] 0
$null
[1] -865507
$mut
[1] 45072
$asym
[1] 27024
$null
[1] -54330
[1] 1884 0 73511 0 0 0 0 0
[9] 0 0 73236 0 0 0 0 1308304
[1] 576073 42173 270669 3983 9872 10708 48167 26529 1316
[10] 228 54882 5108 2127 3675 23094 28810
[1] 0.5201966
[1] 0.2824978
[1] 0.8923257
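In the original network, we cannot find a single asymmetric dyad. Every trade tie is reciprocated, so the network behaves much like an undirected one, which limits what we can learn from it. The negative null count (-865507) is a further warning sign. The census counts dyad-years rather than distinct country pairs. Twice the 886828 mutual dyads equals exactly the 1773656 edges, far more than the 21321 (207 × 206 / 2) possible dyads. The null count, computed as the remainder, therefore goes negative.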
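The negative null counts can be reproduced with a small Python sketch. It assumes a hypothetical counting rule in which repeated arcs are matched up pairwise. This rule is not taken from igraph's documentation, but it is consistent with the printed counts, since it makes twice the mutual count plus the asymmetric count equal the number of edges.

```python
from collections import Counter


def dyad_census(n, arcs, collapse_repeats=False):
    """Return (mutual, asymmetric, null) counts for a directed graph on n nodes.

    arcs is a list of (u, v) pairs without self-loops. Unless
    collapse_repeats is True, repeated arcs (e.g. one per year) are matched
    up pairwise: each reciprocated copy adds one mutual dyad and each
    unmatched copy adds one asymmetric dyad. null is whatever remains of the
    n * (n - 1) / 2 possible dyads, so it can become negative.
    """
    if collapse_repeats:
        arcs = set(arcs)  # keep at most one arc per ordered pair
    counts = Counter(arcs)
    mutual = asymmetric = 0
    for u, v in {(min(a, b), max(a, b)) for a, b in counts}:
        forward, backward = counts[(u, v)], counts[(v, u)]
        mutual += min(forward, backward)
        asymmetric += abs(forward - backward)
    null = n * (n - 1) // 2 - mutual - asymmetric
    return mutual, asymmetric, null


# Toy example: countries 0 and 1 trade both ways in three different years,
# and country 1 sends one flow to country 2.
arcs = [(0, 1), (1, 0)] * 3 + [(1, 2)]
print(dyad_census(3, arcs))                         # (3, 1, -1)
print(dyad_census(3, arcs, collapse_repeats=True))  # (1, 1, 1)
```

Collapsing repeated ties before the census gives counts that refer to distinct country pairs, which is what the null category assumes.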
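In the subset, we identify 45072 mutual and 27024 asymmetric dyads, showing signs of trade imbalance. These are again dyad-years: 2 × 45072 + 27024 = 117168 edges, hence the negative null count. The triad census also shows a large proportion (52%) of empty triads, and 28% of the triads have only one connected pair. The printed 89% is the share of the first eight triad classes (003 through 111U). Once the open 201 class is added, about 94% of the triads are not closed triangles.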
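The triad shares can be recomputed from the subset's census. This Python sketch assumes the 16 counts follow igraph's documented order (003, 012, 102, 021D, 021U, 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300). It treats a triad as closed when all three pairs are connected.

```python
from math import comb

# Triad census of the thresholded network, copied from the output above and
# assumed to follow the usual 16-class order used by igraph's triad_census().
TRIAD_TYPES = ["003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300"]
census = [576073, 42173, 270669, 3983, 9872, 10708, 48167, 26529,
          1316, 228, 54882, 5108, 2127, 3675, 23094, 28810]
counts = dict(zip(TRIAD_TYPES, census))
total = sum(census)

# Closed triads: all three pairs of countries are connected.
CLOSED = {"030T", "030C", "120D", "120U", "120C", "210", "300"}

share_empty = counts["003"] / total                        # no ties at all
share_one_pair = (counts["012"] + counts["102"]) / total   # one connected pair
share_first_eight = sum(census[:8]) / total                # 003 through 111U
share_not_closed = 1 - sum(counts[t] for t in CLOSED) / total

print(total == comb(189, 3))  # True: every triple of the 189 countries
print(round(share_empty, 3), round(share_one_pair, 3),
      round(share_first_eight, 3), round(share_not_closed, 3))
```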
To get a better sense of the clustering pattern, let’s calculate the clustering coefficients.
[1] 0.9816825
[1] 0.5560647
[1] 0.9825593
[1] 0.8148533
[1] 1.051686
[1] 1.781887
[1] 1
[1] 1
[1] 207
[1] 189
For the original network, the global and average local clustering coefficients are both about 0.98, which tells us almost nothing about its structure. In the subset, the global coefficient is 0.56 and the average local one is 0.81. The much higher local value shows that clustering is stronger at the local level. The average local coefficient weights every country equally, so countries with few partners who also trade with each other pull it up. In other words, the trade partners of countries with fewer trading relations tend to trade more with each other.
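The following minimal Python sketch shows why an average local coefficient can exceed the global one. It uses a hypothetical five-country graph and treats ties as undirected, which is an assumption, since the output above does not say how direction was handled. The global value pools all connected triples, so high-degree nodes dominate it. The local average gives every node equal weight.

```python
from itertools import combinations


def clustering(adj):
    """Return (global transitivity, average local clustering) for an
    undirected graph given as {node: set of neighbours}.

    Nodes with fewer than two neighbours have no local coefficient and are
    left out of the average (one common convention).
    """
    closed = 0    # linked neighbour pairs, summed over every centre node
    triples = 0   # all neighbour pairs (connected triples), summed likewise
    local = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        possible = k * (k - 1) // 2
        closed += links
        triples += possible
        local.append(links / possible)
    return closed / triples, sum(local) / len(local)


# Hypothetical "bow tie": a hub (0) trading with four small countries that
# each trade with only one other small country.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3}}
global_c, local_c = clustering(adj)
print(round(global_c, 3), round(local_c, 3))  # 0.6 0.867
```

The hub has a local coefficient of 1/3 and each of the four small countries has 1. The average local value (13/15) therefore exceeds the global one (0.6), the same direction of difference as in the subset.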
The average shortest path length in the original network is 1.05, meaning that almost every pair of countries trades directly. How tightly connected the global market is! In the subset, the average shortest path is 1.78: under the threshold, reaching another country takes on average 0.78 more steps. Still very close, isn’t it?
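Average shortest path length can be computed by breadth-first search from every node, because path length here counts steps and ignores trade values. The Python sketch below runs on a hypothetical three-country graph. It follows edge direction and averages over ordered pairs where the target is reachable. Whether the original analysis used directed paths is an assumption.

```python
from collections import deque


def average_path_length(adj):
    """Mean number of steps on shortest directed paths, averaged over all
    ordered pairs (source, target) with target reachable from source.
    adj maps each node to the set of nodes it has an edge to."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical directed graph: A trades with B and C, B with A, C with B.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"B"}}
print(average_path_length(adj))  # 8 steps over 6 ordered pairs
```

A value of exactly 1 would mean every country has a direct tie to every other, which is why 1.05 indicates near-complete direct trade.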
In both datasets, we can identify only one component, and it contains every country (207 in the original network, 189 in the subset). A global market!
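Counting components only requires merging the two endpoints of every edge. Here is a minimal union-find sketch in Python. It assumes weakly connected components (edge direction ignored), which may differ from the mode used in the original analysis.

```python
from collections import Counter


def component_sizes(nodes, edges):
    """Sizes of the weakly connected components (edge direction ignored),
    largest first, using a union-find structure."""
    parent = {v: v for v in nodes}

    def find(v):
        # Walk up to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    return sorted(Counter(find(v) for v in nodes).values(), reverse=True)


# Hypothetical example: A, B, C are linked; D and E only trade with each other.
print(component_sizes("ABCDE", [("A", "B"), ("C", "B"), ("D", "E")]))  # [3, 2]
```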
[1] 41.39317
[1] 3.280087
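In the original network, the density is 41.39, which looks extremely dense. Every country pair can contribute one edge per year, so this value is not a proportion between 0 and 1. It equals the number of edges divided by the number of nodes squared (1773656 / 207² ≈ 41.39), i.e. about 41 edges per possible ordered pair of countries. Not surprisingly, with the threshold the value drops to 3.28 (117168 / 189²), a much sparser picture.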
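Dividing the edge count by n² reproduces the printed densities exactly. This small Python sketch makes the two usual denominators explicit. The n(n - 1) version for the original network gives about 41.59, which matches the unlabeled 41.59411 printed among the descriptive statistics near the top.

```python
def density(n_nodes, n_edges, loops=False):
    """Edges divided by the number of possible directed ties: n * (n - 1)
    without self-ties, or n * n when self-ties (loops) are allowed.
    For a simple graph the result lies in [0, 1]; repeated edges
    (here, one per country pair per year) can push it far above 1."""
    possible = n_nodes * n_nodes if loops else n_nodes * (n_nodes - 1)
    return n_edges / possible


print(round(density(207, 1773656, loops=True), 5))  # original network
print(round(density(189, 117168, loops=True), 5))   # thresholded subset
print(round(density(207, 1773656), 5))              # original, no self-ties
```

To get a density that stays between 0 and 1, repeated yearly edges would first have to be collapsed into one tie per ordered pair of countries.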
Degree summary, original network (207 countries):
          degree  indegree  outdegree
Min.         142        71         71
1st Qu.    12284      6142       6142
Median     17716      8858       8858
Mean       17137      8568       8568
3rd Qu.    23300     11650      11650
Max.       27168     13584      13584
Degree summary, subset (189 countries):
          degree  indegree  outdegree
Min.           1       0.0        0.0
1st Qu.      129      40.0       72.0
Median       463     210.0      276.0
Mean        1240     619.9      619.9
3rd Qu.     1708     850.0      855.0
Max.       10305    5366.0     4939.0
On average, each country has 17137 edges from 1870 to 2014, i.e. incoming plus outgoing trade ties, each counted once per year. With the threshold, the average drops to 1240. To learn more, let’s plot some histograms.
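The averages quoted here follow directly from the node and edge counts printed earlier. The tiny Python sketch below makes the arithmetic explicit. The identity holds for any directed graph, including one with repeated edges.

```python
def mean_degrees(n_nodes, n_edges):
    """Mean in-degree, out-degree and total degree of a directed graph.

    Every edge adds 1 to one node's out-degree and 1 to another node's
    in-degree, so in- and out-degrees each sum to E and total degree to 2E.
    """
    mean_in = mean_out = n_edges / n_nodes
    return mean_in, mean_out, 2 * n_edges / n_nodes


print(mean_degrees(207, 1773656))  # total degree about 17136.77
print(mean_degrees(189, 117168))   # in/out about 619.94, total about 1239.87
```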
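The degree distribution of the original network is closer to a normal distribution, while the distribution under the threshold is highly right-skewed: its mean degree of 1240 is well above its median of 463. This is further evidence of the imbalance of economic activity and exchange among countries.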
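A mean well above the median is one sign of right skew, and moment skewness is another. Below is a minimal Python sketch on hypothetical degree lists, because the real degree vectors are not shown here.

```python
from statistics import mean, median, pstdev


def skewness(values):
    """Moment skewness: mean of (x - mean)^3 divided by the population
    standard deviation cubed. Positive values mean a long right tail."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # all values equal: no tail on either side
    return sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3)


# Hypothetical degree lists, not the real data.
symmetric = [8, 9, 10, 11, 12]
right_skewed = [1, 2, 2, 3, 40]

print(skewness(symmetric))                       # 0.0
print(mean(right_skewed), median(right_skewed))  # mean well above median
print(skewness(right_skewed) > 0)                # True
```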
[1] 24.46583
[1] 24.46583
[1] 25.3793
[1] 23.09594
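Surprisingly, the original network and the subset have almost the same degree centralization: about 24.5 in the original network, and 25.4 and 23.1 in the subset. Normalized centralization lies between 0 and 1 for a simple graph. These values exceed 1 because each pair of countries can contribute one edge per year, so they are best compared with each other rather than with the usual scale. Still, the world market seems to revolve around a few giants, whether you look at small exchanges or large trade flows.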
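The printed centralization values can be reproduced from the degree summaries alone. This Python sketch assumes Freeman degree centralization normalized by (n - 1)^2, the maximum a star can reach in a simple directed graph without self-loops. Under that assumption, the three calls below give 24.46583, 25.3793 and 23.09594, matching the output above to the digits shown.

```python
def degree_centralization(n_nodes, n_edges, max_degree):
    """Freeman in- or out-degree centralization of a directed graph:
    sum over nodes of (max_degree - degree), divided by (n - 1) ** 2,
    the value reached by a star in a simple graph without self-loops.

    In- and out-degrees each sum to the edge count E, so the numerator
    simplifies to n * max_degree - E.
    """
    return (n_nodes * max_degree - n_edges) / (n_nodes - 1) ** 2


print(round(degree_centralization(207, 1773656, 13584), 5))  # original, in or out
print(round(degree_centralization(189, 117168, 5366), 5))    # subset, max in-degree
print(round(degree_centralization(189, 117168, 4939), 5))    # subset, max out-degree
```

For a simple directed star, the formula gives exactly 1. The multigraph values above 1 come from degrees that exceed n - 1.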
Original network, top 5 countries (the rankings by degree, in-degree, and out-degree are identical):
   name                       degree  indegree  outdegree
1  United States of America    27168     13584      13584
2  Mexico                      27168     13584      13584
3  Guatemala                   27168     13584      13584
4  Colombia                    27168     13584      13584
5  Venezuela                   27168     13584      13584
Subset, top 5 by degree (the ranking by in-degree is identical):
   name                       degree  indegree  outdegree
1  United States of America    10305      5366       4939
2  United Kingdom               8516      4254       4262
3  France                       7651      3995       3656
4  Japan                        6982      3881       3101
5  Italy                        6829      3438       3391
Subset, top 5 by out-degree:
   name                       degree  indegree  outdegree
1  United States of America    10305      5366       4939
2  United Kingdom               8516      4254       4262
3  France                       7651      3995       3656
4  Italy                        6829      3438       3391
5  Japan                        6982      3881       3101
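In the original network, the US, Mexico, Guatemala, Colombia, and Venezuela have the highest degree, i.e. the most trading relations, which runs against common sense. All five share the maximum degree of 27168 (13584 in and 13584 out), so the top of this ranking is at least a five-way tie. The order within a tie, and which tied countries make the cut, may simply follow the order of the data rather than any real difference. It may also be that the data publisher collected more data for North and South America in the early years. Next time, we could restrict the data to recent decades.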
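When several countries share the same degree, a plain top-5 cut breaks the tie by input order. Here is a minimal Python sketch, with a hypothetical function and data, that keeps every node tied at the cutoff.

```python
def top_k_with_ties(degrees, k):
    """Names of all nodes whose degree is at least the k-th largest degree,
    highest first (alphabetical within a tie), so ties at the cutoff are
    kept instead of being broken by the order of the input."""
    if k <= 0 or not degrees:
        return []
    cutoff = sorted(degrees.values(), reverse=True)[min(k, len(degrees)) - 1]
    ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, degree in ranked if degree >= cutoff]


# Hypothetical degrees: three countries tie for first place.
degrees = {"A": 10, "B": 10, "C": 10, "D": 5}
print(top_k_with_ties(degrees, 2))  # ['A', 'B', 'C']
```

With this approach, a "top 5" can return more than five countries. That is the honest answer when the fifth place is shared.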
In the subset, the top 5 are the US, the UK, France, Japan, and Italy, which is not surprising. If we restrict the data to recent decades, we might expect China to show up.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Li (2022, Feb. 17). Data Analytics and Computational Social Science: Homework 3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsyli210813githubiosocialnetworkanalysishw3/
BibTeX citation
@misc{li2022homework,
  author = {Li, Yifan},
  title = {Data Analytics and Computational Social Science: Homework 3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsyli210813githubiosocialnetworkanalysishw3/},
  year = {2022}
}