Data Analytics and Computational Social Science: Homework 9

Yifan Li

Descriptive Statistics

[1] FALSE

[1] TRUE

[1] TRUE

The original dataset is version 4 of the trade dataset from the Correlates of War Project. In this assignment I use only the trade data for 2014, stored as an edgelist. The nodes are countries, and the ties are the trading relations between countries in 2014. The network is directed and weighted.

Let’s look at some basic descriptive facts.

[1] 186

[1] 22451

[1] 241.4086

[1] 0.6524557

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     0.0      0.2      4.9    813.1     78.5 472525.2

The 2014 network has 186 nodes, i.e. 186 countries involved in the trading network, and 22451 edges. Each country has about 241 ties on average, counting both imports and exports. The mean tie weight is 813, i.e. on average one country imports 813 million dollars of goods from another in 2014, but the median is only 4.9, so trade flows are highly right-skewed. About 65% of potential ties exist.
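The density and mean degree follow directly from the node and edge counts. A minimal Python sketch of that arithmetic, assuming a directed network without self-loops (the variable names are illustrative and not taken from the original analysis):

```python
n_nodes = 186      # countries in the 2014 network
n_edges = 22451    # directed trade ties

# A directed graph without loops has n * (n - 1) possible ties.
density = n_edges / (n_nodes * (n_nodes - 1))

# Each tie adds one out-degree and one in-degree, so total degree sums to 2 * E.
mean_degree = 2 * n_edges / n_nodes

print(round(density, 4), round(mean_degree, 2))
```

Both values agree with the output reported above (0.652 and 241.4), which suggests the mean degree counts incoming and outgoing ties together.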

Let’s classify all dyads and triads in the network:

$mut
[1] 9933

$asym
[1] 2585

$null
[1] 4687

 [1]  75574  62420  92513  11021   9514  11233  60137  45246   5530
[10]    788 188669  14094   9006  15882 147383 306230

[1] 0.07161783

[1] 0.3484117

[1] 1

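There are 9933 mutual (bilateral) trade relations and 2585 one-way trade relations, and 4687 pairs of countries did not trade in 2014.

About 7% of the triads are empty (type 003), and about 65% fall in the last eight census classes (030T through 300). Note that this group includes the open 201 class: assuming the standard census ordering, 47% of all triads have ties in all three dyads, and 29% are fully mutual triangles (type 300). Either way, the network is quite dense.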
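The census counts can be cross-checked against the network size and converted into shares. The sketch below is a hedged consistency check in Python, not the original analysis code; it assumes the 16 triad counts are in the standard census order (003, 012, ..., 300), which the output above does not label.

```python
from math import comb

n = 186
dyads = {"mut": 9933, "asym": 2585, "null": 4687}
triads = [75574, 62420, 92513, 11021, 9514, 11233, 60137, 45246,
          5530, 788, 188669, 14094, 9006, 15882, 147383, 306230]
# Assumed labels: the standard triad census ordering.
labels = ["003", "012", "102", "021D", "021U", "021C", "111D", "111U",
          "030T", "030C", "201", "120D", "120U", "120C", "210", "300"]

# Every unordered pair (triple) of countries is counted exactly once.
dyads_ok = sum(dyads.values()) == comb(n, 2)
triads_ok = sum(triads) == comb(n, 3)

# A mutual dyad holds two directed ties, an asymmetric dyad holds one.
n_edges = 2 * dyads["mut"] + dyads["asym"]

share = {lab: cnt / sum(triads) for lab, cnt in zip(labels, triads)}
# Triads in which all three dyads carry at least one tie.
closed_types = ["030T", "030C", "120D", "120U", "120C", "210", "300"]
closed_share = sum(share[lab] for lab in closed_types)

print(dyads_ok, triads_ok, n_edges)
print(round(share["003"], 3), round(share["300"], 3), round(closed_share, 3))
```

Under that ordering assumption, the counts are internally consistent: the dyad census reproduces the 22451 edges, and the triad census sums to the number of possible triples.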

Centrality Scores

Let’s look at the distribution of node centrality scores.

Betweenness centrality and reflected centrality are right-skewed, as expected: only a few countries occupy central bridging positions. Bonacich power and closeness, however, are nearly normally distributed, and eigenvector centrality and derived centrality are strongly left-skewed, which may indicate a decentralized network. Derived centrality accounts for the largest part of eigenvector centrality. Nearly every country acts as something of a bridge in the network.

 Network attributes:
  vertices = 186 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 22451 
    missing edges= 0 
    non-missing edges= 22451 

 Vertex attribute names: 
    vertex.names 

 Edge attribute names not shown

Using CUG-tests to test network properties

Having computed these statistics, let’s test whether they differ significantly from what a null model predicts. Comparing against a null network conditioned only on size is almost meaningless, because the world market is so closely connected and dense. Instead, let’s test against a null hypothesis conditioned on density (the number of edges).


Univariate Conditional Uniform Graph Test

Conditioning Method: edges 
Graph Type: digraph 
Diagonal Used: FALSE 
Replications: 100 

Observed Value: 0.7765289 
Pr(X>=Obs): 0 
Pr(X<=Obs): 1

[1] 490.4112

The observed network transitivity is 0.78. None of the 100 simulated networks reached this value, so we can be confident that the observed transitivity is higher than would be expected from a random network with the same number of edges (p<0.01, the smallest value that 100 replications can resolve).
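The logic of a CUG test conditioned on edges can be sketched in a few lines of Python. This is an illustrative re-implementation on a hypothetical toy network, not the code behind the output above; the function names, the toy data, and the choice of transitivity measure (the share of directed two-paths that are closed) are all assumptions.

```python
import random
from itertools import permutations

def transitivity(edges, n):
    """Share of directed two-paths i->j->k that are closed by a tie i->k."""
    adj = [[False] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = True
    paths = closed = 0
    for i, j, k in permutations(range(n), 3):
        if adj[i][j] and adj[j][k]:
            paths += 1
            closed += adj[i][k]
    return closed / paths if paths else 0.0

def cug_test_edges(edges, n, stat, reps=100, seed=1):
    """Compare the observed statistic with random digraphs of equal size and edge count."""
    rng = random.Random(seed)
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    observed = stat(edges, n)
    # Each replication draws the same number of ties uniformly at random.
    sims = [stat(rng.sample(dyads, len(edges)), n) for _ in range(reps)]
    p_ge = sum(s >= observed for s in sims) / reps
    p_le = sum(s <= observed for s in sims) / reps
    return observed, p_ge, p_le

# Hypothetical toy network: two fully reciprocated 4-node trading blocs.
blocs = [range(0, 4), range(4, 8)]
toy_edges = [(i, j) for bloc in blocs for i in bloc for j in bloc if i != j]
observed, p_ge, p_le = cug_test_edges(toy_edges, 8, transitivity)
print(observed, p_ge, p_le)
```

The two proportions play the same role as the Pr(X>=Obs) and Pr(X<=Obs) lines in the test output: they report how often a random graph with the same number of ties matches or exceeds, or falls at or below, the observed statistic.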


Univariate Conditional Uniform Graph Test

Conditioning Method: edges 
Graph Type: digraph 
Diagonal Used: FALSE 
Replications: 100 

Observed Value: 11702.88 
Pr(X>=Obs): 0 
Pr(X<=Obs): 1

[1] 887366.5

The observed network degree centralization is 11702.88. None of the 100 simulated networks reached this value, so we can be confident that the observed degree centralization is higher than would be expected from a random network (p<0.01).


Univariate Conditional Uniform Graph Test

Conditioning Method: edges 
Graph Type: digraph 
Diagonal Used: FALSE 
Replications: 50 

Observed Value: 0.007458064 
Pr(X>=Obs): 0 
Pr(X<=Obs): 1

[1] 129.5252

The observed network betweenness centralization is 0.0075. None of the 50 simulated networks reached this value, so we can be confident that the observed betweenness centralization is higher than would be expected from a random network (p<0.02, the smallest value that 50 replications can resolve).

Compare to Simulated Networks

                Observed   Simulated         SD   tvalue
density      0.652455681 0.005376344 0.00000000      Inf
transitivity 0.776528929 0.000000000 0.00000000      Inf
indegCent    0.343988313 0.142198977 0.05181549 3.894382
betwCent     0.007458064 0.002974834 0.00167135 2.682401

The observed density and transitivity are far higher than in the simulated networks. The simulated values do not vary at all (SD = 0), so the t-values are infinite. Indegree and betweenness centralization are also significantly higher than in the simulations (t = 3.89 and 2.68, p<0.01).
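The tvalue column appears to be the distance between the observed statistic and the simulation mean, measured in simulation standard deviations. A small Python sketch under that assumption (the helper name `t_value` is hypothetical) reproduces the reported figures, including the infinite values that arise when the simulations have zero variance:

```python
import math

def t_value(observed, sim_mean, sim_sd):
    """(observed - simulated mean) / simulated SD, with explicit zero-SD handling."""
    diff = observed - sim_mean
    if sim_sd == 0:
        # No variation across simulations: any difference is infinitely many SDs away.
        return math.nan if diff == 0 else math.copysign(math.inf, diff)
    return diff / sim_sd

# Rows taken from the first comparison table above.
print(t_value(0.343988313, 0.142198977, 0.05181549))  # indegCent
print(t_value(0.007458064, 0.002974834, 0.00167135))  # betwCent
print(t_value(0.652455681, 0.005376344, 0.0))         # density, SD = 0
```

An infinite t-value therefore says only that the simulated statistic never varied, not how strong the evidence is.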

Since the density is very high, let’s model a preferential attachment network with higher average degree.

                Observed Simulated SD tvalue
density      0.652455681 0.5000000  0    Inf
transitivity 0.776528929 1.0000000  0   -Inf
indegCent    0.343988313 0.5027027  0   -Inf
betwCent     0.007458064 0.0000000  0    Inf

Based on the PA model, the observed density and betweenness centralization are still significantly higher than in the simulation, while transitivity and indegree centralization are lower (p<0.001).

Notice that the mean simulated transitivity is 1.0, indicating a fully transitive network. I don’t think this is a good null model.
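One possible explanation for these simulated values, offered as an assumption because the simulation call is not shown: if each new node is asked to send more ties than there are existing nodes, it links to every earlier node. The growth process then has no randomness left and always yields the same fully ordered graph. The Python sketch below builds such a graph (the function names are hypothetical) and checks its density and transitivity.

```python
from itertools import permutations

def saturated_growth(n):
    """Every new node sends a tie to every node that already exists."""
    return [(new, old) for new in range(n) for old in range(new)]

def density(edges, n):
    """Share of the n * (n - 1) possible directed ties that are present."""
    return len(edges) / (n * (n - 1))

def transitivity(edges, n):
    """Share of directed two-paths i->j->k that are closed by a tie i->k."""
    adj = [[False] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = True
    paths = closed = 0
    for i, j, k in permutations(range(n), 3):
        if adj[i][j] and adj[j][k]:
            paths += 1
            closed += adj[i][k]
    return closed / paths if paths else 0.0

n = 20  # small illustrative size, not the 186 countries in the data
edges = saturated_growth(n)
print(density(edges, n), transitivity(edges, n))
```

Such a graph has density 0.5 and transitivity 1, and because every two-step path has a direct shortcut, no node lies on a shortest path between two others. That pattern matches the simulated row values in the table, with zero standard deviation across replications.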

Next, simulate a preferential attachment network conditional on the observed degree distribution, using the out.seq= option.

                Observed   Simulated           SD     tvalue
density      0.652455681 0.477390294 0.0000000000        Inf
transitivity 0.776528929 0.990193508 0.0005187225 -411.90536
indegCent    0.343988313 0.513152374 0.0051016389  -33.15877
betwCent     0.007458064 0.001389953 0.0005487202   11.05866

Similarly, the observed density and betweenness centralization are still significantly higher than in the simulation, while transitivity and indegree centralization are lower. The mean simulated transitivity is again close to 1.

Compare with Other Networks

[1] 0.9442986


QAP Test Results

Estimated p-values:
    p(f(perm) >= f(d)): 0 
    p(f(perm) <= f(d)): 1

The world trade network in 2004 is very similar to the 2014 network, with a correlation of 0.94. This is significantly higher than the correlations obtained when the country labels of one network are randomly permuted (p<0.001). Over these ten years, trade partnerships did not change much.
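The idea behind the QAP test can be sketched as follows: compute the correlation between the two adjacency matrices, then repeatedly shuffle the node labels of one matrix (rows and columns together) and see how often the shuffled correlation reaches the observed one. The Python code below is a minimal illustration on made-up 5-node matrices; the function names and data are hypothetical and it is not the routine that produced the output above.

```python
import random

def graph_cor(a, b):
    """Pearson correlation between the off-diagonal cells of two square matrices."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [b[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def qap_test(a, b, reps=200, seed=0):
    """Permute the node labels of `a` and compare correlations with the observed one."""
    rng = random.Random(seed)
    n = len(a)
    observed = graph_cor(a, b)
    ge = le = 0
    for _ in range(reps):
        p = list(range(n))
        rng.shuffle(p)
        # Apply the same permutation to rows and columns to keep the structure intact.
        permuted = [[a[p[i]][p[j]] for j in range(n)] for i in range(n)]
        r = graph_cor(permuted, b)
        ge += r >= observed
        le += r <= observed
    return observed, ge / reps, le / reps

# Hypothetical trade flows among five countries in two different years.
a = [[0, 5, 1, 0, 2], [4, 0, 3, 1, 0], [1, 2, 0, 6, 1], [0, 1, 5, 0, 3], [2, 0, 1, 4, 0]]
b = [[0, 6, 1, 0, 1], [5, 0, 2, 1, 0], [1, 3, 0, 7, 1], [0, 1, 4, 0, 2], [3, 0, 1, 5, 0]]
observed, p_ge, p_le = qap_test(a, b)
print(round(observed, 3), p_ge, p_le)
```

Permuting labels keeps each network's structure intact while breaking the link between the two, which is why the permuted correlations serve as the reference distribution.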

[1] 0.5554877


QAP Test Results

Estimated p-values:
    p(f(perm) >= f(d)): 0 
    p(f(perm) <= f(d)): 1

The correlation between the 1964 and 2014 networks is much lower (0.56), which indicates that trade relationships changed considerably over the 50 years. Still, the correlation is significantly higher than expected by chance (p<0.001): the 1964 network still predicts a substantial part of the 2014 network.
