Blog Post 7, Integrating ML

This post is an analysis of community structure and machine learning techniques on my medieval dataset.

Noah Milstein true
2022-03-21

An Introduction to the Project and Dataset

The project that I am doing involves conflict in the high middle ages. This was the period between 1000 and 1200

Part 1:

Describe the Dataset You Are Using:

The Dataset Being Used: The dataset that I am using is wikipedia list of wars throughout history, this article is the “List of wars: 1000–1499” which acts as a subset of the “2nd-millennium conflicts” I chose this dataset as an exemplar of popular history’s depiction of the centralization of worldwide conflict. Wikipedia, being an accessible source generally created from relevant citations makes it a good case study to see where historical writers and academics center their world are relevant conflicts.

Identify initial network format:

Answer: The initial network format is as an edge list, the first, in column contains the winners of each war while the second, out column contains the losers of each. These sets of belligerents are directed

Network Structure: Wars Startings in the 1000s

 Network attributes:
  vertices = 111 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 153 
    missing edges= 0 
    non-missing edges= 153 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Network Structure: Wars Startings in the 1100s

 Network attributes:
  vertices = 97 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 238 
    missing edges= 0 
    non-missing edges= 238 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Network Structure: Wars Starting in the 1200s

 Network attributes:
  vertices = 161 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 313 
    missing edges= 0 
    non-missing edges= 313 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Identify Nodes: Describe and identify the nodes (including how many nodes are in the dataset)

Answer: Nodes or vertices in these datasets represent belligerents in wars throughout history, the involved parties in each conflict can be a nation, province, individual, or group so long as they are listed as involved in the conflict. In the 1000s there are 117, in the 1100s there are 78 and in the 1200s there are 161.

What Constitutes a Tie: What constitutes a tie or edge (including how many ties, whether ties are directed/undirected and weighted/binary, and how to interpret the value of the tie if any)

Answer: A tie or edge in this dataset represents a war, this war can be between two nations or groups within a nation. These edges can represent a war that involved many more nations but are always tied to each and every party involved on both sides. These edges are directed and the direction indicates which side “won” the conflict (if an edge has an arrow pointing to another the node that originated that arrow won the war against them. There are 153 edges in the 1000s, 225 edges in 1100s and 313 edges in the 1200s.

Edge Attributes and Subset: Whether or not there are edge attributes that might be used to subset data or stack multiple networks (e.g., tie type, year, etc).

Answer: There are a number of attributes that could be used to subset the data, year that the conflict began or the length of time it lasted are available. Aspects like each side’s religion and the area where the conflict took place could be used to subset the data itself.

Part 2:

Brokerage and Betweeness centrality

What are betweeness and brokerage cenrrality Calculate brokerage and betweenneess centrality measures for one or more subsets of your network data, and write up the results and your interpretation of them.

Answer: I will be calculating these measures for wars in 1000-1099, 1100-1199, and 1200-1399.

Brokerage scores in the 1000s

(wars_in_1000s.nodes.stat_2%>%
  arrange(desc(broker.tot))%>%
  slice(1:10))[,c(1,11:15)] %>%kable()
name broker.tot broker.coord broker.itin broker.rep broker.gate
Byzantine Empire 22.7376579 NaN 3.1654785 NaN NaN
Holy Roman Empire 9.2813605 NaN 2.2468427 NaN NaN
Sultanate of Rum 9.2813605 NaN -0.5090648 NaN NaN
England 6.9745666 NaN 5.0036896 -0.0853606 -0.0853606
Kingdom of Sicily 5.0522384 -0.0176111 4.0866123 -0.1201631 -0.1201631
Seljuk Empire 1.9765133 -0.0176111 -0.5084146 3.4677529 -0.1201631
Kingdom of France 1.9765133 NaN -0.5090648 NaN NaN
Kingdom of Georgia 0.8231164 -0.0176111 -0.5084146 -0.1201631 -0.1201631
Papal States 0.4386507 -0.0176111 -0.5084146 -0.1201631 10.6435850
Ghaznavids 0.0541851 -0.1380791 -0.4907567 -0.2903366 -0.2903366

Brokerage scores in the 1100s

(wars_in_1100s.nodes.stat_2%>%
  arrange(desc(broker.tot))%>%
  slice(1:10))[,c(1,10:14)] %>%kable()
name broker.tot broker.coord broker.itin broker.rep broker.gate
Kingdom of Jerusalem 17.1050061 NaN 2.8705599 24.5610650 -0.1357675
Fatimid Caliphate 10.2415178 NaN -0.6472506 NaN NaN
Ayyubid Dynasty 9.3615834 NaN -0.6465587 -0.1357675 -0.1357675
Zengid Dynasty 7.4257278 NaN 0.7591543 NaN NaN
Byzantine Empire 6.8977671 NaN 0.7602887 -0.1357675 -0.1357675
England 5.8418459 NaN -0.6465587 -0.1357675 -0.1357675
Holy Roman Empire 3.0260558 NaN -0.6465587 -0.1357675 -0.1357675
Kingdom of France 1.6181608 NaN -0.6465587 -0.1357675 -0.1357675
Kingdom of Sicily 0.5622395 -0.1467125 -0.6293842 -0.3476788 -0.3476788
Papal States 0.0342789 -0.1264908 -0.6336748 -0.3236913 2.5014147

Brokerage scores in the 1200s

name broker.tot broker.coord broker.itin broker.rep broker.gate
Mongol Empire 47.964825 NaN -0.5966483 NaN NaN
Kingdom of France 28.663539 NaN -0.5966483 NaN NaN
Ayyubid Dynasty 26.995527 NaN 2.3528915 NaN NaN
Kingdom of England 21.991489 NaN 8.9893561 NaN NaN
Republic of Genoa 11.983415 NaN -0.5966483 NaN NaN
Knights Templar 10.077115 NaN 1.6155066 NaN NaN
Holy Roman Empire 4.834790 -0.0170801 -0.5961482 10.865523 10.865523
Principality of Antioch 4.834790 -0.0170801 2.3541101 13.613565 -0.126648
Kingdom of Cyprus 4.596503 58.5391124 0.1414163 13.613565 10.865523
Armenian Kingdom of Cilicia 3.881640 -0.0170801 -0.5961482 -0.126648 -0.126648
name broker.gate
Papal States 10.6435850
County of Aversa -0.0853606
County of Sicily -0.0853606
England -0.0853606
Chola Empire -0.0853606
County of Apulia -0.1201631
Kingdom of Sicily -0.1201631
Kingdom of Georgia -0.1201631
Great Seljuq Empire -0.1201631
Seljuk Empire -0.1201631
name broker.tot
Byzantine Empire 22.7376579
Holy Roman Empire 9.2813605
Sultanate of Rum 9.2813605
England 6.9745666
Kingdom of Sicily 5.0522384
Seljuk Empire 1.9765133
Kingdom of France 1.9765133
Kingdom of Georgia 0.8231164
Papal States 0.4386507
Ghaznavids 0.0541851

Option 2.A

For a Specific Research Question: If you have a specific research question, please feel free to use that to guide your analysis. Otherwise, you may want to orient your analysis as follows in order to identify a compelling question or noteworthy pattern in the data that can be interpreted.

Answer: Since I am interested in the relative power of nations by their relative position ad centrality in the worldwide conflict, network brokerage can be used to illustrate significant positions in global conflict. Below I wanted to look at 4 kinds of brokerage, these are broker.gate or gatekeeper, coordinator, liason, and itinerant. I am interested to see if these specific coordination types are primarily done by specific nations.

Total Brokerage

Explanation: Looking at total brokerage in this dataset gives a sense of which factions were responsible for highest connection of unconnected actors through conflict. Given the crusades igniting conflict between Europe and the middle east it is sensible that the Byzantine Empire in the center of both connects the most unconnected actors through conflict closely followed by the Sultanate of Rum, a major Muslim faction that fought against the crusades and third being the Holy Roman Empire who participated in many conflicts including the crusades. These are followed by England who centered the wars in the British isles and the Kingdom of Sicily who were also in a position of conflict.

name broker.tot
Byzantine Empire 22.737658
Holy Roman Empire 9.281360
Sultanate of Rum 9.281360
England 6.974567
Kingdom of Sicily 5.052238

Coordinator Brokerage

Explanation: In this case no particular country is very high above any other in terms of their coordinator brokerage, meaning that within groups no particular nations appear to be brokering more within the groups.

name broker.coord
County of Apulia -0.0176111
Kingdom of Sicily -0.0176111
Kingdom of Georgia -0.0176111
Great Seljuq Empire -0.0176111
Papal States -0.0176111

Itinerant Brokerage

Explanation: Itinerant brokerage represents when a non-group actor connects 2 actors in a group it is no in to each other, in this case England has the highest score. Looking at the network graph they do appear to connect 2 actors in a group together.

name broker.itin
England 5.0036896
Kingdom of Sicily 4.0866123
Byzantine Empire 3.1654785
Holy Roman Empire 2.2468427
Principality of Kiev 0.4812412

Representative Brokerage

Explanation: Representative brokerage indicates that the broker, or nation in question loses a war to another in their group, but wins another against a faction outside of their group. This can be though of as their directed connections to them. In this case the Seljuk Empire and Kingdom of Aragon have instances in which they lose to factions within their group before beating those outside of it.

name broker.rep
Seljuk Empire 3.4677529
Kingdom of Aragon 0.9281821
County of Aversa -0.0853606
County of Sicily -0.0853606
England -0.0853606

Gatekeeper Brokerage

Explanation: The Papal states being ranked highest in gatekeeper brokerage is an interesting observation as no other nation in the dataset appears to be close to their level as most are negative in this category. In this cae being a gatekeeper means that they are in at conflict in a group with another while the nation in a different group of conflicts is only at war with them from the group. This is an interesting observation given the Papal states role as a coordinator of the war, but not a participant in the conflcit as directly as other belligerents. (This being the crusade given the period)

name broker.gate
Papal States 10.6435850
County of Aversa -0.0853606
County of Sicily -0.0853606
England -0.0853606
Chola Empire -0.0853606

Liaison Brokerage

Explanation: A liaison broker, in this case, is a faction that loses a war to a group they do not belong to and wins a war against a different group than the first that they also do not belong to. The Byzantine Empire, Sultanate of Rum, and Holy Roman Empire are highest in this category likely owing to their frequent states of conflict beyond the crusades against a variety of groups.

name broker.lia
Byzantine Empire 28.140866
Sultanate of Rum 12.477603
Holy Roman Empire 10.961803
England 6.548214
Kingdom of Sicily 4.589419

Network 1000s Plot Grouping Determined with No Cluster Method

Network 1000s Plot Grouping Determined with the Average Cluster Method

Network 1000s Plot Grouping Determined with the Single Cluster Method

Network 1000s Plot Grouping Determined with the Ward.D Cluster Method

Network 1000s Plot igraph

Network Graphing 1100s

Network 1100s Plot Grouping Determined with No Cluster Method

Network 1100s Plot Grouping Determined with the Average Cluster Method

Network 1100s Plot Grouping Determined with the Single Cluster Method

Network 1100s Plot Grouping Determined with the Ward.D Cluster Method

Network 1100s Plot igraph

wars_in_1000s_edgelist <- as.matrix(wars_in_1000s)

wars_in_1000s_edgelist_network_edgelist <- graph.edgelist(wars_in_1000s_edgelist, directed=TRUE)

wars_in_1000s.ig<-graph_from_data_frame(wars_in_1000s)

wars_in_1000s_network <- asNetwork(wars_in_1000s.ig)
aspects_of_1000s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1000s_states.xlsx")

total_1000s <- merge(aspects_of_1000s_states, wars_in_1000s.nodes.stat_2, by="name")
total_1000s_brokerag_reg<-total_1000s

total_1000s_brokerag_reg$win_rate <- (total_1000s_brokerag_reg$outdegree/total_1000s_brokerag_reg$totdegree)

total_1000s_brokerag_reg$loss_rate <- (total_1000s_brokerag_reg$indegree/total_1000s_brokerag_reg$totdegree)

total_1000s_brokerag_reg_binom <- total_1000s_brokerag_reg %>% mutate(more_win_or_loss = case_when(
  win_rate < 0.5 ~ 0,
    win_rate >= 0.5 ~ 1))

First_1000s_regression <- glm(more_win_or_loss~.-name-totdegree-indegree-outdegree-dc-eigen.dc-win_rate-loss_rate, total_1000s_brokerag_reg_binom, family=binomial)

First_1000s_regression

Call:  glm(formula = more_win_or_loss ~ . - name - totdegree - indegree - 
    outdegree - dc - eigen.dc - win_rate - loss_rate, family = binomial, 
    data = total_1000s_brokerag_reg_binom)

Coefficients:
 (Intercept)      Catholic         Islam      Orthodox      Buddhist  
  -2.090e+01     1.446e-01    -7.108e-02    -4.043e-01    -8.572e-02  
       Pagan      Tengrism        Shinto         Hindu     Shamanism  
   5.506e-01    -5.656e+01     1.820e+00    -2.142e+00    -1.506e+00  
       eigen         close            rc      eigen.rc    broker.tot  
  -1.877e+03     5.146e+03    -3.979e+00     1.574e+03     2.378e+02  
broker.coord   broker.itin    broker.rep   broker.gate    broker.lia  
  -9.610e+01    -9.449e+01    -7.164e+01    -2.810e+01    -1.298e+02  

Degrees of Freedom: 101 Total (i.e. Null);  82 Residual
  (8 observations deleted due to missingness)
Null Deviance:      140.8 
Residual Deviance: 4.53e-09     AIC: 40
set.seed(292)

total_1000s_for_regression <- total_1000s[,-c(1, 20:25)]

total_1000s_for_regression$win_rate <- (total_1000s_for_regression$outdegree/total_1000s_for_regression$totdegree)

total_1000s_for_regression$loss_rate <- (total_1000s_for_regression$indegree/total_1000s_for_regression$totdegree)

total_1000s_for_regression <- total_1000s_for_regression %>% mutate(more_win_or_loss = case_when(
  win_rate < 0.5 ~ 0,
    win_rate >= 0.5 ~ 1))

First_1000s_regression <- glm(more_win_or_loss~.-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression, family=binomial)

First_1000s_regression

Call:  glm(formula = more_win_or_loss ~ . - loss_rate - win_rate - totdegree - 
    indegree - outdegree - dc - eigen.dc, family = binomial, 
    data = total_1000s_for_regression)

Coefficients:
(Intercept)     Catholic        Islam     Orthodox     Buddhist  
   -15.1948      13.9008      12.7531      14.6893      15.0858  
      Pagan     Tengrism       Shinto        Hindu    Shamanism  
     0.9610      11.6691      16.0623       9.1358      -0.1497  
      eigen        close           rc     eigen.rc  
   -82.1100     256.5294      -3.3322     -17.3152  

Degrees of Freedom: 109 Total (i.e. Null);  96 Residual
Null Deviance:      152.3 
Residual Deviance: 58.4     AIC: 86.4
set.seed(6738)

in_training<- sample(1:nrow(total_1000s_for_regression),  nrow(total_1000s_for_regression) * 0.7 )

training_1000s <- total_1000s_for_regression[in_training,]

test_1000s <- total_1000s_for_regression[-in_training,]

lm_1000s_binom_subset_1 <- glm(more_win_or_loss~.-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression, family=binomial, subset = in_training )

logsitic_1_1000s_prob <- predict(lm_1000s_binom_subset_1, test_1000s,
type = "response")

log_preds_1<-ifelse(logsitic_1_1000s_prob >= 0.5, 1, 0)

prediction_1_logs <-mean(log_preds_1 == test_1000s$more_win_or_loss)

prediction_1_logs %>% kable()
x
0.9090909
set.seed(246)

x_ridge <- model.matrix(more_win_or_loss ~ .-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression)[, -1] 

y_ridge <- total_1000s_for_regression$more_win_or_loss

grid <- 10^seq(10, -2, length = 100)

ridge.mod <- glmnet(x_ridge, y_ridge, alpha = 0, lambda = grid)

dim(coef(ridge.mod))
[1]  14 100
set.seed(729)
train_ridge <- sample(1:nrow(x_ridge), nrow(x_ridge)*0.8 ) 

test_ridge <- (-train_ridge)

y.test_ridge <- y_ridge[test_ridge]
set.seed(9292)

ridge.mod <- glmnet(x_ridge[train_ridge, ], y_ridge[train_ridge], 
                    alpha = 0, lambda = grid, thresh = 1e-12)

ridge.pred <- predict(ridge.mod, s = 4, newx = x_ridge[test_ridge,])

mean((ridge.pred - y.test_ridge)^2) %>% kable()
x
0.2416376
set.seed(231)
ridge.pred <- predict(ridge.mod, s = 0, newx = x_ridge[test_ridge, ], 
                      exact = T, x = x_ridge[train_ridge, ], y = y_ridge[train_ridge])

predict(ridge.mod, s = 0, exact = T, type = "coefficients", 
        x = x_ridge[train_ridge, ], y = y_ridge[train_ridge])[1:14, ]
(Intercept)    Catholic       Islam    Orthodox    Buddhist 
 0.21024033  0.21827317 -0.01160454  0.21312966  0.35601806 
      Pagan    Tengrism      Shinto       Hindu   Shamanism 
 0.08955257  0.14069809  0.38278477 -0.07034364 -0.01038790 
      eigen       close          rc    eigen.rc 
-4.61480591 12.51011844 -0.29977861  4.64835194 
set.seed(9292)

cv.out <- cv.glmnet(x_ridge[train_ridge, ], y_ridge[train_ridge], alpha = 0) 

plot(cv.out)

set.seed(9292)

bestlam <- cv.out$lambda.min

bestlam
[1] 0.415338
set.seed(9292)

ridge.pred <- predict(cv.out, s = bestlam, newx = x_ridge[test_ridge,])

mean((ridge.pred - y.test_ridge)^2) %>% kable()
x
0.174632
set.seed(2897)

x_lasso <- model.matrix(more_win_or_loss ~ .-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression)[, -1] 

y_lasso <- total_1000s_for_regression$more_win_or_loss

grid <- 10^seq(10, -2, length = 100)

lasso.mod <- glmnet(x_lasso, y_lasso, alpha = 0, lambda = grid)

dim(coef(lasso.mod))
[1]  14 100
set.seed(729)

train_lasso <- sample(1:nrow(x_ridge), nrow(x_ridge)*0.8 ) 

test_lasso <- (-train_lasso)

y.test_lasso <- y_lasso[test_lasso]
set.seed(9292)

lasso.mod <- glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso], 
                    alpha = 1, lambda = grid)

plot(lasso.mod)

set.seed(1029)

cv.out_2 <- cv.glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso], alpha = 1) 

plot(cv.out_2)

set.seed(1920)

bestlam_2 <- cv.out_2$lambda.min

lasso.pred <- predict(cv.out_2, s = bestlam_2, newx = x_ridge[test_ridge,])

mean((lasso.pred - y.test_ridge)^2) %>% kable()
x
0.1749583
set.seed(2739)

out <- glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso], 
              alpha = 1, lambda = grid)

lasso.coef <- predict(out, type = "coefficients", s = bestlam_2)[1:14, ]

lasso.coef
(Intercept)    Catholic       Islam    Orthodox    Buddhist 
 0.42561685  0.05577020 -0.09275344  0.00000000  0.00000000 
      Pagan    Tengrism      Shinto       Hindu   Shamanism 
 0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
      eigen       close          rc    eigen.rc 
 0.00000000  3.22570629 -0.21240622  0.00000000 
aspects_of_1100s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1100s_states.xlsx")

total_1100s <- merge(aspects_of_1100s_states, wars_in_1100s.nodes.stat_2, by="name")
aspects_of_1200s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1200s_states.xlsx")

total_1200s <- merge(aspects_of_1200s_states, wars_in_1200s.nodes.stat_2, by="name")

Community Grouping

Label Propagation 1000s:

The first community cluster below is done using label propagation. This results in 39 groups

set.seed(23)
comm.lab<-label.propagation.community(wars_in_1000s.ig)
#Inspect clustering object
# igraph::groups(comm.lab)

Walktrap 1000s:

Walktrap classification as seen below results in 19 distinct communities.

set.seed(238)
#Run clustering algorithm: fast_greedy
wars_in_1000s.wt<-walktrap.community(wars_in_1000s.ig)

#igraph::groups(wars_in_1000s.wt)

Adding more steps resulted in 19 groups for both 10 and 20 steps.

#Run & inspect clustering algorithm: 10 steps
#igraph::groups(walktrap.community(wars_in_1000s.ig, steps=10)) 
#Run & inspect clustering algorithm: 20 steps
#igraph::groups(walktrap.community(wars_in_1000s.ig ,steps=20))
#Run & inspect clustering algorithm

Machine Learning, Regression and Principle Components:

total_1000s_for_PCA <- total_1000s_brokerag_reg_binom[-c(20:27)]

apply(total_1000s_for_PCA[-1], 2, mean)
        Catholic            Islam         Orthodox         Buddhist 
     0.454545455      0.181818182      0.154545455      0.063636364 
           Pagan         Tengrism           Shinto            Hindu 
     0.036363636      0.018181818      0.054545455      0.045454545 
       Shamanism        totdegree         indegree        outdegree 
     0.009090909      2.754545455      1.336363636      1.418181818 
           eigen            close               rc         eigen.rc 
     0.028058711      0.023546832      0.287358773      0.003637773 
              dc         eigen.dc more_win_or_loss 
     0.712641227      0.024420939      0.481818182 
apply(total_1000s_for_PCA[-1], 2, var)
        Catholic            Islam         Orthodox         Buddhist 
    0.2502085071     0.1501251043     0.1318598832     0.0601334445 
           Pagan         Tengrism           Shinto            Hindu 
    0.0353628023     0.0180150125     0.0520433695     0.0437864887 
       Shamanism        totdegree         indegree        outdegree 
    0.0090909091     8.9208507089     2.6656380317     6.3189324437 
           eigen            close               rc         eigen.rc 
    0.0076304265     0.0019575460     0.1260782284     0.0004728954 
              dc         eigen.dc more_win_or_loss 
    0.1260782284     0.0056490031     0.2519599666 
pr.out <- prcomp(total_1000s_for_PCA[-1], scale = TRUE)
names(pr.out)
[1] "sdev"     "rotation" "center"   "scale"    "x"       
pr.out$center
        Catholic            Islam         Orthodox         Buddhist 
     0.454545455      0.181818182      0.154545455      0.063636364 
           Pagan         Tengrism           Shinto            Hindu 
     0.036363636      0.018181818      0.054545455      0.045454545 
       Shamanism        totdegree         indegree        outdegree 
     0.009090909      2.754545455      1.336363636      1.418181818 
           eigen            close               rc         eigen.rc 
     0.028058711      0.023546832      0.287358773      0.003637773 
              dc         eigen.dc more_win_or_loss 
     0.712641227      0.024420939      0.481818182 
pr.out$scale
        Catholic            Islam         Orthodox         Buddhist 
      0.50020846       0.38745981       0.36312516       0.24522122 
           Pagan         Tengrism           Shinto            Hindu 
      0.18805000       0.13422002       0.22813016       0.20925221 
       Shamanism        totdegree         indegree        outdegree 
      0.09534626       2.98677932       1.63267818       2.51374868 
           eigen            close               rc         eigen.rc 
      0.08735231       0.04424416       0.35507496       0.02174616 
              dc         eigen.dc more_win_or_loss 
      0.35507496       0.07515985       0.50195614 
biplot(pr.out, scale = 0)

pr.out$rotation = -pr.out$rotation 

pr.out$x = -pr.out$x

biplot(pr.out, scale = 0)

pr.out$sdev
 [1] 2.217501e+00 1.681548e+00 1.239242e+00 1.211199e+00 1.065982e+00
 [6] 1.037692e+00 1.029507e+00 1.011117e+00 1.005425e+00 9.514802e-01
[11] 8.848499e-01 7.782431e-01 6.162540e-01 4.426224e-01 2.541422e-01
[16] 1.091189e-01 7.597269e-16 6.258811e-16 2.174635e-16
pr.var <- pr.out$sdev^2

pr.var
 [1] 4.917311e+00 2.827605e+00 1.535720e+00 1.467004e+00 1.136318e+00
 [6] 1.076804e+00 1.059884e+00 1.022359e+00 1.010879e+00 9.053146e-01
[11] 7.829594e-01 6.056623e-01 3.797690e-01 1.959146e-01 6.458828e-02
[16] 1.190694e-02 5.771849e-31 3.917271e-31 4.729037e-32
pve <- pr.var / sum(pr.var)

pve
 [1] 2.588059e-01 1.488213e-01 8.082739e-02 7.721075e-02 5.980622e-02
 [6] 5.667390e-02 5.578337e-02 5.380835e-02 5.320417e-02 4.764814e-02
[11] 4.120839e-02 3.187696e-02 1.998784e-02 1.031129e-02 3.399383e-03
[16] 6.266808e-04 3.037815e-32 2.061722e-32 2.488967e-33
par(mfrow = c(1, 2))
plot(pve, xlab = "Principal Component",
ylab = "Proportion of Variance Explained", ylim = c(0, 1),
type = "b")

plot(cumsum(pve), xlab = "Principal Component",
ylab = "Cumulative Proportion of Variance Explained", ylim = c(0, 1), type = "b")

names(total_1200s)
 [1] "name"         "Catholic"     "Islam"        "Orthodox"    
 [5] "Buddhist"     "Pagan"        "Tengrism"     "Shinto"      
 [9] "Hindu"        "Shamanism"    "totdegree"    "indegree"    
[13] "outdegree"    "eigen"        "rc"           "eigen.rc"    
[17] "dc"           "eigen.dc"     "broker.tot"   "broker.coord"
[21] "broker.itin"  "broker.rep"   "broker.gate"  "broker.lia"  
total_1200s_brokerag_reg<-total_1200s
total_1200s_brokerag_reg$win_rate <- (total_1200s_brokerag_reg$outdegree/total_1200s_brokerag_reg$totdegree)
total_1200s_brokerag_reg$loss_rate <- (total_1200s_brokerag_reg$indegree/total_1200s_brokerag_reg$totdegree)
total_1200s_brokerag_reg_binom <- total_1200s_brokerag_reg %>% mutate(more_win_or_loss = case_when(
  win_rate < 0.5 ~ 0,
    win_rate >= 0.5 ~ 1))
total_1200s_for_PCA <- total_1200s_brokerag_reg_binom[-c(20:27)]


apply(total_1200s_for_PCA[-1], 2, mean)
   Catholic       Islam    Orthodox    Buddhist       Pagan 
0.714285714 0.068322981 0.086956522 0.086956522 0.012422360 
   Tengrism      Shinto       Hindu   Shamanism   totdegree 
0.024844720 0.000000000 0.006211180 0.000000000 3.900621118 
   indegree   outdegree       eigen          rc    eigen.rc 
1.956521739 1.944099379 0.025409148 0.158212220 0.002179127 
         dc    eigen.dc  broker.tot 
0.841787780 0.023230021 0.333968821 
apply(total_1200s_for_PCA[-1], 2, var)
    Catholic        Islam     Orthodox     Buddhist        Pagan 
2.053571e-01 6.405280e-02 7.989130e-02 7.989130e-02 1.234472e-02 
    Tengrism       Shinto        Hindu    Shamanism    totdegree 
2.437888e-02 0.000000e+00 6.211180e-03 0.000000e+00 2.655256e+01 
    indegree    outdegree        eigen           rc     eigen.rc 
6.204348e+00 1.587811e+01 5.600340e-03 7.101398e-02 7.273423e-05 
          dc     eigen.dc   broker.tot 
7.101398e-02 4.549154e-03 2.983411e+01 
# I cannot scale variables with 

total_1200s_for_PCA<-total_1200s_for_PCA[-c(8,10)]
pr.out_2 <- prcomp(total_1200s_for_PCA[-1], scale = TRUE)
names(pr.out_2)
[1] "sdev"     "rotation" "center"   "scale"    "x"       
pr.out_2$center
   Catholic       Islam    Orthodox    Buddhist       Pagan 
0.714285714 0.068322981 0.086956522 0.086956522 0.012422360 
   Tengrism       Hindu   totdegree    indegree   outdegree 
0.024844720 0.006211180 3.900621118 1.956521739 1.944099379 
      eigen          rc    eigen.rc          dc    eigen.dc 
0.025409148 0.158212220 0.002179127 0.841787780 0.023230021 
 broker.tot 
0.333968821 
pr.out_2$scale
   Catholic       Islam    Orthodox    Buddhist       Pagan 
0.453163484 0.253086537 0.282650499 0.282650499 0.111106798 
   Tengrism       Hindu   totdegree    indegree   outdegree 
0.156137382 0.078811041 5.152917825 2.490852831 3.984734068 
      eigen          rc    eigen.rc          dc    eigen.dc 
0.074835420 0.266484487 0.008528436 0.266484487 0.067447414 
 broker.tot 
5.462061318 
biplot(pr.out_2, scale = 0)

pr.out_2$rotation = -pr.out_2$rotation 

pr.out_2$x = -pr.out_2$x

biplot(pr.out_2, scale = 0)

pr.out$sdev
 [1] 2.217501e+00 1.681548e+00 1.239242e+00 1.211199e+00 1.065982e+00
 [6] 1.037692e+00 1.029507e+00 1.011117e+00 1.005425e+00 9.514802e-01
[11] 8.848499e-01 7.782431e-01 6.162540e-01 4.426224e-01 2.541422e-01
[16] 1.091189e-01 7.597269e-16 6.258811e-16 2.174635e-16
pr.var_2 <- pr.out_2$sdev^2

pr.var_2
 [1] 4.904685e+00 2.342837e+00 1.673460e+00 1.249477e+00 1.132262e+00
 [6] 1.097150e+00 1.011239e+00 9.459795e-01 8.660928e-01 5.136566e-01
[11] 1.660835e-01 9.707769e-02 1.227998e-30 9.288233e-31 4.378653e-31
[16] 1.136377e-31
pve_2 <- pr.var_2 / sum(pr.var_2)

pve_2
 [1] 3.065428e-01 1.464273e-01 1.045912e-01 7.809228e-02 7.076635e-02
 [6] 6.857190e-02 6.320245e-02 5.912372e-02 5.413080e-02 3.210354e-02
[11] 1.038022e-02 6.067356e-03 7.674985e-32 5.805145e-32 2.736658e-32
[16] 7.102355e-33
par(mfrow = c(1, 2))
plot(pve_2, xlab = "Principal Component",
ylab = "Proportion of Variance Explained", ylim = c(0, 1),
type = "b")

plot(cumsum(pve_2), xlab = "Principal Component",
ylab = "Cumulative Proportion of Variance Explained", ylim = c(0, 1), type = "b")

(information regarding the meaning of each type of brokerage was acquired from https://edis.ifas.ufl.edu/publication/WC197)

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Milstein (2022, March 27). Data Analytics and Computational Social Science: Blog Post 7, Integrating ML. Retrieved from https://nmilsteinuma.github.io/posts/2022-03-21-blog-post-7/

BibTeX citation

@misc{milstein2022blog,
  author = {Milstein, Noah},
  title = {Data Analytics and Computational Social Science: Blog Post 7, Integrating ML},
  url = {https://nmilsteinuma.github.io/posts/2022-03-21-blog-post-7/},
  year = {2022}
}