This post is an analysis of community structure and machine learning techniques on my medieval dataset.

The Dataset Being Used: The dataset that I am using is wikipedia list of wars throughout history, this article is the “List of wars: 1000–1499” which acts as a subset of the “2nd-millennium conflicts” I chose this dataset as an exemplar of popular history’s depiction of the centralization of worldwide conflict. Wikipedia, being an accessible source generally created from relevant citations makes it a good case study to see where historical writers and academics center their world are relevant conflicts.
Answer: The initial network format is as an edge list, the first, in column contains the winners of each war while the second, out column contains the losers of each. These sets of belligerents are directed
 Network attributes:
  vertices = 111 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 153 
    missing edges= 0 
    non-missing edges= 153 
 Vertex attribute names: 
    vertex.names 
No edge attributes Network attributes:
  vertices = 97 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 238 
    missing edges= 0 
    non-missing edges= 238 
 Vertex attribute names: 
    vertex.names 
No edge attributes Network attributes:
  vertices = 161 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = TRUE 
  bipartite = FALSE 
  total edges= 313 
    missing edges= 0 
    non-missing edges= 313 
 Vertex attribute names: 
    vertex.names 
No edge attributesIdentify Nodes: Describe and identify the nodes (including how many nodes are in the dataset)
Answer: Nodes or vertices in these datasets represent belligerents in wars throughout history, the involved parties in each conflict can be a nation, province, individual, or group so long as they are listed as involved in the conflict. In the 1000s there are 117, in the 1100s there are 78 and in the 1200s there are 161.
What Constitutes a Tie: What constitutes a tie or edge (including how many ties, whether ties are directed/undirected and weighted/binary, and how to interpret the value of the tie if any)
Answer: A tie or edge in this dataset represents a war, this war can be between two nations or groups within a nation. These edges can represent a war that involved many more nations but are always tied to each and every party involved on both sides. These edges are directed and the direction indicates which side “won” the conflict (if an edge has an arrow pointing to another the node that originated that arrow won the war against them. There are 153 edges in the 1000s, 225 edges in 1100s and 313 edges in the 1200s.
Edge Attributes and Subset: Whether or not there are edge attributes that might be used to subset data or stack multiple networks (e.g., tie type, year, etc).
Answer: There are a number of attributes that could be used to subset the data, year that the conflict began or the length of time it lasted are available. Aspects like each side’s religion and the area where the conflict took place could be used to subset the data itself.
What are betweeness and brokerage cenrrality Calculate brokerage and betweenneess centrality measures for one or more subsets of your network data, and write up the results and your interpretation of them.
Answer: I will be calculating these measures for wars in 1000-1099, 1100-1199, and 1200-1399.
(wars_in_1000s.nodes.stat_2%>%
  arrange(desc(broker.tot))%>%
  slice(1:10))[,c(1,11:15)] %>%kable()
| name | broker.tot | broker.coord | broker.itin | broker.rep | broker.gate | 
|---|---|---|---|---|---|
| Byzantine Empire | 22.7376579 | NaN | 3.1654785 | NaN | NaN | 
| Holy Roman Empire | 9.2813605 | NaN | 2.2468427 | NaN | NaN | 
| Sultanate of Rum | 9.2813605 | NaN | -0.5090648 | NaN | NaN | 
| England | 6.9745666 | NaN | 5.0036896 | -0.0853606 | -0.0853606 | 
| Kingdom of Sicily | 5.0522384 | -0.0176111 | 4.0866123 | -0.1201631 | -0.1201631 | 
| Seljuk Empire | 1.9765133 | -0.0176111 | -0.5084146 | 3.4677529 | -0.1201631 | 
| Kingdom of France | 1.9765133 | NaN | -0.5090648 | NaN | NaN | 
| Kingdom of Georgia | 0.8231164 | -0.0176111 | -0.5084146 | -0.1201631 | -0.1201631 | 
| Papal States | 0.4386507 | -0.0176111 | -0.5084146 | -0.1201631 | 10.6435850 | 
| Ghaznavids | 0.0541851 | -0.1380791 | -0.4907567 | -0.2903366 | -0.2903366 | 
(wars_in_1100s.nodes.stat_2%>%
  arrange(desc(broker.tot))%>%
  slice(1:10))[,c(1,10:14)] %>%kable()
| name | broker.tot | broker.coord | broker.itin | broker.rep | broker.gate | 
|---|---|---|---|---|---|
| Kingdom of Jerusalem | 17.1050061 | NaN | 2.8705599 | 24.5610650 | -0.1357675 | 
| Fatimid Caliphate | 10.2415178 | NaN | -0.6472506 | NaN | NaN | 
| Ayyubid Dynasty | 9.3615834 | NaN | -0.6465587 | -0.1357675 | -0.1357675 | 
| Zengid Dynasty | 7.4257278 | NaN | 0.7591543 | NaN | NaN | 
| Byzantine Empire | 6.8977671 | NaN | 0.7602887 | -0.1357675 | -0.1357675 | 
| England | 5.8418459 | NaN | -0.6465587 | -0.1357675 | -0.1357675 | 
| Holy Roman Empire | 3.0260558 | NaN | -0.6465587 | -0.1357675 | -0.1357675 | 
| Kingdom of France | 1.6181608 | NaN | -0.6465587 | -0.1357675 | -0.1357675 | 
| Kingdom of Sicily | 0.5622395 | -0.1467125 | -0.6293842 | -0.3476788 | -0.3476788 | 
| Papal States | 0.0342789 | -0.1264908 | -0.6336748 | -0.3236913 | 2.5014147 | 
| name | broker.tot | broker.coord | broker.itin | broker.rep | broker.gate | 
|---|---|---|---|---|---|
| Mongol Empire | 47.964825 | NaN | -0.5966483 | NaN | NaN | 
| Kingdom of France | 28.663539 | NaN | -0.5966483 | NaN | NaN | 
| Ayyubid Dynasty | 26.995527 | NaN | 2.3528915 | NaN | NaN | 
| Kingdom of England | 21.991489 | NaN | 8.9893561 | NaN | NaN | 
| Republic of Genoa | 11.983415 | NaN | -0.5966483 | NaN | NaN | 
| Knights Templar | 10.077115 | NaN | 1.6155066 | NaN | NaN | 
| Holy Roman Empire | 4.834790 | -0.0170801 | -0.5961482 | 10.865523 | 10.865523 | 
| Principality of Antioch | 4.834790 | -0.0170801 | 2.3541101 | 13.613565 | -0.126648 | 
| Kingdom of Cyprus | 4.596503 | 58.5391124 | 0.1414163 | 13.613565 | 10.865523 | 
| Armenian Kingdom of Cilicia | 3.881640 | -0.0170801 | -0.5961482 | -0.126648 | -0.126648 | 
| name | broker.gate | 
|---|---|
| Papal States | 10.6435850 | 
| County of Aversa | -0.0853606 | 
| County of Sicily | -0.0853606 | 
| England | -0.0853606 | 
| Chola Empire | -0.0853606 | 
| County of Apulia | -0.1201631 | 
| Kingdom of Sicily | -0.1201631 | 
| Kingdom of Georgia | -0.1201631 | 
| Great Seljuq Empire | -0.1201631 | 
| Seljuk Empire | -0.1201631 | 
| name | broker.tot | 
|---|---|
| Byzantine Empire | 22.7376579 | 
| Holy Roman Empire | 9.2813605 | 
| Sultanate of Rum | 9.2813605 | 
| England | 6.9745666 | 
| Kingdom of Sicily | 5.0522384 | 
| Seljuk Empire | 1.9765133 | 
| Kingdom of France | 1.9765133 | 
| Kingdom of Georgia | 0.8231164 | 
| Papal States | 0.4386507 | 
| Ghaznavids | 0.0541851 | 
For a Specific Research Question: If you have a specific research question, please feel free to use that to guide your analysis. Otherwise, you may want to orient your analysis as follows in order to identify a compelling question or noteworthy pattern in the data that can be interpreted.
Answer: Since I am interested in the relative power of nations by their relative position ad centrality in the worldwide conflict, network brokerage can be used to illustrate significant positions in global conflict. Below I wanted to look at 4 kinds of brokerage, these are broker.gate or gatekeeper, coordinator, liason, and itinerant. I am interested to see if these specific coordination types are primarily done by specific nations.



Explanation: Looking at total brokerage in this dataset gives a sense of which factions were responsible for highest connection of unconnected actors through conflict. Given the crusades igniting conflict between Europe and the middle east it is sensible that the Byzantine Empire in the center of both connects the most unconnected actors through conflict closely followed by the Sultanate of Rum, a major Muslim faction that fought against the crusades and third being the Holy Roman Empire who participated in many conflicts including the crusades. These are followed by England who centered the wars in the British isles and the Kingdom of Sicily who were also in a position of conflict.
| name | broker.tot | 
|---|---|
| Byzantine Empire | 22.737658 | 
| Holy Roman Empire | 9.281360 | 
| Sultanate of Rum | 9.281360 | 
| England | 6.974567 | 
| Kingdom of Sicily | 5.052238 | 
Explanation: In this case no particular country is very high above any other in terms of their coordinator brokerage, meaning that within groups no particular nations appear to be brokering more within the groups.
| name | broker.coord | 
|---|---|
| County of Apulia | -0.0176111 | 
| Kingdom of Sicily | -0.0176111 | 
| Kingdom of Georgia | -0.0176111 | 
| Great Seljuq Empire | -0.0176111 | 
| Papal States | -0.0176111 | 
Explanation: Itinerant brokerage represents when a non-group actor connects 2 actors in a group it is no in to each other, in this case England has the highest score. Looking at the network graph they do appear to connect 2 actors in a group together.
| name | broker.itin | 
|---|---|
| England | 5.0036896 | 
| Kingdom of Sicily | 4.0866123 | 
| Byzantine Empire | 3.1654785 | 
| Holy Roman Empire | 2.2468427 | 
| Principality of Kiev | 0.4812412 | 
Explanation: Representative brokerage indicates that the broker, or nation in question loses a war to another in their group, but wins another against a faction outside of their group. This can be though of as their directed connections to them. In this case the Seljuk Empire and Kingdom of Aragon have instances in which they lose to factions within their group before beating those outside of it.
| name | broker.rep | 
|---|---|
| Seljuk Empire | 3.4677529 | 
| Kingdom of Aragon | 0.9281821 | 
| County of Aversa | -0.0853606 | 
| County of Sicily | -0.0853606 | 
| England | -0.0853606 | 
Explanation: The Papal states being ranked highest in gatekeeper brokerage is an interesting observation as no other nation in the dataset appears to be close to their level as most are negative in this category. In this cae being a gatekeeper means that they are in at conflict in a group with another while the nation in a different group of conflicts is only at war with them from the group. This is an interesting observation given the Papal states role as a coordinator of the war, but not a participant in the conflcit as directly as other belligerents. (This being the crusade given the period)
| name | broker.gate | 
|---|---|
| Papal States | 10.6435850 | 
| County of Aversa | -0.0853606 | 
| County of Sicily | -0.0853606 | 
| England | -0.0853606 | 
| Chola Empire | -0.0853606 | 
Explanation: A liaison broker, in this case, is a faction that loses a war to a group they do not belong to and wins a war against a different group than the first that they also do not belong to. The Byzantine Empire, Sultanate of Rum, and Holy Roman Empire are highest in this category likely owing to their frequent states of conflict beyond the crusades against a variety of groups.
| name | broker.lia | 
|---|---|
| Byzantine Empire | 28.140866 | 
| Sultanate of Rum | 12.477603 | 
| Holy Roman Empire | 10.961803 | 
| England | 6.548214 | 
| Kingdom of Sicily | 4.589419 | 
























wars_in_1000s_edgelist <- as.matrix(wars_in_1000s)
wars_in_1000s_edgelist_network_edgelist <- graph.edgelist(wars_in_1000s_edgelist, directed=TRUE)
wars_in_1000s.ig<-graph_from_data_frame(wars_in_1000s)
wars_in_1000s_network <- asNetwork(wars_in_1000s.ig)
aspects_of_1000s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1000s_states.xlsx")
total_1000s <- merge(aspects_of_1000s_states, wars_in_1000s.nodes.stat_2, by="name")
total_1000s_brokerag_reg<-total_1000s
total_1000s_brokerag_reg$win_rate <- (total_1000s_brokerag_reg$outdegree/total_1000s_brokerag_reg$totdegree)
total_1000s_brokerag_reg$loss_rate <- (total_1000s_brokerag_reg$indegree/total_1000s_brokerag_reg$totdegree)
total_1000s_brokerag_reg_binom <- total_1000s_brokerag_reg %>% mutate(more_win_or_loss = case_when(
  win_rate < 0.5 ~ 0,
    win_rate >= 0.5 ~ 1))
First_1000s_regression <- glm(more_win_or_loss~.-name-totdegree-indegree-outdegree-dc-eigen.dc-win_rate-loss_rate, total_1000s_brokerag_reg_binom, family=binomial)
First_1000s_regression
Call:  glm(formula = more_win_or_loss ~ . - name - totdegree - indegree - 
    outdegree - dc - eigen.dc - win_rate - loss_rate, family = binomial, 
    data = total_1000s_brokerag_reg_binom)
Coefficients:
 (Intercept)      Catholic         Islam      Orthodox      Buddhist  
  -2.090e+01     1.446e-01    -7.108e-02    -4.043e-01    -8.572e-02  
       Pagan      Tengrism        Shinto         Hindu     Shamanism  
   5.506e-01    -5.656e+01     1.820e+00    -2.142e+00    -1.506e+00  
       eigen         close            rc      eigen.rc    broker.tot  
  -1.877e+03     5.146e+03    -3.979e+00     1.574e+03     2.378e+02  
broker.coord   broker.itin    broker.rep   broker.gate    broker.lia  
  -9.610e+01    -9.449e+01    -7.164e+01    -2.810e+01    -1.298e+02  
Degrees of Freedom: 101 Total (i.e. Null);  82 Residual
  (8 observations deleted due to missingness)
Null Deviance:      140.8 
Residual Deviance: 4.53e-09     AIC: 40set.seed(292)
total_1000s_for_regression <- total_1000s[,-c(1, 20:25)]
total_1000s_for_regression$win_rate <- (total_1000s_for_regression$outdegree/total_1000s_for_regression$totdegree)
total_1000s_for_regression$loss_rate <- (total_1000s_for_regression$indegree/total_1000s_for_regression$totdegree)
total_1000s_for_regression <- total_1000s_for_regression %>% mutate(more_win_or_loss = case_when(
  win_rate < 0.5 ~ 0,
    win_rate >= 0.5 ~ 1))
First_1000s_regression <- glm(more_win_or_loss~.-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression, family=binomial)
First_1000s_regression
Call:  glm(formula = more_win_or_loss ~ . - loss_rate - win_rate - totdegree - 
    indegree - outdegree - dc - eigen.dc, family = binomial, 
    data = total_1000s_for_regression)
Coefficients:
(Intercept)     Catholic        Islam     Orthodox     Buddhist  
   -15.1948      13.9008      12.7531      14.6893      15.0858  
      Pagan     Tengrism       Shinto        Hindu    Shamanism  
     0.9610      11.6691      16.0623       9.1358      -0.1497  
      eigen        close           rc     eigen.rc  
   -82.1100     256.5294      -3.3322     -17.3152  
Degrees of Freedom: 109 Total (i.e. Null);  96 Residual
Null Deviance:      152.3 
Residual Deviance: 58.4     AIC: 86.4set.seed(6738)
in_training<- sample(1:nrow(total_1000s_for_regression),  nrow(total_1000s_for_regression) * 0.7 )
training_1000s <- total_1000s_for_regression[in_training,]
test_1000s <- total_1000s_for_regression[-in_training,]
lm_1000s_binom_subset_1 <- glm(more_win_or_loss~.-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression, family=binomial, subset = in_training )
logsitic_1_1000s_prob <- predict(lm_1000s_binom_subset_1, test_1000s,
type = "response")
log_preds_1<-ifelse(logsitic_1_1000s_prob >= 0.5, 1, 0)
prediction_1_logs <-mean(log_preds_1 == test_1000s$more_win_or_loss)
prediction_1_logs %>% kable()
| x | 
|---|
| 0.9090909 | 
set.seed(246)
x_ridge <- model.matrix(more_win_or_loss ~ .-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression)[, -1] 
y_ridge <- total_1000s_for_regression$more_win_or_loss
grid <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(x_ridge, y_ridge, alpha = 0, lambda = grid)
dim(coef(ridge.mod))
[1]  14 100set.seed(9292)
ridge.mod <- glmnet(x_ridge[train_ridge, ], y_ridge[train_ridge], 
                    alpha = 0, lambda = grid, thresh = 1e-12)
ridge.pred <- predict(ridge.mod, s = 4, newx = x_ridge[test_ridge,])
mean((ridge.pred - y.test_ridge)^2) %>% kable()
| x | 
|---|
| 0.2416376 | 
set.seed(231)
ridge.pred <- predict(ridge.mod, s = 0, newx = x_ridge[test_ridge, ], 
                      exact = T, x = x_ridge[train_ridge, ], y = y_ridge[train_ridge])
predict(ridge.mod, s = 0, exact = T, type = "coefficients", 
        x = x_ridge[train_ridge, ], y = y_ridge[train_ridge])[1:14, ]
(Intercept)    Catholic       Islam    Orthodox    Buddhist 
 0.21024033  0.21827317 -0.01160454  0.21312966  0.35601806 
      Pagan    Tengrism      Shinto       Hindu   Shamanism 
 0.08955257  0.14069809  0.38278477 -0.07034364 -0.01038790 
      eigen       close          rc    eigen.rc 
-4.61480591 12.51011844 -0.29977861  4.64835194 set.seed(9292)
cv.out <- cv.glmnet(x_ridge[train_ridge, ], y_ridge[train_ridge], alpha = 0) 
plot(cv.out)

set.seed(9292)
bestlam <- cv.out$lambda.min
bestlam
[1] 0.415338set.seed(9292)
ridge.pred <- predict(cv.out, s = bestlam, newx = x_ridge[test_ridge,])
mean((ridge.pred - y.test_ridge)^2) %>% kable()
| x | 
|---|
| 0.174632 | 
set.seed(2897)
x_lasso <- model.matrix(more_win_or_loss ~ .-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression)[, -1] 
y_lasso <- total_1000s_for_regression$more_win_or_loss
grid <- 10^seq(10, -2, length = 100)
lasso.mod <- glmnet(x_lasso, y_lasso, alpha = 0, lambda = grid)
dim(coef(lasso.mod))
[1]  14 100set.seed(9292)
lasso.mod <- glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso], 
                    alpha = 1, lambda = grid)
plot(lasso.mod)

set.seed(1029)
cv.out_2 <- cv.glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso], alpha = 1) 
plot(cv.out_2)

set.seed(1920)
bestlam_2 <- cv.out_2$lambda.min
lasso.pred <- predict(cv.out_2, s = bestlam_2, newx = x_ridge[test_ridge,])
mean((lasso.pred - y.test_ridge)^2) %>% kable()
| x | 
|---|
| 0.1749583 | 
set.seed(2739)
out <- glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso], 
              alpha = 1, lambda = grid)
lasso.coef <- predict(out, type = "coefficients", s = bestlam_2)[1:14, ]
lasso.coef
(Intercept)    Catholic       Islam    Orthodox    Buddhist 
 0.42561685  0.05577020 -0.09275344  0.00000000  0.00000000 
      Pagan    Tengrism      Shinto       Hindu   Shamanism 
 0.00000000  0.00000000  0.00000000  0.00000000  0.00000000 
      eigen       close          rc    eigen.rc 
 0.00000000  3.22570629 -0.21240622  0.00000000 aspects_of_1100s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1100s_states.xlsx")
total_1100s <- merge(aspects_of_1100s_states, wars_in_1100s.nodes.stat_2, by="name")
aspects_of_1200s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1200s_states.xlsx")
total_1200s <- merge(aspects_of_1200s_states, wars_in_1200s.nodes.stat_2, by="name")
The first community cluster below is done using label propagation. This results in 39 groups
set.seed(23)
comm.lab<-label.propagation.community(wars_in_1000s.ig)
#Inspect clustering object
# igraph::groups(comm.lab)

Walktrap classification as seen below results in 19 distinct communities.
set.seed(238)
#Run clustering algorithm: fast_greedy
wars_in_1000s.wt<-walktrap.community(wars_in_1000s.ig)
#igraph::groups(wars_in_1000s.wt)
Adding more steps resulted in 19 groups for both 10 and 20 steps.
#Run & inspect clustering algorithm: 10 steps
#igraph::groups(walktrap.community(wars_in_1000s.ig, steps=10)) 
#Run & inspect clustering algorithm: 20 steps
#igraph::groups(walktrap.community(wars_in_1000s.ig ,steps=20))
#Run & inspect clustering algorithm

total_1000s_for_PCA <- total_1000s_brokerag_reg_binom[-c(20:27)]
apply(total_1000s_for_PCA[-1], 2, mean)
        Catholic            Islam         Orthodox         Buddhist 
     0.454545455      0.181818182      0.154545455      0.063636364 
           Pagan         Tengrism           Shinto            Hindu 
     0.036363636      0.018181818      0.054545455      0.045454545 
       Shamanism        totdegree         indegree        outdegree 
     0.009090909      2.754545455      1.336363636      1.418181818 
           eigen            close               rc         eigen.rc 
     0.028058711      0.023546832      0.287358773      0.003637773 
              dc         eigen.dc more_win_or_loss 
     0.712641227      0.024420939      0.481818182 apply(total_1000s_for_PCA[-1], 2, var)
        Catholic            Islam         Orthodox         Buddhist 
    0.2502085071     0.1501251043     0.1318598832     0.0601334445 
           Pagan         Tengrism           Shinto            Hindu 
    0.0353628023     0.0180150125     0.0520433695     0.0437864887 
       Shamanism        totdegree         indegree        outdegree 
    0.0090909091     8.9208507089     2.6656380317     6.3189324437 
           eigen            close               rc         eigen.rc 
    0.0076304265     0.0019575460     0.1260782284     0.0004728954 
              dc         eigen.dc more_win_or_loss 
    0.1260782284     0.0056490031     0.2519599666 pr.out <- prcomp(total_1000s_for_PCA[-1], scale = TRUE)
names(pr.out)
[1] "sdev"     "rotation" "center"   "scale"    "x"       pr.out$center
        Catholic            Islam         Orthodox         Buddhist 
     0.454545455      0.181818182      0.154545455      0.063636364 
           Pagan         Tengrism           Shinto            Hindu 
     0.036363636      0.018181818      0.054545455      0.045454545 
       Shamanism        totdegree         indegree        outdegree 
     0.009090909      2.754545455      1.336363636      1.418181818 
           eigen            close               rc         eigen.rc 
     0.028058711      0.023546832      0.287358773      0.003637773 
              dc         eigen.dc more_win_or_loss 
     0.712641227      0.024420939      0.481818182 pr.out$scale
        Catholic            Islam         Orthodox         Buddhist 
      0.50020846       0.38745981       0.36312516       0.24522122 
           Pagan         Tengrism           Shinto            Hindu 
      0.18805000       0.13422002       0.22813016       0.20925221 
       Shamanism        totdegree         indegree        outdegree 
      0.09534626       2.98677932       1.63267818       2.51374868 
           eigen            close               rc         eigen.rc 
      0.08735231       0.04424416       0.35507496       0.02174616 
              dc         eigen.dc more_win_or_loss 
      0.35507496       0.07515985       0.50195614 biplot(pr.out, scale = 0)

pr.out$rotation = -pr.out$rotation 
pr.out$x = -pr.out$x
biplot(pr.out, scale = 0)

pr.out$sdev
 [1] 2.217501e+00 1.681548e+00 1.239242e+00 1.211199e+00 1.065982e+00
 [6] 1.037692e+00 1.029507e+00 1.011117e+00 1.005425e+00 9.514802e-01
[11] 8.848499e-01 7.782431e-01 6.162540e-01 4.426224e-01 2.541422e-01
[16] 1.091189e-01 7.597269e-16 6.258811e-16 2.174635e-16pr.var <- pr.out$sdev^2
pr.var
 [1] 4.917311e+00 2.827605e+00 1.535720e+00 1.467004e+00 1.136318e+00
 [6] 1.076804e+00 1.059884e+00 1.022359e+00 1.010879e+00 9.053146e-01
[11] 7.829594e-01 6.056623e-01 3.797690e-01 1.959146e-01 6.458828e-02
[16] 1.190694e-02 5.771849e-31 3.917271e-31 4.729037e-32pve <- pr.var / sum(pr.var)
pve
 [1] 2.588059e-01 1.488213e-01 8.082739e-02 7.721075e-02 5.980622e-02
 [6] 5.667390e-02 5.578337e-02 5.380835e-02 5.320417e-02 4.764814e-02
[11] 4.120839e-02 3.187696e-02 1.998784e-02 1.031129e-02 3.399383e-03
[16] 6.266808e-04 3.037815e-32 2.061722e-32 2.488967e-33par(mfrow = c(1, 2))
plot(pve, xlab = "Principal Component",
ylab = "Proportion of Variance Explained", ylim = c(0, 1),
type = "b")
plot(cumsum(pve), xlab = "Principal Component",
ylab = "Cumulative Proportion of Variance Explained", ylim = c(0, 1), type = "b")

names(total_1200s)
 [1] "name"         "Catholic"     "Islam"        "Orthodox"    
 [5] "Buddhist"     "Pagan"        "Tengrism"     "Shinto"      
 [9] "Hindu"        "Shamanism"    "totdegree"    "indegree"    
[13] "outdegree"    "eigen"        "rc"           "eigen.rc"    
[17] "dc"           "eigen.dc"     "broker.tot"   "broker.coord"
[21] "broker.itin"  "broker.rep"   "broker.gate"  "broker.lia"  total_1200s_brokerag_reg<-total_1200s
total_1200s_brokerag_reg$win_rate <- (total_1200s_brokerag_reg$outdegree/total_1200s_brokerag_reg$totdegree)
total_1200s_brokerag_reg$loss_rate <- (total_1200s_brokerag_reg$indegree/total_1200s_brokerag_reg$totdegree)
total_1200s_brokerag_reg_binom <- total_1200s_brokerag_reg %>% mutate(more_win_or_loss = case_when(
  win_rate < 0.5 ~ 0,
    win_rate >= 0.5 ~ 1))
total_1200s_for_PCA <- total_1200s_brokerag_reg_binom[-c(20:27)]
apply(total_1200s_for_PCA[-1], 2, mean)
   Catholic       Islam    Orthodox    Buddhist       Pagan 
0.714285714 0.068322981 0.086956522 0.086956522 0.012422360 
   Tengrism      Shinto       Hindu   Shamanism   totdegree 
0.024844720 0.000000000 0.006211180 0.000000000 3.900621118 
   indegree   outdegree       eigen          rc    eigen.rc 
1.956521739 1.944099379 0.025409148 0.158212220 0.002179127 
         dc    eigen.dc  broker.tot 
0.841787780 0.023230021 0.333968821 apply(total_1200s_for_PCA[-1], 2, var)
    Catholic        Islam     Orthodox     Buddhist        Pagan 
2.053571e-01 6.405280e-02 7.989130e-02 7.989130e-02 1.234472e-02 
    Tengrism       Shinto        Hindu    Shamanism    totdegree 
2.437888e-02 0.000000e+00 6.211180e-03 0.000000e+00 2.655256e+01 
    indegree    outdegree        eigen           rc     eigen.rc 
6.204348e+00 1.587811e+01 5.600340e-03 7.101398e-02 7.273423e-05 
          dc     eigen.dc   broker.tot 
7.101398e-02 4.549154e-03 2.983411e+01 # I cannot scale variables with 
total_1200s_for_PCA<-total_1200s_for_PCA[-c(8,10)]
pr.out_2 <- prcomp(total_1200s_for_PCA[-1], scale = TRUE)
names(pr.out_2)
[1] "sdev"     "rotation" "center"   "scale"    "x"       pr.out_2$center
   Catholic       Islam    Orthodox    Buddhist       Pagan 
0.714285714 0.068322981 0.086956522 0.086956522 0.012422360 
   Tengrism       Hindu   totdegree    indegree   outdegree 
0.024844720 0.006211180 3.900621118 1.956521739 1.944099379 
      eigen          rc    eigen.rc          dc    eigen.dc 
0.025409148 0.158212220 0.002179127 0.841787780 0.023230021 
 broker.tot 
0.333968821 pr.out_2$scale
   Catholic       Islam    Orthodox    Buddhist       Pagan 
0.453163484 0.253086537 0.282650499 0.282650499 0.111106798 
   Tengrism       Hindu   totdegree    indegree   outdegree 
0.156137382 0.078811041 5.152917825 2.490852831 3.984734068 
      eigen          rc    eigen.rc          dc    eigen.dc 
0.074835420 0.266484487 0.008528436 0.266484487 0.067447414 
 broker.tot 
5.462061318 biplot(pr.out_2, scale = 0)

pr.out_2$rotation = -pr.out_2$rotation 
pr.out_2$x = -pr.out_2$x
biplot(pr.out_2, scale = 0)

pr.out$sdev
 [1] 2.217501e+00 1.681548e+00 1.239242e+00 1.211199e+00 1.065982e+00
 [6] 1.037692e+00 1.029507e+00 1.011117e+00 1.005425e+00 9.514802e-01
[11] 8.848499e-01 7.782431e-01 6.162540e-01 4.426224e-01 2.541422e-01
[16] 1.091189e-01 7.597269e-16 6.258811e-16 2.174635e-16pr.var_2 <- pr.out_2$sdev^2
pr.var_2
 [1] 4.904685e+00 2.342837e+00 1.673460e+00 1.249477e+00 1.132262e+00
 [6] 1.097150e+00 1.011239e+00 9.459795e-01 8.660928e-01 5.136566e-01
[11] 1.660835e-01 9.707769e-02 1.227998e-30 9.288233e-31 4.378653e-31
[16] 1.136377e-31pve_2 <- pr.var_2 / sum(pr.var_2)
pve_2
 [1] 3.065428e-01 1.464273e-01 1.045912e-01 7.809228e-02 7.076635e-02
 [6] 6.857190e-02 6.320245e-02 5.912372e-02 5.413080e-02 3.210354e-02
[11] 1.038022e-02 6.067356e-03 7.674985e-32 5.805145e-32 2.736658e-32
[16] 7.102355e-33par(mfrow = c(1, 2))
plot(pve_2, xlab = "Principal Component",
ylab = "Proportion of Variance Explained", ylim = c(0, 1),
type = "b")
plot(cumsum(pve_2), xlab = "Principal Component",
ylab = "Cumulative Proportion of Variance Explained", ylim = c(0, 1), type = "b")

(information regarding the meaning of each type of brokerage was acquired from https://edis.ifas.ufl.edu/publication/WC197)
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Milstein (2022, March 27). Data Analytics and Computational Social Science: Blog Post 7, Integrating ML. Retrieved from https://nmilsteinuma.github.io/posts/2022-03-21-blog-post-7/
BibTeX citation
@misc{milstein2022blog,
  author = {Milstein, Noah},
  title = {Data Analytics and Computational Social Science: Blog Post 7, Integrating ML},
  url = {https://nmilsteinuma.github.io/posts/2022-03-21-blog-post-7/},
  year = {2022}
}