This post is an analysis of community structure and machine learning techniques on my medieval dataset.
The Dataset Being Used: The dataset that I am using is wikipedia list of wars throughout history, this article is the “List of wars: 1000–1499” which acts as a subset of the “2nd-millennium conflicts” I chose this dataset as an exemplar of popular history’s depiction of the centralization of worldwide conflict. Wikipedia, being an accessible source generally created from relevant citations makes it a good case study to see where historical writers and academics center their world are relevant conflicts.
Answer: The initial network format is as an edge list, the first, in column contains the winners of each war while the second, out column contains the losers of each. These sets of belligerents are directed
Network attributes:
vertices = 111
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = TRUE
bipartite = FALSE
total edges= 153
missing edges= 0
non-missing edges= 153
Vertex attribute names:
vertex.names
No edge attributes
Network attributes:
vertices = 97
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = TRUE
bipartite = FALSE
total edges= 238
missing edges= 0
non-missing edges= 238
Vertex attribute names:
vertex.names
No edge attributes
Network attributes:
vertices = 161
directed = TRUE
hyper = FALSE
loops = FALSE
multiple = TRUE
bipartite = FALSE
total edges= 313
missing edges= 0
non-missing edges= 313
Vertex attribute names:
vertex.names
No edge attributes
Identify Nodes: Describe and identify the nodes (including how many nodes are in the dataset)
Answer: Nodes or vertices in these datasets represent belligerents in wars throughout history, the involved parties in each conflict can be a nation, province, individual, or group so long as they are listed as involved in the conflict. In the 1000s there are 117, in the 1100s there are 78 and in the 1200s there are 161.
What Constitutes a Tie: What constitutes a tie or edge (including how many ties, whether ties are directed/undirected and weighted/binary, and how to interpret the value of the tie if any)
Answer: A tie or edge in this dataset represents a war, this war can be between two nations or groups within a nation. These edges can represent a war that involved many more nations but are always tied to each and every party involved on both sides. These edges are directed and the direction indicates which side “won” the conflict (if an edge has an arrow pointing to another the node that originated that arrow won the war against them. There are 153 edges in the 1000s, 225 edges in 1100s and 313 edges in the 1200s.
Edge Attributes and Subset: Whether or not there are edge attributes that might be used to subset data or stack multiple networks (e.g., tie type, year, etc).
Answer: There are a number of attributes that could be used to subset the data, year that the conflict began or the length of time it lasted are available. Aspects like each side’s religion and the area where the conflict took place could be used to subset the data itself.
What are betweeness and brokerage cenrrality Calculate brokerage and betweenneess centrality measures for one or more subsets of your network data, and write up the results and your interpretation of them.
Answer: I will be calculating these measures for wars in 1000-1099, 1100-1199, and 1200-1399.
(wars_in_1000s.nodes.stat_2%>%
arrange(desc(broker.tot))%>%
slice(1:10))[,c(1,11:15)] %>%kable()
name | broker.tot | broker.coord | broker.itin | broker.rep | broker.gate |
---|---|---|---|---|---|
Byzantine Empire | 22.7376579 | NaN | 3.1654785 | NaN | NaN |
Holy Roman Empire | 9.2813605 | NaN | 2.2468427 | NaN | NaN |
Sultanate of Rum | 9.2813605 | NaN | -0.5090648 | NaN | NaN |
England | 6.9745666 | NaN | 5.0036896 | -0.0853606 | -0.0853606 |
Kingdom of Sicily | 5.0522384 | -0.0176111 | 4.0866123 | -0.1201631 | -0.1201631 |
Seljuk Empire | 1.9765133 | -0.0176111 | -0.5084146 | 3.4677529 | -0.1201631 |
Kingdom of France | 1.9765133 | NaN | -0.5090648 | NaN | NaN |
Kingdom of Georgia | 0.8231164 | -0.0176111 | -0.5084146 | -0.1201631 | -0.1201631 |
Papal States | 0.4386507 | -0.0176111 | -0.5084146 | -0.1201631 | 10.6435850 |
Ghaznavids | 0.0541851 | -0.1380791 | -0.4907567 | -0.2903366 | -0.2903366 |
(wars_in_1100s.nodes.stat_2%>%
arrange(desc(broker.tot))%>%
slice(1:10))[,c(1,10:14)] %>%kable()
name | broker.tot | broker.coord | broker.itin | broker.rep | broker.gate |
---|---|---|---|---|---|
Kingdom of Jerusalem | 17.1050061 | NaN | 2.8705599 | 24.5610650 | -0.1357675 |
Fatimid Caliphate | 10.2415178 | NaN | -0.6472506 | NaN | NaN |
Ayyubid Dynasty | 9.3615834 | NaN | -0.6465587 | -0.1357675 | -0.1357675 |
Zengid Dynasty | 7.4257278 | NaN | 0.7591543 | NaN | NaN |
Byzantine Empire | 6.8977671 | NaN | 0.7602887 | -0.1357675 | -0.1357675 |
England | 5.8418459 | NaN | -0.6465587 | -0.1357675 | -0.1357675 |
Holy Roman Empire | 3.0260558 | NaN | -0.6465587 | -0.1357675 | -0.1357675 |
Kingdom of France | 1.6181608 | NaN | -0.6465587 | -0.1357675 | -0.1357675 |
Kingdom of Sicily | 0.5622395 | -0.1467125 | -0.6293842 | -0.3476788 | -0.3476788 |
Papal States | 0.0342789 | -0.1264908 | -0.6336748 | -0.3236913 | 2.5014147 |
name | broker.tot | broker.coord | broker.itin | broker.rep | broker.gate |
---|---|---|---|---|---|
Mongol Empire | 47.964825 | NaN | -0.5966483 | NaN | NaN |
Kingdom of France | 28.663539 | NaN | -0.5966483 | NaN | NaN |
Ayyubid Dynasty | 26.995527 | NaN | 2.3528915 | NaN | NaN |
Kingdom of England | 21.991489 | NaN | 8.9893561 | NaN | NaN |
Republic of Genoa | 11.983415 | NaN | -0.5966483 | NaN | NaN |
Knights Templar | 10.077115 | NaN | 1.6155066 | NaN | NaN |
Holy Roman Empire | 4.834790 | -0.0170801 | -0.5961482 | 10.865523 | 10.865523 |
Principality of Antioch | 4.834790 | -0.0170801 | 2.3541101 | 13.613565 | -0.126648 |
Kingdom of Cyprus | 4.596503 | 58.5391124 | 0.1414163 | 13.613565 | 10.865523 |
Armenian Kingdom of Cilicia | 3.881640 | -0.0170801 | -0.5961482 | -0.126648 | -0.126648 |
name | broker.gate |
---|---|
Papal States | 10.6435850 |
County of Aversa | -0.0853606 |
County of Sicily | -0.0853606 |
England | -0.0853606 |
Chola Empire | -0.0853606 |
County of Apulia | -0.1201631 |
Kingdom of Sicily | -0.1201631 |
Kingdom of Georgia | -0.1201631 |
Great Seljuq Empire | -0.1201631 |
Seljuk Empire | -0.1201631 |
name | broker.tot |
---|---|
Byzantine Empire | 22.7376579 |
Holy Roman Empire | 9.2813605 |
Sultanate of Rum | 9.2813605 |
England | 6.9745666 |
Kingdom of Sicily | 5.0522384 |
Seljuk Empire | 1.9765133 |
Kingdom of France | 1.9765133 |
Kingdom of Georgia | 0.8231164 |
Papal States | 0.4386507 |
Ghaznavids | 0.0541851 |
For a Specific Research Question: If you have a specific research question, please feel free to use that to guide your analysis. Otherwise, you may want to orient your analysis as follows in order to identify a compelling question or noteworthy pattern in the data that can be interpreted.
Answer: Since I am interested in the relative power of nations by their relative position ad centrality in the worldwide conflict, network brokerage can be used to illustrate significant positions in global conflict. Below I wanted to look at 4 kinds of brokerage, these are broker.gate or gatekeeper, coordinator, liason, and itinerant. I am interested to see if these specific coordination types are primarily done by specific nations.
Explanation: Looking at total brokerage in this dataset gives a sense of which factions were responsible for highest connection of unconnected actors through conflict. Given the crusades igniting conflict between Europe and the middle east it is sensible that the Byzantine Empire in the center of both connects the most unconnected actors through conflict closely followed by the Sultanate of Rum, a major Muslim faction that fought against the crusades and third being the Holy Roman Empire who participated in many conflicts including the crusades. These are followed by England who centered the wars in the British isles and the Kingdom of Sicily who were also in a position of conflict.
name | broker.tot |
---|---|
Byzantine Empire | 22.737658 |
Holy Roman Empire | 9.281360 |
Sultanate of Rum | 9.281360 |
England | 6.974567 |
Kingdom of Sicily | 5.052238 |
Explanation: In this case no particular country is very high above any other in terms of their coordinator brokerage, meaning that within groups no particular nations appear to be brokering more within the groups.
name | broker.coord |
---|---|
County of Apulia | -0.0176111 |
Kingdom of Sicily | -0.0176111 |
Kingdom of Georgia | -0.0176111 |
Great Seljuq Empire | -0.0176111 |
Papal States | -0.0176111 |
Explanation: Itinerant brokerage represents when a non-group actor connects 2 actors in a group it is no in to each other, in this case England has the highest score. Looking at the network graph they do appear to connect 2 actors in a group together.
name | broker.itin |
---|---|
England | 5.0036896 |
Kingdom of Sicily | 4.0866123 |
Byzantine Empire | 3.1654785 |
Holy Roman Empire | 2.2468427 |
Principality of Kiev | 0.4812412 |
Explanation: Representative brokerage indicates that the broker, or nation in question loses a war to another in their group, but wins another against a faction outside of their group. This can be though of as their directed connections to them. In this case the Seljuk Empire and Kingdom of Aragon have instances in which they lose to factions within their group before beating those outside of it.
name | broker.rep |
---|---|
Seljuk Empire | 3.4677529 |
Kingdom of Aragon | 0.9281821 |
County of Aversa | -0.0853606 |
County of Sicily | -0.0853606 |
England | -0.0853606 |
Explanation: The Papal states being ranked highest in gatekeeper brokerage is an interesting observation as no other nation in the dataset appears to be close to their level as most are negative in this category. In this cae being a gatekeeper means that they are in at conflict in a group with another while the nation in a different group of conflicts is only at war with them from the group. This is an interesting observation given the Papal states role as a coordinator of the war, but not a participant in the conflcit as directly as other belligerents. (This being the crusade given the period)
name | broker.gate |
---|---|
Papal States | 10.6435850 |
County of Aversa | -0.0853606 |
County of Sicily | -0.0853606 |
England | -0.0853606 |
Chola Empire | -0.0853606 |
Explanation: A liaison broker, in this case, is a faction that loses a war to a group they do not belong to and wins a war against a different group than the first that they also do not belong to. The Byzantine Empire, Sultanate of Rum, and Holy Roman Empire are highest in this category likely owing to their frequent states of conflict beyond the crusades against a variety of groups.
name | broker.lia |
---|---|
Byzantine Empire | 28.140866 |
Sultanate of Rum | 12.477603 |
Holy Roman Empire | 10.961803 |
England | 6.548214 |
Kingdom of Sicily | 4.589419 |
wars_in_1000s_edgelist <- as.matrix(wars_in_1000s)
wars_in_1000s_edgelist_network_edgelist <- graph.edgelist(wars_in_1000s_edgelist, directed=TRUE)
wars_in_1000s.ig<-graph_from_data_frame(wars_in_1000s)
wars_in_1000s_network <- asNetwork(wars_in_1000s.ig)
aspects_of_1000s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1000s_states.xlsx")
total_1000s <- merge(aspects_of_1000s_states, wars_in_1000s.nodes.stat_2, by="name")
total_1000s_brokerag_reg<-total_1000s
total_1000s_brokerag_reg$win_rate <- (total_1000s_brokerag_reg$outdegree/total_1000s_brokerag_reg$totdegree)
total_1000s_brokerag_reg$loss_rate <- (total_1000s_brokerag_reg$indegree/total_1000s_brokerag_reg$totdegree)
total_1000s_brokerag_reg_binom <- total_1000s_brokerag_reg %>% mutate(more_win_or_loss = case_when(
win_rate < 0.5 ~ 0,
win_rate >= 0.5 ~ 1))
First_1000s_regression <- glm(more_win_or_loss~.-name-totdegree-indegree-outdegree-dc-eigen.dc-win_rate-loss_rate, total_1000s_brokerag_reg_binom, family=binomial)
First_1000s_regression
Call: glm(formula = more_win_or_loss ~ . - name - totdegree - indegree -
outdegree - dc - eigen.dc - win_rate - loss_rate, family = binomial,
data = total_1000s_brokerag_reg_binom)
Coefficients:
(Intercept) Catholic Islam Orthodox Buddhist
-2.090e+01 1.446e-01 -7.108e-02 -4.043e-01 -8.572e-02
Pagan Tengrism Shinto Hindu Shamanism
5.506e-01 -5.656e+01 1.820e+00 -2.142e+00 -1.506e+00
eigen close rc eigen.rc broker.tot
-1.877e+03 5.146e+03 -3.979e+00 1.574e+03 2.378e+02
broker.coord broker.itin broker.rep broker.gate broker.lia
-9.610e+01 -9.449e+01 -7.164e+01 -2.810e+01 -1.298e+02
Degrees of Freedom: 101 Total (i.e. Null); 82 Residual
(8 observations deleted due to missingness)
Null Deviance: 140.8
Residual Deviance: 4.53e-09 AIC: 40
set.seed(292)
total_1000s_for_regression <- total_1000s[,-c(1, 20:25)]
total_1000s_for_regression$win_rate <- (total_1000s_for_regression$outdegree/total_1000s_for_regression$totdegree)
total_1000s_for_regression$loss_rate <- (total_1000s_for_regression$indegree/total_1000s_for_regression$totdegree)
total_1000s_for_regression <- total_1000s_for_regression %>% mutate(more_win_or_loss = case_when(
win_rate < 0.5 ~ 0,
win_rate >= 0.5 ~ 1))
First_1000s_regression <- glm(more_win_or_loss~.-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression, family=binomial)
First_1000s_regression
Call: glm(formula = more_win_or_loss ~ . - loss_rate - win_rate - totdegree -
indegree - outdegree - dc - eigen.dc, family = binomial,
data = total_1000s_for_regression)
Coefficients:
(Intercept) Catholic Islam Orthodox Buddhist
-15.1948 13.9008 12.7531 14.6893 15.0858
Pagan Tengrism Shinto Hindu Shamanism
0.9610 11.6691 16.0623 9.1358 -0.1497
eigen close rc eigen.rc
-82.1100 256.5294 -3.3322 -17.3152
Degrees of Freedom: 109 Total (i.e. Null); 96 Residual
Null Deviance: 152.3
Residual Deviance: 58.4 AIC: 86.4
set.seed(6738)
in_training<- sample(1:nrow(total_1000s_for_regression), nrow(total_1000s_for_regression) * 0.7 )
training_1000s <- total_1000s_for_regression[in_training,]
test_1000s <- total_1000s_for_regression[-in_training,]
lm_1000s_binom_subset_1 <- glm(more_win_or_loss~.-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression, family=binomial, subset = in_training )
logsitic_1_1000s_prob <- predict(lm_1000s_binom_subset_1, test_1000s,
type = "response")
log_preds_1<-ifelse(logsitic_1_1000s_prob >= 0.5, 1, 0)
prediction_1_logs <-mean(log_preds_1 == test_1000s$more_win_or_loss)
prediction_1_logs %>% kable()
x |
---|
0.9090909 |
set.seed(246)
x_ridge <- model.matrix(more_win_or_loss ~ .-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression)[, -1]
y_ridge <- total_1000s_for_regression$more_win_or_loss
grid <- 10^seq(10, -2, length = 100)
ridge.mod <- glmnet(x_ridge, y_ridge, alpha = 0, lambda = grid)
dim(coef(ridge.mod))
[1] 14 100
set.seed(9292)
ridge.mod <- glmnet(x_ridge[train_ridge, ], y_ridge[train_ridge],
alpha = 0, lambda = grid, thresh = 1e-12)
ridge.pred <- predict(ridge.mod, s = 4, newx = x_ridge[test_ridge,])
mean((ridge.pred - y.test_ridge)^2) %>% kable()
x |
---|
0.2416376 |
set.seed(231)
ridge.pred <- predict(ridge.mod, s = 0, newx = x_ridge[test_ridge, ],
exact = T, x = x_ridge[train_ridge, ], y = y_ridge[train_ridge])
predict(ridge.mod, s = 0, exact = T, type = "coefficients",
x = x_ridge[train_ridge, ], y = y_ridge[train_ridge])[1:14, ]
(Intercept) Catholic Islam Orthodox Buddhist
0.21024033 0.21827317 -0.01160454 0.21312966 0.35601806
Pagan Tengrism Shinto Hindu Shamanism
0.08955257 0.14069809 0.38278477 -0.07034364 -0.01038790
eigen close rc eigen.rc
-4.61480591 12.51011844 -0.29977861 4.64835194
set.seed(9292)
bestlam <- cv.out$lambda.min
bestlam
[1] 0.415338
set.seed(9292)
ridge.pred <- predict(cv.out, s = bestlam, newx = x_ridge[test_ridge,])
mean((ridge.pred - y.test_ridge)^2) %>% kable()
x |
---|
0.174632 |
set.seed(2897)
x_lasso <- model.matrix(more_win_or_loss ~ .-loss_rate-win_rate-totdegree-indegree-outdegree-dc-eigen.dc, total_1000s_for_regression)[, -1]
y_lasso <- total_1000s_for_regression$more_win_or_loss
grid <- 10^seq(10, -2, length = 100)
lasso.mod <- glmnet(x_lasso, y_lasso, alpha = 0, lambda = grid)
dim(coef(lasso.mod))
[1] 14 100
set.seed(1920)
bestlam_2 <- cv.out_2$lambda.min
lasso.pred <- predict(cv.out_2, s = bestlam_2, newx = x_ridge[test_ridge,])
mean((lasso.pred - y.test_ridge)^2) %>% kable()
x |
---|
0.1749583 |
set.seed(2739)
out <- glmnet(x_lasso[train_lasso, ], y_lasso[train_lasso],
alpha = 1, lambda = grid)
lasso.coef <- predict(out, type = "coefficients", s = bestlam_2)[1:14, ]
lasso.coef
(Intercept) Catholic Islam Orthodox Buddhist
0.42561685 0.05577020 -0.09275344 0.00000000 0.00000000
Pagan Tengrism Shinto Hindu Shamanism
0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
eigen close rc eigen.rc
0.00000000 3.22570629 -0.21240622 0.00000000
aspects_of_1100s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1100s_states.xlsx")
total_1100s <- merge(aspects_of_1100s_states, wars_in_1100s.nodes.stat_2, by="name")
aspects_of_1200s_states <- read_excel("~/Desktop/Spring 2022/Networks/aspects_of_1200s_states.xlsx")
total_1200s <- merge(aspects_of_1200s_states, wars_in_1200s.nodes.stat_2, by="name")
The first community cluster below is done using label propagation. This results in 39 groups
set.seed(23)
comm.lab<-label.propagation.community(wars_in_1000s.ig)
#Inspect clustering object
# igraph::groups(comm.lab)
Walktrap classification as seen below results in 19 distinct communities.
set.seed(238)
#Run clustering algorithm: fast_greedy
wars_in_1000s.wt<-walktrap.community(wars_in_1000s.ig)
#igraph::groups(wars_in_1000s.wt)
Adding more steps resulted in 19 groups for both 10 and 20 steps.
#Run & inspect clustering algorithm: 10 steps
#igraph::groups(walktrap.community(wars_in_1000s.ig, steps=10))
#Run & inspect clustering algorithm: 20 steps
#igraph::groups(walktrap.community(wars_in_1000s.ig ,steps=20))
#Run & inspect clustering algorithm
total_1000s_for_PCA <- total_1000s_brokerag_reg_binom[-c(20:27)]
apply(total_1000s_for_PCA[-1], 2, mean)
Catholic Islam Orthodox Buddhist
0.454545455 0.181818182 0.154545455 0.063636364
Pagan Tengrism Shinto Hindu
0.036363636 0.018181818 0.054545455 0.045454545
Shamanism totdegree indegree outdegree
0.009090909 2.754545455 1.336363636 1.418181818
eigen close rc eigen.rc
0.028058711 0.023546832 0.287358773 0.003637773
dc eigen.dc more_win_or_loss
0.712641227 0.024420939 0.481818182
apply(total_1000s_for_PCA[-1], 2, var)
Catholic Islam Orthodox Buddhist
0.2502085071 0.1501251043 0.1318598832 0.0601334445
Pagan Tengrism Shinto Hindu
0.0353628023 0.0180150125 0.0520433695 0.0437864887
Shamanism totdegree indegree outdegree
0.0090909091 8.9208507089 2.6656380317 6.3189324437
eigen close rc eigen.rc
0.0076304265 0.0019575460 0.1260782284 0.0004728954
dc eigen.dc more_win_or_loss
0.1260782284 0.0056490031 0.2519599666
pr.out <- prcomp(total_1000s_for_PCA[-1], scale = TRUE)
names(pr.out)
[1] "sdev" "rotation" "center" "scale" "x"
pr.out$center
Catholic Islam Orthodox Buddhist
0.454545455 0.181818182 0.154545455 0.063636364
Pagan Tengrism Shinto Hindu
0.036363636 0.018181818 0.054545455 0.045454545
Shamanism totdegree indegree outdegree
0.009090909 2.754545455 1.336363636 1.418181818
eigen close rc eigen.rc
0.028058711 0.023546832 0.287358773 0.003637773
dc eigen.dc more_win_or_loss
0.712641227 0.024420939 0.481818182
pr.out$scale
Catholic Islam Orthodox Buddhist
0.50020846 0.38745981 0.36312516 0.24522122
Pagan Tengrism Shinto Hindu
0.18805000 0.13422002 0.22813016 0.20925221
Shamanism totdegree indegree outdegree
0.09534626 2.98677932 1.63267818 2.51374868
eigen close rc eigen.rc
0.08735231 0.04424416 0.35507496 0.02174616
dc eigen.dc more_win_or_loss
0.35507496 0.07515985 0.50195614
biplot(pr.out, scale = 0)
pr.out$rotation = -pr.out$rotation
pr.out$x = -pr.out$x
biplot(pr.out, scale = 0)
pr.out$sdev
[1] 2.217501e+00 1.681548e+00 1.239242e+00 1.211199e+00 1.065982e+00
[6] 1.037692e+00 1.029507e+00 1.011117e+00 1.005425e+00 9.514802e-01
[11] 8.848499e-01 7.782431e-01 6.162540e-01 4.426224e-01 2.541422e-01
[16] 1.091189e-01 7.597269e-16 6.258811e-16 2.174635e-16
pr.var <- pr.out$sdev^2
pr.var
[1] 4.917311e+00 2.827605e+00 1.535720e+00 1.467004e+00 1.136318e+00
[6] 1.076804e+00 1.059884e+00 1.022359e+00 1.010879e+00 9.053146e-01
[11] 7.829594e-01 6.056623e-01 3.797690e-01 1.959146e-01 6.458828e-02
[16] 1.190694e-02 5.771849e-31 3.917271e-31 4.729037e-32
pve <- pr.var / sum(pr.var)
pve
[1] 2.588059e-01 1.488213e-01 8.082739e-02 7.721075e-02 5.980622e-02
[6] 5.667390e-02 5.578337e-02 5.380835e-02 5.320417e-02 4.764814e-02
[11] 4.120839e-02 3.187696e-02 1.998784e-02 1.031129e-02 3.399383e-03
[16] 6.266808e-04 3.037815e-32 2.061722e-32 2.488967e-33
names(total_1200s)
[1] "name" "Catholic" "Islam" "Orthodox"
[5] "Buddhist" "Pagan" "Tengrism" "Shinto"
[9] "Hindu" "Shamanism" "totdegree" "indegree"
[13] "outdegree" "eigen" "rc" "eigen.rc"
[17] "dc" "eigen.dc" "broker.tot" "broker.coord"
[21] "broker.itin" "broker.rep" "broker.gate" "broker.lia"
total_1200s_brokerag_reg<-total_1200s
total_1200s_brokerag_reg$win_rate <- (total_1200s_brokerag_reg$outdegree/total_1200s_brokerag_reg$totdegree)
total_1200s_brokerag_reg$loss_rate <- (total_1200s_brokerag_reg$indegree/total_1200s_brokerag_reg$totdegree)
total_1200s_brokerag_reg_binom <- total_1200s_brokerag_reg %>% mutate(more_win_or_loss = case_when(
win_rate < 0.5 ~ 0,
win_rate >= 0.5 ~ 1))
total_1200s_for_PCA <- total_1200s_brokerag_reg_binom[-c(20:27)]
apply(total_1200s_for_PCA[-1], 2, mean)
Catholic Islam Orthodox Buddhist Pagan
0.714285714 0.068322981 0.086956522 0.086956522 0.012422360
Tengrism Shinto Hindu Shamanism totdegree
0.024844720 0.000000000 0.006211180 0.000000000 3.900621118
indegree outdegree eigen rc eigen.rc
1.956521739 1.944099379 0.025409148 0.158212220 0.002179127
dc eigen.dc broker.tot
0.841787780 0.023230021 0.333968821
apply(total_1200s_for_PCA[-1], 2, var)
Catholic Islam Orthodox Buddhist Pagan
2.053571e-01 6.405280e-02 7.989130e-02 7.989130e-02 1.234472e-02
Tengrism Shinto Hindu Shamanism totdegree
2.437888e-02 0.000000e+00 6.211180e-03 0.000000e+00 2.655256e+01
indegree outdegree eigen rc eigen.rc
6.204348e+00 1.587811e+01 5.600340e-03 7.101398e-02 7.273423e-05
dc eigen.dc broker.tot
7.101398e-02 4.549154e-03 2.983411e+01
# I cannot scale variables with
total_1200s_for_PCA<-total_1200s_for_PCA[-c(8,10)]
pr.out_2 <- prcomp(total_1200s_for_PCA[-1], scale = TRUE)
names(pr.out_2)
[1] "sdev" "rotation" "center" "scale" "x"
pr.out_2$center
Catholic Islam Orthodox Buddhist Pagan
0.714285714 0.068322981 0.086956522 0.086956522 0.012422360
Tengrism Hindu totdegree indegree outdegree
0.024844720 0.006211180 3.900621118 1.956521739 1.944099379
eigen rc eigen.rc dc eigen.dc
0.025409148 0.158212220 0.002179127 0.841787780 0.023230021
broker.tot
0.333968821
pr.out_2$scale
Catholic Islam Orthodox Buddhist Pagan
0.453163484 0.253086537 0.282650499 0.282650499 0.111106798
Tengrism Hindu totdegree indegree outdegree
0.156137382 0.078811041 5.152917825 2.490852831 3.984734068
eigen rc eigen.rc dc eigen.dc
0.074835420 0.266484487 0.008528436 0.266484487 0.067447414
broker.tot
5.462061318
biplot(pr.out_2, scale = 0)
pr.out_2$rotation = -pr.out_2$rotation
pr.out_2$x = -pr.out_2$x
biplot(pr.out_2, scale = 0)
pr.out$sdev
[1] 2.217501e+00 1.681548e+00 1.239242e+00 1.211199e+00 1.065982e+00
[6] 1.037692e+00 1.029507e+00 1.011117e+00 1.005425e+00 9.514802e-01
[11] 8.848499e-01 7.782431e-01 6.162540e-01 4.426224e-01 2.541422e-01
[16] 1.091189e-01 7.597269e-16 6.258811e-16 2.174635e-16
pr.var_2 <- pr.out_2$sdev^2
pr.var_2
[1] 4.904685e+00 2.342837e+00 1.673460e+00 1.249477e+00 1.132262e+00
[6] 1.097150e+00 1.011239e+00 9.459795e-01 8.660928e-01 5.136566e-01
[11] 1.660835e-01 9.707769e-02 1.227998e-30 9.288233e-31 4.378653e-31
[16] 1.136377e-31
pve_2 <- pr.var_2 / sum(pr.var_2)
pve_2
[1] 3.065428e-01 1.464273e-01 1.045912e-01 7.809228e-02 7.076635e-02
[6] 6.857190e-02 6.320245e-02 5.912372e-02 5.413080e-02 3.210354e-02
[11] 1.038022e-02 6.067356e-03 7.674985e-32 5.805145e-32 2.736658e-32
[16] 7.102355e-33
(information regarding the meaning of each type of brokerage was acquired from https://edis.ifas.ufl.edu/publication/WC197)
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Milstein (2022, March 27). Data Analytics and Computational Social Science: Blog Post 7, Integrating ML. Retrieved from https://nmilsteinuma.github.io/posts/2022-03-21-blog-post-7/
BibTeX citation
@misc{milstein2022blog, author = {Milstein, Noah}, title = {Data Analytics and Computational Social Science: Blog Post 7, Integrating ML}, url = {https://nmilsteinuma.github.io/posts/2022-03-21-blog-post-7/}, year = {2022} }