GOT Analysis and Visualization
We use a Game of Thrones dataset that records every death in seasons one through eight. Eleven variables describe each record:
order => serial number of the record
season => the season number for that record
episode => the episode number within that season in which the death occurs
character_killed => the name of the character killed
killer => the character responsible for the death
method => how the killer killed the character
method_cat => the category of the method used for killing. For example: if the method is antler, the method_cat would be animal.
reason => why the killer killed the victim
location => where the victim was killed
allegiance => what house or community they support
importance => how important the killed character is; the higher the value, the greater the importance. Range = [1,4]
Import GOT data set and take a look at the first few rows just to get an idea of the dataset.
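library(tidyverse)  # dplyr, tidyr and ggplot2 are used below
library(readxl)     # provides read_excel()
GOT_data <- read_excel("GOTdata.xlsx")
head(GOT_data)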
# A tibble: 6 x 11
order season episode character_killed killer method method_cat
<dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 1 Waymar Royce White Walker Ice s~ Blade
2 2 1 1 Gared White Walker Ice s~ Blade
3 3 1 1 Will Ned Stark Sword~ Blade
4 4 1 1 Stag Direwolf Direw~ Animal
5 5 1 1 Direwolf Stag Antler Animal
6 6 1 1 Jon Arryn Lysa Arryn Poison Poison
# ... with 4 more variables: reason <chr>, location <chr>,
# allegiance <chr>, importance <dbl>
Number of Deaths in Every Episode: We want to see how many characters die in each episode of each season. To do so, we group by season and then by episode, which gives the count of dead characters for every episode of every season.
deaths_by_season <- GOT_data %>%
  group_by(season, episode) %>%
  summarise(count = n())
deaths_by_season$season <- sub("^", "Season ", deaths_by_season$season)
deaths_by_season$episode <- sub("^", "e", deaths_by_season$episode)
deaths_by_season
# A tibble: 69 x 3
# Groups: season [8]
season episode count
<chr> <chr> <int>
1 Season 1 e1 7
2 Season 1 e2 3
3 Season 1 e4 1
4 Season 1 e5 17
5 Season 1 e6 5
6 Season 1 e7 5
7 Season 1 e8 11
8 Season 1 e9 7
9 Season 1 e10 3
10 Season 2 e1 7
# ... with 59 more rows
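The group_by() followed by summarise(n()) pattern used above has a shorthand in dplyr, count(). The sketch below compares the two on a small made-up tibble; the rows are hypothetical and are not taken from GOTdata.xlsx.

```r
library(dplyr)

# Hypothetical toy rows with the same season/episode columns as GOT_data
toy <- tibble(
  season  = c(1, 1, 1, 2, 2),
  episode = c(1, 1, 2, 1, 1)
)

# Long form, as used in this post (.groups = "drop" returns an ungrouped tibble)
long_form <- toy %>%
  group_by(season, episode) %>%
  summarise(count = n(), .groups = "drop")

# Shorthand: count() groups, tallies and ungroups in one step
short_form <- toy %>%
  count(season, episode, name = "count")

# Compare the two results
all.equal(long_form, short_form)
```

Either form should work for the analysis here; count() is just less typing when the only summary needed is the number of rows.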
ggplot(data = deaths_by_season, aes(x = episode, y = count)) +
  geom_bar(stat = "identity") +
  facet_wrap(vars(season), scales = "free") +
  theme_bw() +
  labs(title = "Total Deaths in 8 seasons")
We plot one bar chart per season, where each bar shows the total deaths in one episode of that season. Season 8, episode 3 has the most deaths: this is the Long Night episode, in which the war of the living against the dead takes place. Death counts also tend to rise in later seasons, reflecting how the bloodshed escalates over the course of the show.
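One caveat with the plot: the episode labels are character strings, so ggplot2 orders them alphabetically and "e10" lands right after "e1". A minimal sketch of one way to keep numeric order is to turn the labels into a factor with explicit levels. The data frame below is hypothetical and only illustrates the ordering.

```r
library(ggplot2)

# Hypothetical counts for one season; the values are made up for illustration
toy <- data.frame(
  season  = "Season 1",
  episode = c(1, 2, 9, 10),
  count   = c(7, 3, 7, 3)
)

# Plain character labels sort alphabetically, which puts e10 before e2
labels_chr <- paste0("e", toy$episode)
sort(labels_chr)

# A factor whose levels follow the numeric episode order keeps e10 after e9
toy$episode_f <- factor(
  labels_chr,
  levels = paste0("e", sort(unique(toy$episode)))
)
levels(toy$episode_f)

# Same kind of plot as in the post, using the ordered factor on the x axis
p <- ggplot(toy, aes(x = episode_f, y = count)) +
  geom_col() +
  facet_wrap(vars(season), scales = "free") +
  theme_bw()
```

Printing p draws the chart. The same idea could be applied to deaths_by_season by building the factor from the numeric episode column before the "e" prefix is added.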
Number of Deaths by Location: We want to see how many deaths occur in each location, so we group by location and sort the counts in descending order.
death_location <- GOT_data %>%
  group_by(location) %>%
  summarise(count_deaths = n()) %>%
  arrange(desc(count_deaths))
death_location
# A tibble: 42 x 2
location count_deaths
<chr> <int>
1 Winterfell 3709
2 King’s Landing 1357
3 Beyond the Wall 993
4 Meereen 154
5 Goldroad 116
6 Hardhome 99
7 The Twins 84
8 Castle Black 66
9 Narrow Sea 36
10 Riverlands 31
# ... with 32 more rows
We observe that most deaths occur in Winterfell. The battle between the White Walkers and the living in the Long Night episode takes place there, and that is when most deaths in the show happen. The Battle of the Bastards between Jon Snow and Ramsay Bolton (born Ramsay Snow) adds to this count. The second most deaths occur in King’s Landing, mostly when Daenerys takes control of King’s Landing and burns the Red Keep, along with the many deaths that Cersei plots, such as the destruction of the Sept of Baelor. Of the ten locations printed above, the Riverlands has the fewest deaths; the 32 locations that are not shown each have no more than that.
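Because the table is sorted in descending order and only its first ten rows are printed, the locations with the fewest deaths are at the bottom and are not visible. A small sketch, using a hypothetical four-row table in place of death_location, shows how dplyr's slice_min() can pull them out directly.

```r
library(dplyr)

# Hypothetical per-location counts, sorted in descending order like death_location
toy_locations <- tibble(
  location     = c("A", "B", "C", "D"),
  count_deaths = c(50L, 20L, 3L, 1L)
)

# head() of a descending table shows only the largest counts
head(toy_locations, 2)

# slice_min() returns the rows with the smallest counts (ties are kept by default)
fewest <- toy_locations %>%
  slice_min(count_deaths, n = 1)
fewest
```

On the real data, death_location %>% slice_min(count_deaths, n = 1) would be the analogous call; tail(death_location) is a quick alternative.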
We now want to see how many important characters die over the course of the show. We define the ‘importance’ of characters by the following descriptions:
Death by Importance / Status. Labels:
1 - Soldiers and knights with the least screen time
2 - Nobles or knights with less screen time, such as the Lannister cousins and the Karstarks
3 - Advisors and those close to the Lords, such as Ser Rodrik and the Spice Kings beyond the Sea
4 - Main characters and Lords and Ladies of Kingdoms, including Ned Stark, Robert Baratheon and Khal Drogo
death_importance <- GOT_data %>%
  group_by(importance) %>%
  summarise(count_importance = n()) %>%
  arrange(desc(count_importance))
death_importance
# A tibble: 5 x 2
importance count_importance
<dbl> <int>
1 1 6682
2 2 85
3 3 75
4 4 44
5 NA 1
Characters of importance 1 account for by far the most deaths, as these are generally the extras. Deaths of importance 4 characters are far fewer (44), even though it felt like a lot while watching the show. One record has no importance value (NA).
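To put the gap in proportion, the counts can be converted to shares. The sketch below retypes the four non-missing counts from the printed table into a small tibble (the NA row is left out, which is a choice made here and not part of the original analysis) and adds a share column.

```r
library(dplyr)

# Counts copied from the printed death_importance table, without the NA row
importance_counts <- tibble(
  importance       = c(1, 2, 3, 4),
  count_importance = c(6682L, 85L, 75L, 44L)
)

# Share of all recorded deaths that falls in each importance level
importance_shares <- importance_counts %>%
  mutate(share = count_importance / sum(count_importance))
importance_shares
```

On these counts, importance 1 makes up roughly 97% of the records and importance 4 less than 1%.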
Let’s work on the main cast:
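# Keep only the main characters (importance 4)
GOT_maincast <- GOT_data %>% filter(importance == 4)
# Split the name into Name and House; na.omit() drops rows with an NA in any column
GOT_maincast <- GOT_maincast %>%
  separate(character_killed, c('Name', 'House')) %>% na.omit()
head(GOT_maincast)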
# A tibble: 6 x 12
order season episode Name House killer method method_cat reason
<dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 33 1 6 Viserys Targar~ Khal ~ Molte~ Fire/Burn~ Threa~
2 34 1 7 Robert Barath~ Boar Tusk Animal Hunte~
3 56 1 9 Ned Stark Ilyn ~ Sword~ Blade Execu~
4 58 1 10 Khal Drogo Daene~ Pillow Household~ Kille~
5 79 2 5 Renly Barath~ Melis~ Shado~ Magic Kille~
6 199 3 4 Jeor Mormont Rast Knife Blade Attac~
# ... with 3 more variables: location <chr>, allegiance <chr>,
# importance <dbl>
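It helps to know how separate() treats names that do not have exactly two words, since this determines what ends up in the House column. The sketch below runs it on a few names; "Ned Stark", "Khal Drogo" and "Stag" appear in the dataset, while "First Middle Last" is a made-up placeholder for a three-word name.

```r
library(dplyr)
library(tidyr)

# One-, two- and three-word names ("First Middle Last" is a hypothetical placeholder)
toy <- tibble(
  character_killed = c("Ned Stark", "Stag", "Khal Drogo", "First Middle Last")
)

# Default behaviour, as in the post: split on non-alphanumeric characters.
# Missing pieces become NA and extra pieces are dropped, each with a warning.
default_split <- suppressWarnings(
  toy %>% separate(character_killed, c("Name", "House"))
)

# Explicit alternative: extra = "merge" keeps trailing words together,
# fill = "right" pads missing pieces with NA without warning
explicit_split <- toy %>%
  separate(character_killed, c("Name", "House"),
           sep = " ", extra = "merge", fill = "right")

default_split
explicit_split

# na.omit() then removes the single-word row, because its House is NA
nrow(na.omit(default_split))
```

Note that a title such as "Khal" is treated as the first name, so "Drogo" is counted as a house in the grouping that follows.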
GOT_maincast %>%
  group_by(House) %>%
  summarise(count_house = n()) %>%
  arrange(desc(count_house)) %>%
  ggplot(aes(x = House, y = count_house)) +
  geom_bar(stat = "identity") +
  scale_x_discrete(guide = guide_axis(n.dodge = 4)) +
  theme_bw() +
  labs(title = "Death Count of House Leads")
Looking only at the main cast (importance 4) and grouping by the second word of each character's name, which we treat as their house, we see that Baratheons and Starks have the most deaths. It is interesting to note that Joffrey, Myrcella and Tommen are considered Baratheons in this dataset and not Lannisters, hence the high death count for Baratheons.
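Sorting the summary with arrange() does not change the order of the bars, because ggplot2 places character values on the x axis alphabetically. If bars sorted by count are wanted, one option is reorder() inside aes(). The sketch below uses made-up house counts, not the values from the dataset.

```r
library(ggplot2)

# Hypothetical house counts for illustration only
house_counts <- data.frame(
  House       = c("Stark", "Baratheon", "Tyrell"),
  count_house = c(5, 6, 2)
)

# reorder() sorts the factor levels by the second argument;
# negating the count gives descending order
ordered_levels <- levels(reorder(house_counts$House, -house_counts$count_house))
ordered_levels

p <- ggplot(house_counts,
            aes(x = reorder(House, -count_house), y = count_house)) +
  geom_col() +
  theme_bw() +
  labs(x = "House", title = "Death Count of House Leads")
```

Printing p draws the bars from the highest count to the lowest, which makes the leading houses easier to pick out.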
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Prabhu (2022, May 19). Data Analytics and Computational Social Science: 601 HW 6. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomsnehalhw6/
BibTeX citation
@misc{prabhu2022601,
  author = {Prabhu, Shruti Shelke and Snehal},
  title = {Data Analytics and Computational Social Science: 601 HW 6},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomsnehalhw6/},
  year = {2022}
}