Game Of Thrones Data Analysis and Visualization
We are using a Game of Thrones dataset that keeps a record of every death that occurs in seasons one through eight. The dataset has 11 variables:
order => the order in which deaths occur in the show
season => the season number for that row
episode => the episode number within that season
character_killed => the name of the character killed
killer => the character (or creature) that killed them
method => how the killer killed the character
method_cat => the category of the killing method. For example, if the method is "Antler", the method_cat is "Animal".
reason => why the killer killed the victim
location => where the victim was killed
allegiance => the house or community the victim supports
importance => how important the killed character is; the higher the value, the greater the importance. Range = [1, 4]
Game of Thrones is one of our favorite shows. We invested a lot of time watching all eight seasons and reading some of the books as well, so we wanted to do a fun data analysis project on it: analyzing the deaths of the characters and the battles that the popular houses fought and died in. We plot this information as bar graphs and try to answer some of the most common questions asked by people who watched the show over the years and rooted for their favorite house to survive and sit on the Iron Throne.
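Import the GOT dataset and look at the first few rows to get an idea of its structure.
library(readxl)      # read_excel()
library(tidyverse)   # dplyr, tidyr and ggplot2, used throughout
library(kableExtra)  # kable() for tables
GOT_data <- read_excel("GOTdata.xlsx")
kable(head(GOT_data))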
order | season | episode | character_killed | killer | method | method_cat | reason | location | allegiance | importance |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | Waymar Royce | White Walker | Ice sword | Blade | Unknown | Beyond the Wall | House Royce, Night’s Watch | 2 |
2 | 1 | 1 | Gared | White Walker | Ice sword | Blade | Unknown | Beyond the Wall | Night’s Watch | 2 |
3 | 1 | 1 | Will | Ned Stark | Sword (Ice) | Blade | Deserting the Night’s Watch | Winterfell | Night’s Watch | 2 |
4 | 1 | 1 | Stag | Direwolf | Direwolf teeth | Animal | Unknown | Winterfell | None | 1 |
5 | 1 | 1 | Direwolf | Stag | Antler | Animal | Unknown | Winterfell | None | 1 |
6 | 1 | 1 | Jon Arryn | Lysa Arryn | Poison | Poison | Petyr Baelish persuaded Lysa to do so for reasons unknown | King’s Landing | House Arryn | 2 |
We want to see how many characters die in each season. To do so, we group by season and then by episode, which gives the number of deaths in every episode of every season.
deaths_by_season <- GOT_data %>%
  group_by(season, episode) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(season = paste("Season", season),
         # factor levels keep episodes in numeric order (e1, e2, ..., e10)
         # instead of alphabetical order (e1, e10, e2, ...)
         episode = factor(paste0("e", episode),
                          levels = paste0("e", sort(unique(episode)))))
kable(head(deaths_by_season))
season | episode | count |
---|---|---|
Season 1 | e1 | 7 |
Season 1 | e2 | 3 |
Season 1 | e4 | 1 |
Season 1 | e5 | 17 |
Season 1 | e6 | 5 |
Season 1 | e7 | 5 |
ggplot(data = deaths_by_season, aes(x = episode, y = count)) +
  geom_col(fill = "purple") +
  facet_wrap(vars(season), scales = "free") +
  theme_bw() +
  labs(x = "Episode", y = "Deaths", title = "Deaths per Episode, Seasons 1-8")
Each panel shows one season, and each bar shows the total number of deaths in one episode of that season. Season 8, episode 3 has the most deaths: this is "The Long Night", the episode in which the war between the living and the dead takes place. Deaths also tend to increase as the seasons progress, which reflects how much the bloodshed escalated over the course of the show.
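A small base-R sketch of why episode labels need explicit factor levels: labels such as "e10" are character strings, so by default ggplot2 orders them alphabetically, which places e10 between e1 and e2. The episode numbers below are illustrative only, not taken from the dataset.

```r
# Illustrative episode numbers (not from the dataset)
episode <- c(1, 2, 10)
labels <- paste0("e", episode)

# Character sorting compares "e10" and "e2" character by character,
# so "e10" sorts before "e2"
alphabetical <- sort(labels)
print(alphabetical)

# Explicit factor levels built from the sorted numbers restore numeric order
ordered_episodes <- factor(labels, levels = paste0("e", sort(unique(episode))))
print(levels(ordered_episodes))
```

ggplot2 uses a factor's level order for a discrete axis, so converting the episode column this way before plotting keeps the bars in broadcast order.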
Next, we want to see how many deaths occur in each location, so we group by location.
death_location <- GOT_data %>% group_by(location) %>% summarise(count_deaths = n()) %>%
arrange(desc(count_deaths))
death_location
# A tibble: 42 x 2
location count_deaths
<chr> <int>
1 Winterfell 3709
2 King’s Landing 1357
3 Beyond the Wall 993
4 Meereen 154
5 Goldroad 116
6 Hardhome 99
7 The Twins 84
8 Castle Black 66
9 Narrow Sea 36
10 Riverlands 31
# ... with 32 more rows
Most deaths occur in Winterfell, because the battle between the White Walkers and the living during "The Long Night" takes place there, and that episode has the most deaths in the show. The Battle of the Bastards between Jon Snow and Ramsay Bolton also adds to Winterfell's count. The second-highest number of deaths occurs in King’s Landing, mostly when Daenerys attacks the city and burns much of it, including the Red Keep, along with the many deaths that Cersei plots, such as the destruction of the Sept of Baelor. Note that the output above lists only the ten deadliest of the 42 locations: the Riverlands (31 deaths) rounds out the top ten, and none of the remaining 32 locations has more than 31 deaths.
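The printed counts also let us quantify how concentrated the deaths are. The sketch below uses the ten counts shown above and a total of 6,887 deaths, taken from the importance table further down (6682 + 85 + 75 + 44 + 1). Every row of GOT_data falls into exactly one location group and exactly one importance group, so both tables sum to the same number of rows.

```r
# Death counts for the ten deadliest locations, copied from the output above
location_deaths <- c(
  "Winterfell" = 3709, "King's Landing" = 1357, "Beyond the Wall" = 993,
  "Meereen" = 154, "Goldroad" = 116, "Hardhome" = 99, "The Twins" = 84,
  "Castle Black" = 66, "Narrow Sea" = 36, "Riverlands" = 31
)

# Total number of deaths (rows) in the dataset, from the importance table
total_deaths <- 6682 + 85 + 75 + 44 + 1

# Share of all deaths in each location, and in the top three combined
location_share <- round(location_deaths / total_deaths, 3)
top3_share <- sum(location_deaths[1:3]) / total_deaths
print(location_share)
print(round(top3_share, 3))
```

Roughly 54% of all deaths happen in Winterfell, and Winterfell, King’s Landing and Beyond the Wall together account for about 88%.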
We now want to see how many important characters die over the course of the show. We define the ‘importance’ of characters by the following descriptions:
Death by importance / status labels:
1 - Soldiers and knights with the least screen time
2 - Nobles or knights with little screen time, such as the Lannister cousins and the Karstarks
3 - Advisors and people close to the lords, such as Ser Rodrik and the Spice King beyond the sea
4 - Main characters and the lords and ladies of kingdoms, including Ned Stark, Robert Baratheon and Khal Drogo
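For plots and tables it can help to show these labels instead of the bare codes. A minimal base-R sketch, assuming the codes are stored as the numbers 1 to 4 (with NA for a missing value); the short label texts are our own hypothetical paraphrases of the descriptions above, not values from the dataset.

```r
# Short, hypothetical labels paraphrasing the importance descriptions above
importance_labels <- c("Soldier / extra", "Minor noble or knight",
                       "Advisor", "Main character")

# Example importance codes, including a missing value
importance <- c(1, 4, 2, NA, 3)

# factor() maps each code in `levels` to the label at the same position;
# NA stays NA
importance_f <- factor(importance, levels = 1:4, labels = importance_labels)
print(importance_f)
```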
death_importance <- GOT_data %>% group_by(importance) %>% summarise(count_importance = n()) %>%
arrange(desc(count_importance))
kable(death_importance)
importance | count_importance |
---|---|
1 | 6682 |
2 | 85 |
3 | 75 |
4 | 44 |
NA | 1 |
Characters of importance 1, who are mostly extras, account for the vast majority of deaths: 6,682 of the 6,887 recorded deaths. Only 44 deaths involve importance-4 characters, far fewer than the extras, even though it felt like many more while watching the show. One death has a missing importance value (NA).
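To put these counts in proportion, a quick base-R calculation on the numbers from the table above:

```r
# Death counts by importance, copied from the table above ("NA" = missing)
importance_counts <- c("1" = 6682, "2" = 85, "3" = 75, "4" = 44, "NA" = 1)

total <- sum(importance_counts)
share <- importance_counts / total

print(total)
print(round(100 * share, 1))  # percentage of all deaths
```

Importance-1 characters make up about 97% of all deaths, while main characters (importance 4) make up well under 1%.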
Let’s work on the main cast:
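GOT_maincast <- GOT_data %>% filter(importance == 4)
# Name = first word, House = second word (surname used as house, so "Khal Drogo" gets House "Drogo");
# na.omit() drops any row with an NA, including single-word names that have no second word
GOT_maincast <- GOT_maincast %>% separate(character_killed, c("Name", "House")) %>% na.omit()
kable(head(GOT_maincast))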
order | season | episode | Name | House | killer | method | method_cat | reason | location | allegiance | importance |
---|---|---|---|---|---|---|---|---|---|---|---|
33 | 1 | 6 | Viserys | Targaryen | Khal Drogo | Molten gold | Fire/Burning | Threatened Daenerys Targaryen and her unborn child, drew his sword in the sacred city | Vaes Dothrak | House Targaryen | 4 |
34 | 1 | 7 | Robert | Baratheon | Boar | Tusk | Animal | Hunted the boar while drunk | King’s Landing | House Baratheon of King’s Landing | 4 |
56 | 1 | 9 | Ned | Stark | Ilyn Payne | Sword (Ice) | Blade | Executed on Joffrey Baratheon’s orders after Ned claimed Joffrey wasn’t the true heir to the throne | King’s Landing | House Stark | 4 |
58 | 1 | 10 | Khal | Drogo | Daenerys Targaryen | Pillow | Household item | Killed after being put into a vegetative state by Mirri Maz Duur | Red Waste | Dothraki | 4 |
79 | 2 | 5 | Renly | Baratheon | Melisandre “the Red Woman” of Asshai | Shadow Demon | Magic | Killed so that Stannis Baratheon would have fewer enemies | Storm’s End | House Baratheon of Storm’s End | 4 |
199 | 3 | 4 | Jeor | Mormont | Rast | Knife | Blade | Attacked in a mutiny | Beyond the Wall | Night’s Watch, House Mormont | 4 |
GOT_maincast %>%
  count(House, name = "count_house") %>%
  # reorder() sorts the bars by count; arrange() alone does not change bar order
  ggplot(aes(x = reorder(House, -count_house), y = count_house)) +
  geom_col(fill = "lightpink") +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme_bw() +
  labs(x = "House", y = "Deaths", title = "Death Count of House Leads")
Grouping the main-cast deaths by surname, we see that the Baratheons and the Starks have the most deaths. Interestingly, Joffrey, Myrcella and Tommen are recorded as Baratheons in this dataset rather than Lannisters, which explains the high death count for the Baratheons.
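Because House comes from splitting the character’s name, it is really a surname: Khal Drogo, for example, ends up in a "house" called Drogo. An alternative worth considering is the allegiance column, which already names a house for most characters. A minimal base-R sketch, using the allegiance values of the six main-cast rows shown above (apostrophes simplified), that extracts the first "House X" mention and leaves NA where there is none:

```r
# Allegiance values from the six main-cast rows shown above
allegiance <- c("House Targaryen", "House Baratheon of King's Landing",
                "House Stark", "Dothraki", "House Baratheon of Storm's End",
                "Night's Watch, House Mormont")

# regexpr() finds the first "House <Word>" in each string (-1 if absent);
# regmatches() returns the matched text for the elements that matched
match_pos <- regexpr("House [A-Za-z]+", allegiance)
house <- rep(NA_character_, length(allegiance))
house[match_pos != -1] <- regmatches(allegiance, match_pos)

print(house)
print(table(house, useNA = "ifany"))
```

This keeps characters whose allegiance is not a house (the Dothraki here) as NA instead of silently dropping them, and it merges the two Baratheon branches into one group; whether that merge is desirable is a judgment call.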
We also plot the top killers in the show across all eight seasons. Wights, the undead army led by the White Walkers, have killed the most people, followed by Drogon.
killers <- GOT_data %>% group_by(killer) %>% summarise(count_killer = n()) %>%
  arrange(desc(count_killer)) %>%
  filter(count_killer >= 91)  # threshold keeps the nine most prolific killers
kable(killers)
killer | count_killer |
---|---|
Wight | 1602 |
Drogon | 1426 |
Arya Stark | 1278 |
None | 477 |
Rhaegal | 273 |
Cersei Lannister | 199 |
Jon Snow | 112 |
Stark soldier | 96 |
Bolton soldier | 91 |
ggplot(killers, aes(x = reorder(killer, -count_killer), y = count_killer)) +
  geom_col(fill = "lightblue") +  # geom_col() already uses stat = "identity"
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme_minimal() +
  labs(x = "Killer Name", y = "Kill Count", title = "Count of kills by GOT characters")
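The filter(count_killer >= 91) threshold appears to have been chosen by looking at the counts, so that the nine most prolific killers remain. A threshold-free alternative is to sort the counts and keep the first n; in dplyr 1.0 or later, slice_max(count_killer, n = 9) does this directly. A base-R sketch of the same idea, using the killer column of the six rows printed at the top of this post (the function name top_killers is our own):

```r
# Return the n most frequent values of `killer`, with their counts
top_killers <- function(killer, n) {
  counts <- sort(table(killer), decreasing = TRUE)
  head(counts, n)
}

# killer values from the first six rows of the dataset shown above
killer <- c("White Walker", "White Walker", "Ned Stark",
            "Direwolf", "Stag", "Lysa Arryn")

top <- top_killers(killer, n = 1)
print(top)
```

With the full dataset, top_killers(GOT_data$killer, n = 9) would return the nine most frequent killers. One difference: head() cuts ties at the boundary arbitrarily, whereas slice_max() keeps ties by default.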
We plotted several graphs to answer questions about which houses lost the most members in the show. We also used bar graphs to identify the episode with the highest casualties (the battle of "The Long Night") and the deadliest killers in the show.
The dataset we worked on would also be ideal for building networks, which we would like to learn more about and explore in R in the future: we would create nodes for the characters and connect them by alliances and blood relations to visualize the connections between characters and houses in the story. In the future we also plan to add information from the books, including where the storyline differs from the show and any noteworthy events not covered by the series.
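As a first step toward such a network, the allegiance column can already be turned into an edge list linking characters to the houses or communities they support. A minimal base-R sketch, using the first six rows shown at the top of this post (apostrophes simplified), and assuming that multiple allegiances are separated by ", " and that "None" means no allegiance:

```r
# Characters and allegiances from the first six rows of the dataset
character_killed <- c("Waymar Royce", "Gared", "Will", "Stag", "Direwolf", "Jon Arryn")
allegiance <- c("House Royce, Night's Watch", "Night's Watch", "Night's Watch",
                "None", "None", "House Arryn")

# Split multiple allegiances (assumed to be separated by ", ")
groups <- strsplit(allegiance, ", ", fixed = TRUE)

# One row per character-allegiance pair
edges <- data.frame(
  from = rep(character_killed, lengths(groups)),
  to = unlist(groups),
  stringsAsFactors = FALSE
)

# Drop "None", which we assume means no allegiance
edges <- edges[edges$to != "None", ]
rownames(edges) <- NULL

print(edges)
print(table(edges$to))  # number of characters linked to each group
```

A data frame of this shape can be passed to igraph::graph_from_data_frame() to build a graph object. The blood relations mentioned above are not in this dataset and would have to come from another source.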
Dataset: https://www.washingtonpost.com/graphics/entertainment/game-of-thrones/
Valar Morghulis.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Prabhu (2022, May 19). Data Analytics and Computational Social Science: 601 Project. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsnehalproject/
BibTeX citation
@misc{prabhu2022601, author = {Prabhu, Shruti Shelke and Snehal}, title = {Data Analytics and Computational Social Science: 601 Project}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsnehalproject/}, year = {2022} }