601 Project

Game Of Thrones Data Analysis and Visualization

Shruti Shelke and Snehal Prabhu
5/11/2022

Game of Thrones Dataset

We are using a Game of Thrones dataset that records every death that occurs in seasons one through eight. A total of 11 variables describe this dataset:

  1. order => order of deaths in the show

  2. season => the season number for that record

  3. episode => the episode number within that season in which the death occurs

  4. character_killed => the name of the character killed

  5. killer => the killer of the corresponding character

  6. method => how the killer killed the character

  7. method_cat => the category of the method used for killing. For example: if the method is antler, the method_cat would be animal.

  8. reason => why the killer killed the victim

  9. location => where the victim was killed

  10. allegiance => the house or community the victim supported

  11. importance => how important the character killed is; the higher the value, the greater the importance. Range = [1,4]

Research Question

Game of Thrones is one of our favorite shows. We have invested a lot of time watching all 8 seasons and reading some of the books as well. We wanted to do a fun data analysis project on Game of Thrones in which we analyze the deaths of the characters and the battles the popular houses fought and died in. We plot that information on bar graphs to answer some of the most common questions asked by people who watched the show over the years and rooted for their favorite house to survive and sit on the Iron Throne.

Import the data

Import GOT data set and take a look at the first few rows just to get an idea of the dataset.

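library(tidyverse)   # dplyr, tidyr and ggplot2 are used below
library(readxl)      # provides read_excel()
library(kableExtra)
GOT_data <- read_excel("GOTdata.xlsx")
kable(head(GOT_data))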

| order | season | episode | character_killed | killer | method | method_cat | reason | location | allegiance | importance |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | Waymar Royce | White Walker | Ice sword | Blade | Unknown | Beyond the Wall | House Royce, Night’s Watch | 2 |
| 2 | 1 | 1 | Gared | White Walker | Ice sword | Blade | Unknown | Beyond the Wall | Night’s Watch | 2 |
| 3 | 1 | 1 | Will | Ned Stark | Sword (Ice) | Blade | Deserting the Night’s Watch | Winterfell | Night’s Watch | 2 |
| 4 | 1 | 1 | Stag | Direwolf | Direwolf teeth | Animal | Unknown | Winterfell | None | 1 |
| 5 | 1 | 1 | Direwolf | Stag | Antler | Animal | Unknown | Winterfell | None | 1 |
| 6 | 1 | 1 | Jon Arryn | Lysa Arryn | Poison | Poison | Petyr Baelish persuaded Lysa to do so for reasons unknown | King’s Landing | House Arryn | 2 |
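
Before analysing, it can help to confirm that the imported data has the columns described above and to see where values are missing. The following is a minimal sketch, not part of the original analysis: `toy_data` is a hypothetical stand-in built from the first two rows shown above, and `expected_cols` is simply the list of variable names from the dataset description.

```r
# Column names as listed in the dataset description
expected_cols <- c("order", "season", "episode", "character_killed", "killer",
                   "method", "method_cat", "reason", "location", "allegiance",
                   "importance")

# Hypothetical stand-in for GOT_data (first two rows of the table above)
toy_data <- data.frame(
  order = 1:2,
  season = c(1, 1),
  episode = c(1, 1),
  character_killed = c("Waymar Royce", "Gared"),
  killer = c("White Walker", "White Walker"),
  method = c("Ice sword", "Ice sword"),
  method_cat = c("Blade", "Blade"),
  reason = c("Unknown", "Unknown"),
  location = c("Beyond the Wall", "Beyond the Wall"),
  allegiance = c("House Royce, Night’s Watch", "Night’s Watch"),
  importance = c(2, 2)
)

# Columns that were expected but are absent from the data
missing_cols <- setdiff(expected_cols, names(toy_data))

# Number of missing values in each column
na_per_column <- colSums(is.na(toy_data))
```

On the real data, replacing `toy_data` with `GOT_data` would apply the same two checks to the full dataset.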

Number of Deaths in Every Episode

We want to see how many characters die in every episode. To do so, we group by season and then episode. Hence, we get a breakdown of the count of dead characters in every episode of every season.

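# Count the death records in each episode of each season
deaths_by_season <- GOT_data %>%
  group_by(season, episode) %>%
  summarise(count = n(), .groups = "drop")
# Prefix the numbers so that labels read "Season 1" and "e1"
deaths_by_season$season <- paste("Season", deaths_by_season$season)
deaths_by_season$episode <- paste0("e", deaths_by_season$episode)
kable(head(deaths_by_season))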

| season | episode | count |
|---|---|---|
| Season 1 | e1 | 7 |
| Season 1 | e2 | 3 |
| Season 1 | e4 | 1 |
| Season 1 | e5 | 17 |
| Season 1 | e6 | 5 |
| Season 1 | e7 | 5 |

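library(ggplot2)
# Order episode labels numerically (e1, e2, ..., e10); as plain strings,
# "e10" would sort before "e2". No season has more than 10 episodes.
episode_levels <- paste0("e", 1:10)
ggplot(data = deaths_by_season, aes(x = factor(episode, levels = episode_levels), y = count)) +
  geom_col(fill = "purple") +
  facet_wrap(vars(season), scales = "free") +
  theme_bw() +
  labs(title = "Total Deaths in 8 seasons", x = "episode")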

We plot one bar graph per season, and each bar shows the total deaths in one episode of that season. We observe that Season 8 episode 3 has the most deaths; this is "The Long Night", the episode in which the war between the living and the dead takes place. Death counts also tend to rise in later seasons, showing how much the bloodshed increased over the course of the show.
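
One detail worth checking when plotting labels such as "e1" and "e10" is their order on the axis: character labels are sorted alphabetically by default, which places "e10" before "e2". The following is a minimal sketch of the effect and one way to order the labels numerically. The vector `episode_labels` is a toy example, and the approach assumes every label is "e" followed by the episode number.

```r
# Toy episode labels in the "e<number>" form used for the plot
episode_labels <- c("e1", "e2", "e10", "e3")

# Default factor levels are sorted as text
default_order <- levels(factor(episode_labels))

# Assumption: stripping the leading "e" leaves the episode number
episode_numbers <- as.integer(sub("^e", "", episode_labels))

# Build the levels in numeric order instead
numeric_order <- levels(
  factor(episode_labels, levels = episode_labels[order(episode_numbers)])
)
```

Here `default_order` puts "e10" directly after "e1", while `numeric_order` runs e1, e2, e3, e10.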

Number of Deaths by Location

We want to see how many deaths occur in each location. Hence we group by location.

death_location <- GOT_data %>% group_by(location) %>% summarise(count_deaths = n()) %>%
                   arrange(desc(count_deaths))
death_location
# A tibble: 42 x 2
   location        count_deaths
   <chr>                  <int>
 1 Winterfell              3709
 2 King’s Landing          1357
 3 Beyond the Wall          993
 4 Meereen                  154
 5 Goldroad                 116
 6 Hardhome                  99
 7 The Twins                 84
 8 Castle Black              66
 9 Narrow Sea                36
10 Riverlands                31
# ... with 32 more rows

We observe that most deaths occur in Winterfell. That is because the battle between the White Walkers and the humans during The Long Night episode takes place in Winterfell, and that is when most deaths in the show happen. The Battle of the Bastards between Jon Snow and Ramsay Bolton adds to this count. The second most deaths occur in King’s Landing, mostly when Daenerys takes control of King’s Landing and burns the Red Keep, along with the many deaths that Cersei plots, such as the destruction of the Great Sept of Baelor. Of the ten locations shown, the Riverlands has the fewest deaths (31); the remaining 32 locations each have no more than that.
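
To put these counts in proportion, the share of all recorded deaths at each of the top three locations can be worked out from the numbers printed in this post. This is only a sketch of the arithmetic: the location counts are copied from the table above, and the total of 6,887 rows is taken as the sum of the counts in the importance table reported in the Main Cast section (6682 + 85 + 75 + 44 + 1).

```r
# Death counts for the top three locations, copied from the table above
top_locations <- c("Winterfell" = 3709,
                   "King’s Landing" = 1357,
                   "Beyond the Wall" = 993)

# Total number of rows, taken as the sum of the importance counts
total_rows <- 6682 + 85 + 75 + 44 + 1

# Percentage of all recorded deaths at each location
share_pct <- round(100 * top_locations / total_rows, 1)

# Combined share of the top three locations
top_three_pct <- round(100 * sum(top_locations) / total_rows, 1)
```

Under these figures, Winterfell alone accounts for a little over half of all recorded deaths (about 53.9%), and the top three locations together for roughly 88%.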

Main Cast

We now want to see how many important characters die over the course of the show. We define the ‘importance’ of characters by the following descriptions:

Death by importance / status labels:

  1 => Soldiers and knights with the least screen time

  2 => Nobles or knights with little screen time, such as the Lannister cousins and the Karstarks

  3 => Advisors and people close to the lords, such as Ser Rodrik and the Spice Kings beyond the sea

  4 => Main characters and the lords and ladies of the kingdoms, such as Ned Stark, Robert Baratheon and Khal Drogo

death_importance <- GOT_data %>% group_by(importance) %>% summarise(count_importance = n()) %>%
                   arrange(desc(count_importance))

kable(death_importance)

| importance | count_importance |
|---|---|
| 1 | 6682 |
| 2 | 85 |
| 3 | 75 |
| 4 | 44 |
| NA | 1 |

We see that characters of importance 1 account for by far the most deaths (6,682), as these are generally the extras. Deaths of importance 4 characters are far fewer (44), even though it felt like a lot while watching the show. One record has no importance value (NA).
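
The single NA in the table matters when rows are selected by importance: a comparison such as `importance == 4` evaluates to NA for that record, and `filter()` keeps only rows where the condition is TRUE. The following is a minimal sketch of that behaviour on a toy data frame. The rows in `toy_deaths` are made up for illustration and are not taken from the dataset.

```r
library(dplyr)

# Hypothetical records: one main character, one extra, one with no importance
toy_deaths <- data.frame(
  character_killed = c("Ned Stark", "Stark soldier", "Unnamed victim"),
  importance = c(4, 1, NA)
)

# filter() drops rows where the comparison is FALSE or NA
main_cast <- toy_deaths %>% filter(importance == 4)

# Rows with a missing importance have to be asked for explicitly
missing_importance <- toy_deaths %>% filter(is.na(importance))
```

In this toy example `main_cast` holds one row and `missing_importance` holds the row that would otherwise disappear without a message.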

Let’s work on the main cast:

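# Keep only main characters (importance 4); rows with NA importance are dropped too
GOT_maincast <- GOT_data %>% filter(importance==4)

# Split the name on non-alphanumeric characters into first name and house;
# na.omit() then drops any row with a missing value, including single-word names
GOT_maincast <- GOT_maincast %>% separate(character_killed, c('Name', 'House')) %>% na.omit()

kable(head(GOT_maincast))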

| order | season | episode | Name | House | killer | method | method_cat | reason | location | allegiance | importance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | 1 | 6 | Viserys | Targaryen | Khal Drogo | Molten gold | Fire/Burning | Threatened Daenerys Targaryen and her unborn child, drew his sword in the sacred city | Vaes Dothrak | House Targaryen | 4 |
| 34 | 1 | 7 | Robert | Baratheon | Boar | Tusk | Animal | Hunted the boar while drunk | King’s Landing | House Baratheon of King’s Landing | 4 |
| 56 | 1 | 9 | Ned | Stark | Ilyn Payne | Sword (Ice) | Blade | Executed on Joffrey Baratheon’s orders after Ned claimed Joffrey wasn’t the true heir to the throne | King’s Landing | House Stark | 4 |
| 58 | 1 | 10 | Khal | Drogo | Daenerys Targaryen | Pillow | Household item | Killed after being put into a vegetative state by Mirri Maz Duur | Red Waste | Dothraki | 4 |
| 79 | 2 | 5 | Renly | Baratheon | Melisandre “the Red Woman” of Asshai | Shadow Demon | Magic | Killed so that Stannis Baratheon would have fewer enemies | Storm’s End | House Baratheon of Storm’s End | 4 |
| 199 | 3 | 4 | Jeor | Mormont | Rast | Knife | Blade | Attacked in a mutiny | Beyond the Wall | Night’s Watch, House Mormont | 4 |

# Count main-cast deaths per house and order the bars by that count
GOT_maincast %>%
  group_by(House) %>%
  summarise(count_house = n()) %>%
  ggplot(aes(x = reorder(House, -count_house), y = count_house)) +
  geom_col(fill = "lightpink") +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme_bw() +
  labs(title = "Death Count of House Leads", x = "House")

Looking only at the main cast (importance 4) and grouping by the house name taken from each character's name, we see that the Baratheons and Starks have the most deaths. It is interesting to note that Joffrey, Myrcella and Tommen are named Baratheon in this dataset and not Lannister, hence the high death count for the Baratheons.
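
Because the house is derived from the character's name rather than from the allegiance column, it is worth seeing what the split does to names that are not of the form "first name, house name". The following is a minimal sketch using tidyr's `separate()` on three toy names. "Ygritte" is included only as a hypothetical single-word example, and `fill = "right"` is added here to make the padding with NA explicit.

```r
library(tidyr)

# Toy names: a regular name, a title plus name, and a single-word name
toy_names <- data.frame(
  character_killed = c("Ned Stark", "Khal Drogo", "Ygritte"),
  stringsAsFactors = FALSE
)

# Split on non-alphanumeric characters (the default separator);
# names with only one piece get NA in the House column
split_names <- separate(toy_names, character_killed,
                        c("Name", "House"), fill = "right")

# na.omit() then removes the rows with a missing House
kept <- na.omit(split_names)
```

In this sketch "Khal Drogo" ends up with the house "Drogo", and the single-word name is removed entirely, so the bar chart reflects surnames as they are written rather than true house membership.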

Assassins

We also plotted the top killers of characters in Game of Thrones. Across all 8 seasons, wights have killed the largest number of people in the show, followed by Drogon and Arya Stark.

# Count kills per killer and keep those with at least 91 kills
killers <- GOT_data %>%
  group_by(killer) %>%
  summarise(count_killer = n()) %>%
  arrange(desc(count_killer)) %>%
  filter(count_killer >= 91)
kable(killers)

| killer | count_killer |
|---|---|
| Wight | 1602 |
| Drogon | 1426 |
| Arya Stark | 1278 |
| None | 477 |
| Rhaegal | 273 |
| Cersei Lannister | 199 |
| Jon Snow | 112 |
| Stark soldier | 96 |
| Bolton soldier | 91 |

# geom_col() already uses stat = "identity", so no stat argument is needed
ggplot(killers, aes(x = killer, y = count_killer)) +
  geom_col(fill = 'lightblue') +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme_minimal() +
  labs(x = "Killer Name", y = "Kill Count", title = "Count of kills by GOT characters")
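
The cut-off of 91 kills is specific to this dataset, and the table also contains a "None" entry for deaths with no killer. As an alternative, the top killers can be picked by rank. The following is a minimal sketch using dplyr's `slice_max()`; the counts are copied from the table above, and both the choice of five killers and the removal of "None" are assumptions made for illustration.

```r
library(dplyr)

# Kill counts copied from the table above
killer_counts <- data.frame(
  killer = c("Wight", "Drogon", "Arya Stark", "None", "Rhaegal",
             "Cersei Lannister", "Jon Snow", "Stark soldier", "Bolton soldier"),
  count_killer = c(1602, 1426, 1278, 477, 273, 199, 112, 96, 91)
)

# Drop deaths without a killer, then keep the five highest counts
top_killers <- killer_counts %>%
  filter(killer != "None") %>%
  slice_max(count_killer, n = 5)
```

With these numbers `top_killers` contains Wight, Drogon, Arya Stark, Rhaegal and Cersei Lannister, in that order.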

Reflection

We plotted graphs to show which houses suffered the most deaths on the show. We also plotted bar graphs of the deaths per episode, which point to the battle with the highest casualties, and of the show's top killers.
The dataset we worked on is an ideal dataset for building networks, which we would like to learn and explore in R in the future: nodes would connect the characters through their alliances and blood relations to visualize the connections between characters and houses in the story. In the future we also plan to add information about the books, noting where their storyline differs from the show and whether they contain noteworthy events not covered by the series.

Bibliography

Dataset: https://www.washingtonpost.com/graphics/entertainment/game-of-thrones/

Valar Morghulis.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Shelke & Prabhu (2022, May 19). Data Analytics and Computational Social Science: 601 Project. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsnehalproject/

BibTeX citation

@misc{prabhu2022601,
  author = {Shelke, Shruti and Prabhu, Snehal},
  title = {Data Analytics and Computational Social Science: 601 Project},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsnehalproject/},
  year = {2022}
}