601 HW 6

GOT Analysis and Visualization

Shruti Shelke and Snehal Prabhu
5/11/2022

We use a Game of Thrones dataset that records every death in seasons one through eight. The dataset has 11 variables:

  1. order => serial number of the death

  2. season => the season number for that row

  3. episode => the episode number within that season

  4. character_killed => the name of the character killed

  5. killer => the character who killed them

  6. method => how the killer killed the character

  7. method_cat => the category of the killing method. For example, if the method is antler, the method_cat is animal.

  8. reason => why the killer killed the victim

  9. location => where the victim was killed

  10. allegiance => the house or community they support

  11. importance => how important the killed character is; the higher the value, the greater the importance. Range = [1,4]

Import GOT data set and take a look at the first few rows just to get an idea of the dataset.

library(readxl)     # read_excel()
library(tidyverse)  # dplyr, tidyr and ggplot2, used below
GOT_data <- read_excel("GOTdata.xlsx")
head(GOT_data)
# A tibble: 6 x 11
  order season episode character_killed killer       method method_cat
  <dbl>  <dbl>   <dbl> <chr>            <chr>        <chr>  <chr>     
1     1      1       1 Waymar Royce     White Walker Ice s~ Blade     
2     2      1       1 Gared            White Walker Ice s~ Blade     
3     3      1       1 Will             Ned Stark    Sword~ Blade     
4     4      1       1 Stag             Direwolf     Direw~ Animal    
5     5      1       1 Direwolf         Stag         Antler Animal    
6     6      1       1 Jon Arryn        Lysa Arryn   Poison Poison    
# ... with 4 more variables: reason <chr>, location <chr>,
#   allegiance <chr>, importance <dbl>

Number of Deaths in every Episode: We want to see how many characters die in every episode of every season. To do so, we group by season and then by episode, which gives the count of dead characters in every episode of every season.

deaths_by_season <- GOT_data %>%
                    group_by(season, episode)  %>%
                    summarise(count = n())
deaths_by_season$season<-sub("^","Season ",deaths_by_season$season)
deaths_by_season$episode<-sub("^","e",deaths_by_season$episode)
deaths_by_season
# A tibble: 69 x 3
# Groups:   season [8]
   season   episode count
   <chr>    <chr>   <int>
 1 Season 1 e1          7
 2 Season 1 e2          3
 3 Season 1 e4          1
 4 Season 1 e5         17
 5 Season 1 e6          5
 6 Season 1 e7          5
 7 Season 1 e8         11
 8 Season 1 e9          7
 9 Season 1 e10         3
10 Season 2 e1          7
# ... with 59 more rows
ggplot(data = deaths_by_season, aes(x = episode, y= count)) +
  geom_bar(stat="identity") +
  facet_wrap(vars(season), scales="free") +
  theme_bw() +
  labs(title="Total Deaths in 8 seasons")

We plot bar graphs of the character deaths in each episode of the show, with a separate panel for each season. Season 8 episode 3 has the most deaths; this is The Long Night episode, when the war between the living and the dead takes place. Deaths tend to increase in later seasons, showing how much the bloodshed grew over the course of the show. Because the panels use free scales, read the axis values rather than the bar heights when comparing seasons.
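One detail worth checking in this plot: once the episode numbers become strings such as e1 and e10, ggplot2 orders a character x axis by sorted label rather than numerically. Below is a minimal sketch of one way to keep episodes in numeric order. It assumes a small stand-in data frame, toy_deaths (a hypothetical name), built from three Season 1 rows of the table above rather than the full dataset.

```r
library(dplyr)
library(ggplot2)

# Stand-in for the grouped counts: three Season 1 rows from the table above
toy_deaths <- data.frame(
  season  = c(1, 1, 1),
  episode = c(1, 2, 10),
  count   = c(7, 3, 3)
)

toy_labelled <- toy_deaths %>%
  mutate(
    season  = paste("Season", season),
    # Build the levels from the numeric episode so that e10 comes after e2
    episode = factor(paste0("e", episode),
                     levels = paste0("e", sort(unique(episode))))
  )

# Inspect the level order that the x axis will follow
levels(toy_labelled$episode)

# Same plot structure as above, with geom_col() as shorthand for stat = "identity"
p <- ggplot(toy_labelled, aes(x = episode, y = count)) +
  geom_col() +
  facet_wrap(vars(season), scales = "free") +
  theme_bw()
```

Because the factor levels are fixed before plotting, the axis no longer depends on how the labels sort as text.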

Number of Deaths by Location: We want to see how many deaths occur in each location, so we group by location and sort the counts in descending order.

death_location <- GOT_data %>% group_by(location) %>% summarise(count_deaths = n()) %>%
                   arrange(desc(count_deaths))
death_location
# A tibble: 42 x 2
   location        count_deaths
   <chr>                  <int>
 1 Winterfell              3709
 2 King’s Landing          1357
 3 Beyond the Wall          993
 4 Meereen                  154
 5 Goldroad                 116
 6 Hardhome                  99
 7 The Twins                 84
 8 Castle Black              66
 9 Narrow Sea                36
10 Riverlands                31
# ... with 32 more rows

We observe that most deaths occur in Winterfell. That is because the battle between the White Walkers and the living in The Long Night episode, when most deaths in the show happen, takes place in Winterfell. The Battle of the Bastards between Jon Snow and Ramsay Snow adds to this count. The second most deaths occur in King’s Landing, mostly when Daenerys takes control of the city and burns the Red Keep, along with the many deaths that Cersei plots, such as the destruction of the Sept of Baelor. Of the ten locations shown, the Riverlands has the fewest deaths (31); the remaining 32 locations, not shown, each have 31 or fewer.

We now want to see how many important characters die over the course of the show. We define the ‘importance’ of characters by the following descriptions:

Death by Importance / Status labels:

  1 - Soldiers and knights with the least screen time

  2 - Nobles or knights with less screen time, like the Lannister cousins and the Karstarks

  3 - Advisors and those close to the Lords, like Ser Rodrik and the Spice Kings beyond the Sea

  4 - Main characters and Lords and Ladies of kingdoms, including Ned Stark, Robert Baratheon and Khal Drogo

death_importance <- GOT_data %>% group_by(importance) %>% summarise(count_importance = n()) %>%
                   arrange(desc(count_importance))

death_importance
# A tibble: 5 x 2
  importance count_importance
       <dbl>            <int>
1          1             6682
2          2               85
3          3               75
4          4               44
5         NA                1

Characters of importance 1 account for most deaths, since these are generally the extras. Far fewer characters of importance 4 died (44), even though it felt like a lot while watching the show. One death has no importance value recorded (NA).
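To put these counts in proportion, the share of deaths at each importance level can be computed directly. This is a minimal sketch that copies the five counts from the table above into a hypothetical data frame, importance_counts, instead of reading the Excel file; the column share_pct is a name introduced here.

```r
library(dplyr)

# Counts copied from the death_importance output above
importance_counts <- data.frame(
  importance       = c(1, 2, 3, 4, NA),
  count_importance = c(6682, 85, 75, 44, 1)
)

# Percentage of all recorded deaths that falls in each importance level
death_share <- importance_counts %>%
  mutate(share_pct = round(100 * count_importance / sum(count_importance), 2))

death_share
```

With these counts, importance 1 makes up roughly 97% of all deaths, while importance 4 is under 1%.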

Let’s work on the main cast:

GOT_maincast <- GOT_data %>% filter(importance==4)

GOT_maincast <- GOT_maincast %>% separate(character_killed, c('Name', 'House')) %>% na.omit()

head(GOT_maincast)
# A tibble: 6 x 12
  order season episode Name    House   killer method method_cat reason
  <dbl>  <dbl>   <dbl> <chr>   <chr>   <chr>  <chr>  <chr>      <chr> 
1    33      1       6 Viserys Targar~ Khal ~ Molte~ Fire/Burn~ Threa~
2    34      1       7 Robert  Barath~ Boar   Tusk   Animal     Hunte~
3    56      1       9 Ned     Stark   Ilyn ~ Sword~ Blade      Execu~
4    58      1      10 Khal    Drogo   Daene~ Pillow Household~ Kille~
5    79      2       5 Renly   Barath~ Melis~ Shado~ Magic      Kille~
6   199      3       4 Jeor    Mormont Rast   Knife  Blade      Attac~
# ... with 3 more variables: location <chr>, allegiance <chr>,
#   importance <dbl>
GOT_maincast %>% group_by(House) %>% summarise(count_house = n()) %>%
                # reorder() sorts the bars by descending death count
                ggplot(aes(x=reorder(House, -count_house), y=count_house)) +
                geom_bar(stat="identity") +
                scale_x_discrete(guide = guide_axis(n.dodge=4)) +
                theme_bw() +
                labs(title = "Death Count of House Leads", x = "House")

Looking only at the main cast (importance 4) and grouping by the second word of each character's name, which we treat as the house, we see that the Baratheons and Starks have the most deaths. It is interesting to note that Joffrey, Myrcella and Tommen are considered Baratheons in this dataset and not Lannisters, hence the high death count for the Baratheons.
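The House column comes from splitting character_killed, so it is only as good as the name format. The sketch below uses three main-cast names mentioned earlier and a hypothetical data frame called victims to show what the split produces. The sep, extra and fill arguments are choices made here for illustration and are not in the original code.

```r
library(dplyr)
library(tidyr)

# Three main-cast names mentioned earlier in this post
victims <- data.frame(
  character_killed = c("Ned Stark", "Khal Drogo", "Robert Baratheon")
)

split_names <- victims %>%
  separate(
    character_killed, c("Name", "House"),
    sep   = " ",      # split on spaces only
    extra = "merge",  # keep any further words in House instead of dropping them
    fill  = "right"   # single-word names get NA in House
  )

split_names
```

Here Khal Drogo ends up with House equal to Drogo because Khal is a title, so the bar chart counts Drogo as a house. Note also that na.omit() drops a row if any column is NA, not only the name columns.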

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Prabhu (2022, May 19). Data Analytics and Computational Social Science: 601 HW 6. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsnehalhw6/

BibTeX citation

@misc{prabhu2022601,
  author = {Shelke, Shruti and Prabhu, Snehal},
  title = {Data Analytics and Computational Social Science: 601 HW 6},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsnehalhw6/},
  year = {2022}
}