finalpart1
DACSS 603 Project Proposal
Author

Alexis Gamez

Published

March 18, 2023

Code
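# tidyverse attaches readr, dplyr and ggplot2, so the separate readr call is optional
library(tidyverse)
library(readr)

# show code alongside output in every chunk
knitr::opts_chunk$set(echo = TRUE)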

Introduction

The current state of video games supports massive online and local communities, generating millions in revenue a year. Video games can no longer be perceived as a counter-culture medium; they are widely accepted, definitively present, and enjoyed among diverse communities.

As Egenfeldt-Nielsen, Smith & Tosca state eloquently in Understanding Video Games: The Essential Introduction (2012), “No cultural form exists in isolation; rather, it is integrated within a complex system of meanings shaped by society and its institutions. Compared to other cultural forms, such as literature, the medium of the video game is a new member of this fascinating ecology. It is certainly true that the history of cultural media shows an almost instinctive skepticism leveled at new media. It has been true of radio, it has been true of movies, and it has certainly been true of television, which has long fought against the perception that its role was to entertain, rather than to enlighten.”

Now, over 10 years later, we can see that the skepticism once surrounding video games has waned, so much so that playing them is now accepted as a common hobby and even a profession. I believe that video games are here to stay and that, rather than reject the new norm, we should grow with it. Video games will only continue to evolve in their application, especially given recent developments in virtual reality technology. Understanding and utilizing video games as the unique form of cultural media that they are can provide insight into the nature of popular media and the very real weight of their market. What makes a game good? Does a game have to be good to be “successful”? Does fighting among platform communities contribute to the success, or lack thereof, of a video game? Do critic scores play a role? How about user scores? Or does it come down to the publisher and whether it has a large enough budget to advertise its games to the masses? These are all questions I’d like to address within the scope of my project. However, my main objective is to address the following research question:

Of the six selected variables (Platform, Genre, Publisher, Rating, Critic Scores & User Scores), what impact does each have on the commercial success of a video game?

Hypothesis

While there exists a diverse range of articles and blog posts related to console wars, game of the year announcements, loot box market structures, etc., there has been a noticeable oversight regarding the correlation between public critique and generated revenue. An often overlooked predicament of video game development occurs post-release. A game may be applauded for all the characteristics the gaming community has grown to love, but if that game’s sales don’t cover the costs it generated, then what reward is there for the developers? Would that game still be considered successful? Would it be lucrative for developers to switch to a more widely accepted genre or platform to guarantee economic success?

These questions have led to the development of the following hypothesis: As of 2017, the independent variables “Platform” and “Genre” will have the most significant impact on global sales.

From personal experience, I know that a platform’s popularity may fluctuate over the lifespan of its series. The Nintendo-developed GameCube and Wii were widely popular upon release and are still commonly used today. The same can be said for Sony’s PlayStation 1 & 2 or Microsoft’s Xbox 360. Conversely, platforms like the Wii U, PlayStation 3 and Xbox One had tumultuous receptions upon release, which led committed users to switch to other platform series. A common example would be the transition from PlayStation to Xbox and vice versa. With this in mind, I hypothesize that Nintendo- and Sony-based platforms will have the highest impact on video game sales, given the success of select platforms like the PlayStation 2, GameCube and Wii.

When it comes to genre, the unrelenting success of the Call of Duty series within the time frame of this data is the core reason I believe that Shooters will have the largest impact on sales among the video game genres. More and more shooters continue to be made in attempts to emulate the euphoria of playing games like Call of Duty: Modern Warfare 2 and Black Ops. I also want to acknowledge other major successes, such as Grand Theft Auto, that are staples within their own genres (in the case of GTA, the Action genre). However, in my experience, first-person shooters revolutionized the gaming industry after the release of Call of Duty 4: Modern Warfare, and I hypothesize that while the modern gaming market may be over-saturated with shooters, they continue to play a large role in the commercial success of video games upon release.

Descriptive Statistics

Description and Summary of the Data

This data set was pulled from Kaggle, and its description reads as follows: “This data set contains a list of video games with sales greater than 100,000 copies along with critic and user ratings.”

Code
# reading in our data set
Video_Game_Sales <- read_csv("_data/final_project/Video_Game_Sales_as_of_Jan_2017.csv")
head(Video_Game_Sales)
# A tibble: 6 × 15
  Name       Platform Year_of_Release Genre Publisher NA_Sales EU_Sales JP_Sales
  <chr>      <chr>              <dbl> <chr> <chr>        <dbl>    <dbl>    <dbl>
1 Wii Sports Wii                 2006 Spor… Nintendo      41.4    29.0      3.77
2 Super Mar… NES                 1985 Plat… Nintendo      29.1     3.58     6.81
3 Mario Kar… Wii                 2008 Raci… Nintendo      15.7    12.8      3.79
4 Wii Sport… Wii                 2009 Spor… Nintendo      15.6    11.0      3.28
5 Pokemon R… G                   1996 Role… Nintendo      11.3     8.89    10.2 
6 Tetris     G                   1989 Puzz… Nintendo      23.2     2.26     4.22
# … with 7 more variables: Other_Sales <dbl>, Global_Sales <dbl>,
#   Critic_Score <dbl>, Critic_Count <dbl>, User_Score <dbl>, User_Count <dbl>,
#   Rating <chr>

The updated data set provided by the collector contains 15 variables and 17,416 entries. The variables are as follows:

  • Name [game’s name]
  • Platform [platform of game release]
  • Year of Release [year the game was released]
  • Genre [genre of game]
  • Publisher [publisher of game]
  • NA Sales [sales in North America in millions]
  • EU Sales [sales in Europe in millions]
  • JPN Sales [sales in Japan in millions]
  • Other Sales [sales in rest of the world in millions]
  • Global Sales [total worldwide sales in millions]
  • Critic Score [aggregate score compiled by Metacritic staff]
  • Critic Count [the number of critics used in creating the critic score]
  • User Score [score according to Metacritic subscribers]
  • User Count [number of users who gave the user score]
  • Rating [ESRB rating for the game]

How was the data collected?

The data set’s description also states: “It is a combined web scrape from VGChartz and Metacritic along with manually entered year of release values for most games with a missing year of release.”

The scraping code the collector utilized was originally written by Rush Kirubi, whose original data set was limited to a subset of video game platforms. Additionally, not all of the listed video games have information on Metacritic, so there are a significant number of missing values in the critic and user score and count variables.

This provides valuable context concerning Metacritic, the review aggregation site where the critic and user ratings are collected, and the numerous missing values within the data set. Metacritic was established in 1999. As a result, most entries pre-dating 2000 lack critic and user scores, as the site was not yet well established at the time.
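The link between release year and missing scores could be checked directly. The following is only a minimal sketch on a small toy data frame (the rows are illustrative placeholders, not values from the Kaggle file), and the 2000 cutoff and the `Era` column are assumptions made for the example:

```r
library(dplyr)

# Toy stand-in for Video_Game_Sales; values are illustrative, not real data
toy_sales <- data.frame(
  Year_of_Release = c(1989, 1996, 2004, 2008, 2011, 2015),
  Critic_Score    = c(NA, NA, 81, NA, 74, 90)
)

# Share of games without a critic score, before and after the assumed cutoff
missing_by_era <- toy_sales %>%
  mutate(Era = if_else(Year_of_Release < 2000, "pre-2000", "2000 onward")) %>%
  group_by(Era) %>%
  summarise(
    n_games       = n(),
    share_missing = mean(is.na(Critic_Score))
  )

missing_by_era
```

Run against the full data set, a share close to 1 for the earlier era would support the explanation above; a noticeably lower share would suggest the missing values have other causes as well.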

Code
# summarizing our data
summary(Video_Game_Sales)
     Name             Platform         Year_of_Release    Genre          
 Length:17416       Length:17416       Min.   :1976    Length:17416      
 Class :character   Class :character   1st Qu.:2003    Class :character  
 Mode  :character   Mode  :character   Median :2008    Mode  :character  
                                       Mean   :2007                      
                                       3rd Qu.:2011                      
                                       Max.   :2017                      
                                       NA's   :8                         
  Publisher            NA_Sales          EU_Sales          JP_Sales       
 Length:17416       Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.00000  
 Class :character   1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.00000  
 Mode  :character   Median : 0.0700   Median : 0.0200   Median : 0.00000  
                    Mean   : 0.2545   Mean   : 0.1407   Mean   : 0.07502  
                    3rd Qu.: 0.2300   3rd Qu.: 0.1000   3rd Qu.: 0.03000  
                    Max.   :41.3600   Max.   :28.9600   Max.   :10.22000  
                                                                          
  Other_Sales        Global_Sales      Critic_Score    Critic_Count   
 Min.   : 0.00000   Min.   : 0.0100   Min.   :13.00   Min.   :  3.00  
 1st Qu.: 0.00000   1st Qu.: 0.0500   1st Qu.:60.00   1st Qu.: 11.00  
 Median : 0.01000   Median : 0.1600   Median :71.00   Median : 21.00  
 Mean   : 0.04591   Mean   : 0.5165   Mean   :68.91   Mean   : 26.19  
 3rd Qu.: 0.03000   3rd Qu.: 0.4500   3rd Qu.:79.00   3rd Qu.: 36.00  
 Max.   :10.57000   Max.   :82.5400   Max.   :98.00   Max.   :113.00  
                                      NA's   :9080    NA's   :9080    
   User_Score      User_Count         Rating         
 Min.   :0.000   Min.   :    4.0   Length:17416      
 1st Qu.:6.400   1st Qu.:   10.0   Class :character  
 Median :7.500   Median :   25.0   Mode  :character  
 Mean   :7.117   Mean   :  162.7                     
 3rd Qu.:8.200   3rd Qu.:   81.0                     
 Max.   :9.700   Max.   :10766.0                     
 NA's   :9618    NA's   :9618                        

Summarizing our data shows that 9,080 entries lack critic scores and 9,618 lack user scores. Even with the 9,618 entries lacking user scores omitted, up to 7,798 entries remain to analyze (fewer if some of those also lack critic scores), and I do not expect the omission to negatively impact the analysis.
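Because the two sets of missing values need not overlap perfectly, the number of usable entries is best counted rather than inferred from the summary. A minimal sketch, using a toy data frame in place of `Video_Game_Sales` (the rows are illustrative placeholders):

```r
library(dplyr)

# Toy stand-in for Video_Game_Sales; values are illustrative, not real data
toy_sales <- data.frame(
  Name         = c("Game A", "Game B", "Game C", "Game D"),
  Global_Sales = c(1.20, 0.45, 0.08, 2.10),
  Critic_Score = c(85, NA, NA, 72),
  User_Score   = c(8.1, NA, 7.4, NA)
)

# Number of missing values in every column
na_counts <- toy_sales %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Rows that have both a critic score and a user score
complete_scores <- toy_sales %>%
  filter(!is.na(Critic_Score), !is.na(User_Score))

na_counts
nrow(complete_scores)
```

In the toy data each score column has two missing values, yet only one row has both scores, which is why the count of complete entries should be taken from the filtered data.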

What are the important variables of interest?

Of the 15 variables provided, 11 will be heavily utilized throughout this project. Six will be treated as independent variables and the remaining five as dependent variables.

The 6 independent variables are as follows:

  • Platform
  • Genre
  • Publisher
  • Rating
  • Critic Scores
  • User Scores

The 5 dependent variables are:

  • NA Sales
  • EU Sales
  • JPN Sales
  • Other Sales
  • Global Sales

As stated previously, I believe that Genre and Platform will have a more significant impact on commercial success than any of the 4 remaining independent variables. I’d also like to add that I expect the Critic Score variable to have little to no correlation with the commercial success of a video game.
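One way these expectations might eventually be tested is with a linear model of sales on the independent variables. The sketch below is not the final analysis: it runs on simulated data (generated only so the code executes), uses a subset of the predictors, and the log transformation of `Global_Sales` is my own assumption, suggested by the strong right skew in the summary above.

```r
# Simulated stand-in for Video_Game_Sales; none of these values are real
set.seed(603)
n <- 200
toy_sales <- data.frame(
  Platform     = factor(sample(c("PS2", "Wii", "X360"), n, replace = TRUE)),
  Genre        = factor(sample(c("Action", "Shooter", "Sports"), n, replace = TRUE)),
  Critic_Score = round(runif(n, 40, 95)),
  User_Score   = round(runif(n, 3, 9.5), 1)
)
# Strictly positive, right-skewed sales so that log() is defined
toy_sales$Global_Sales <- exp(rnorm(n, mean = -1.5, sd = 1))

# Linear model on the log scale; categorical predictors become dummy variables
fit <- lm(
  log(Global_Sales) ~ Platform + Genre + Critic_Score + User_Score,
  data = toy_sales
)

summary(fit)  # coefficient estimates and p-values
anova(fit)    # sequential F-tests, one line per predictor
```

Since the simulated sales are generated independently of the predictors, no meaningful effects should appear here; the point is only the shape of the model call and where the per-variable tests would be read from.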

Next Steps

Unfortunately, I did not have enough time prior to submitting this proposal to visualize and alter the data in more creative ways.

Moving forward, I would like to create histograms/boxplots that visualize the distribution of games according to genre and platform to show which categories seem the most significant. Ultimately, I’d like to do the same for all listed independent variables and observe whether or not they have an impact on sales. Finally, I’d also like to challenge my understanding of global success by attempting to analyze the effects the independent variables have on regional success as well.
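As a rough idea of what such a plot could look like, here is a minimal ggplot2 sketch of sales by genre. The data frame is a toy placeholder, and the log-scaled axis is an assumption on my part to keep a few very large sellers from flattening the boxes:

```r
library(ggplot2)

# Toy stand-in for Video_Game_Sales; values are illustrative, not real data
toy_sales <- data.frame(
  Genre        = rep(c("Action", "Shooter", "Sports"), each = 4),
  Global_Sales = c(0.05, 0.20, 0.90, 3.10,
                   0.10, 0.45, 1.80, 6.20,
                   0.02, 0.15, 0.60, 2.40)
)

# One box per genre, with sales on a log10 axis
sales_by_genre <- ggplot(toy_sales, aes(x = Genre, y = Global_Sales)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Genre", y = "Global sales (millions, log scale)")

sales_by_genre
```

Swapping `Genre` for `Platform`, or `Global_Sales` for one of the regional sales columns, would give the other views described above.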

Lastly, I would also like to make some alterations to the data set in order to complete my analysis more efficiently. I would like to re-code the Platform variable so that devices are linked by platform series and not only by individual device (e.g. all variations of PlayStation belong to Sony, all Xbox consoles belong to Microsoft, etc.). I believe this will make it easier to visualize platform success over long periods of time. Similarly, I would like to re-code the Publisher variable into 3 separate groups: Major, Intermediate and Independent. Each group would represent the scale and size of the publishing studio, which I believe may help determine the significance publishing studios have on sales.
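Both re-codings could be sketched with dplyr as follows. Several things here are assumptions rather than decisions: the platform abbreviations are guesses at how the file labels consoles and should be checked against `unique(Video_Game_Sales$Platform)`, the new column names `Platform_Family` and `Publisher_Tier` are hypothetical, and the title-count cutoffs for the publisher groups are placeholders, since the grouping criteria have not been defined yet.

```r
library(dplyr)

# Toy stand-in for Video_Game_Sales; values are illustrative, not real data
toy_sales <- data.frame(
  Name      = paste("Game", LETTERS[1:6]),
  Platform  = c("PS2", "PS3", "X360", "XOne", "Wii", "PC"),
  Publisher = c("Pub A", "Pub A", "Pub A", "Pub B", "Pub B", "Pub C")
)

# Group individual devices by manufacturer (abbreviations are assumed)
recoded <- toy_sales %>%
  mutate(Platform_Family = case_when(
    Platform %in% c("PS", "PS2", "PS3", "PS4", "PSP", "PSV") ~ "Sony",
    Platform %in% c("XB", "X360", "XOne")                    ~ "Microsoft",
    Platform %in% c("NES", "SNES", "N64", "GC", "Wii", "WiiU",
                    "GB", "GBA", "DS", "3DS")                ~ "Nintendo",
    TRUE                                                     ~ "Other"
  ))

# Tier publishers by number of titles (cutoffs are placeholders)
publisher_tiers <- toy_sales %>%
  count(Publisher, name = "n_titles") %>%
  mutate(Publisher_Tier = case_when(
    n_titles >= 3 ~ "Major",
    n_titles == 2 ~ "Intermediate",
    TRUE          ~ "Independent"
  ))

recoded
publisher_tiers
```

On the real data, the tiers could be joined back onto the games with `left_join(recoded, publisher_tiers, by = "Publisher")`, and the cutoffs would need to reflect whatever definition of publisher size is eventually chosen.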

Thank you for reading and have a wonderful day.

References

Egenfeldt-Nielsen, Simon, et al. Understanding Video Games: The Essential Introduction. Taylor & Francis Group, 2012. ProQuest Ebook Central, https://ebookcentral.proquest.com/lib/uma/detail.action?docID=1181119.

Etchells, Pete. Lost in a Good Game: Why We Play Video Games and What They Can Do for Us. Icon Books, 2019.

McCullough, Hayley. “From Zelda to Stanley: Comparing the Integrative Complexity of Six Video Game Genres.” Press Start, vol. 5, 2019, pp. 137-149.

Gillies, Kendall. “Video Game Sales and Ratings.” Kaggle, 25 Jan. 2017, https://www.kaggle.com/datasets/kendallgillies/video-game-sales-and-ratings?resource=download.