Homework 3 : Samhith Barlaya

Submission for Homework 3

Samhith Barlaya
2022-01-03

1. Identify the dataset you will be using for the final project.

For the final project, I chose the IMDB 5000 dataset available on Kaggle (https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset). It contains data on roughly 5,000 movies (5,043 rows) along with their IMDB ratings. The following code reads in the dataset.

movie_data <- read.csv("C:/Users/gbsam/Desktop/movie_metadata.csv")

You should also identify the variables in the dataset, including what type of data each variable is.

The variables and their types (as reported in the ‘Mode’ column) are listed below:

summary.default(movie_data)
                          Length Class  Mode     
color                     5043   -none- character
director_name             5043   -none- character
num_critic_for_reviews    5043   -none- numeric  
duration                  5043   -none- numeric  
director_facebook_likes   5043   -none- numeric  
actor_3_facebook_likes    5043   -none- numeric  
actor_2_name              5043   -none- character
actor_1_facebook_likes    5043   -none- numeric  
gross                     5043   -none- numeric  
genres                    5043   -none- character
actor_1_name              5043   -none- character
movie_title               5043   -none- character
num_voted_users           5043   -none- numeric  
cast_total_facebook_likes 5043   -none- numeric  
actor_3_name              5043   -none- character
facenumber_in_poster      5043   -none- numeric  
plot_keywords             5043   -none- character
movie_imdb_link           5043   -none- character
num_user_for_reviews      5043   -none- numeric  
language                  5043   -none- character
country                   5043   -none- character
content_rating            5043   -none- character
budget                    5043   -none- numeric  
title_year                5043   -none- numeric  
actor_2_facebook_likes    5043   -none- numeric  
imdb_score                5043   -none- numeric  
aspect_ratio              5043   -none- numeric  
movie_facebook_likes      5043   -none- numeric  
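The ‘Mode’ column groups integer and double columns together as "numeric". A minimal sketch, assuming the data frame is already loaded as `movie_data`, that reports each column's class instead, which separates "integer" from "numeric". The helper name `column_types` is hypothetical, not part of base R.

```r
# Hypothetical helper: report the class of every column in a data frame.
# Unlike mode(), class() distinguishes "integer" from "numeric" (double).
column_types <- function(df) {
  data.frame(
    variable = names(df),
    class = unname(vapply(df, function(col) class(col)[1], character(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage, assuming movie_data was read in as above:
# types <- column_types(movie_data)
# table(types$class)  # number of columns of each class
```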

2. Read in/clean the dataset

Here is a glimpse of the dataset (the first six rows):

head(movie_data)
  color     director_name num_critic_for_reviews duration
1 Color     James Cameron                    723      178
2 Color    Gore Verbinski                    302      169
3 Color        Sam Mendes                    602      148
4 Color Christopher Nolan                    813      164
5             Doug Walker                     NA       NA
6 Color    Andrew Stanton                    462      132
  director_facebook_likes actor_3_facebook_likes     actor_2_name
1                       0                    855 Joel David Moore
2                     563                   1000    Orlando Bloom
3                       0                    161     Rory Kinnear
4                   22000                  23000   Christian Bale
5                     131                     NA       Rob Walker
6                     475                    530  Samantha Morton
  actor_1_facebook_likes     gross                          genres
1                   1000 760505847 Action|Adventure|Fantasy|Sci-Fi
2                  40000 309404152        Action|Adventure|Fantasy
3                  11000 200074175       Action|Adventure|Thriller
4                  27000 448130642                 Action|Thriller
5                    131        NA                     Documentary
6                    640  73058679         Action|Adventure|Sci-Fi
     actor_1_name
1     CCH Pounder
2     Johnny Depp
3 Christoph Waltz
4       Tom Hardy
5     Doug Walker
6    Daryl Sabara
                                               movie_title
1                                                 Avatar 
2               Pirates of the Caribbean: At World's End 
3                                                Spectre 
4                                  The Dark Knight Rises 
5 Star Wars: Episode VII - The Force Awakens             
6                                            John Carter 
  num_voted_users cast_total_facebook_likes         actor_3_name
1          886204                      4834            Wes Studi
2          471220                     48350       Jack Davenport
3          275868                     11700     Stephanie Sigman
4         1144337                    106759 Joseph Gordon-Levitt
5               8                       143                     
6          212204                      1873         Polly Walker
  facenumber_in_poster
1                    0
2                    0
3                    1
4                    0
5                    0
6                    1
                                                     plot_keywords
1                           avatar|future|marine|native|paraplegic
2     goddess|marriage ceremony|marriage proposal|pirate|singapore
3                              bomb|espionage|sequel|spy|terrorist
4 deception|imprisonment|lawlessness|police officer|terrorist plot
5                                                                 
6               alien|american civil war|male nipple|mars|princess
                                       movie_imdb_link
1 http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1
2 http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1
3 http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1
4 http://www.imdb.com/title/tt1345836/?ref_=fn_tt_tt_1
5 http://www.imdb.com/title/tt5289954/?ref_=fn_tt_tt_1
6 http://www.imdb.com/title/tt0401729/?ref_=fn_tt_tt_1
  num_user_for_reviews language country content_rating    budget
1                 3054  English     USA          PG-13 237000000
2                 1238  English     USA          PG-13 300000000
3                  994  English      UK          PG-13 245000000
4                 2701  English     USA          PG-13 250000000
5                   NA                                        NA
6                  738  English     USA          PG-13 263700000
  title_year actor_2_facebook_likes imdb_score aspect_ratio
1       2009                    936        7.9         1.78
2       2007                   5000        7.1         2.35
3       2015                    393        6.8         2.35
4       2012                  23000        8.5         2.35
5         NA                     12        7.1           NA
6       2012                    632        6.6         2.35
  movie_facebook_likes
1                33000
2                    0
3                85000
4               164000
5                    0
6                24000
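The rows above show two things worth cleaning: missing text values are stored as empty strings (row 5 has a blank color, actor_3_name, plot_keywords, language, country and content_rating), and movie titles carry trailing whitespace, which may be a non-breaking space that default `trimws()` does not strip. A minimal sketch, assuming `movie_data` was read in as above; `clean_movies` is a hypothetical helper, and dropping exact duplicate rows is an added assumption rather than a step from the original analysis.

```r
# Hypothetical helper: tidy the character columns of the raw IMDB 5000 data.
clean_movies <- function(df) {
  chr <- vapply(df, is.character, logical(1))
  # Trim leading/trailing horizontal and vertical whitespace; the "[\\h\\v]"
  # pattern also matches non-breaking spaces (U+00A0).
  df[chr] <- lapply(df[chr], trimws, whitespace = "[\\h\\v]")
  # Recode empty strings as NA so they count as missing values.
  df[chr] <- lapply(df[chr], function(x) {
    x[which(x == "")] <- NA
    x
  })
  # Drop rows that are exact duplicates after trimming (an assumption).
  df <- df[!duplicated(df), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Usage, assuming movie_data was read in as above:
# movie_clean <- clean_movies(movie_data)
# colSums(is.na(movie_clean))  # missing values per column
```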

3. Identify potential research questions that your dataset can help answer.

Some research questions this dataset can help answer:

  1. Does the duration of a movie impact its popularity?
  2. Does a cast's popularity among Facebook users affect how well a movie does?
  3. In which genre is a director expected to succeed, given the ratings of their past movies?
  4. Which countries do the most popular movies come from?
  5. Does the presence of a particular actor boost a movie's ratings?
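Questions 1 and 4 map directly onto columns shown above. A minimal sketch, assuming a data frame with `duration`, `imdb_score` and `country` columns and using IMDB score as the measure of popularity (`num_voted_users` would be an equally defensible choice). The helper names and the default of at least 30 movies per country are illustrative assumptions.

```r
# Q1 sketch: rank correlation between running time and IMDB score.
# Spearman's rho is used so that neither variable has to be normally distributed.
duration_score_test <- function(df) {
  ok <- complete.cases(df$duration, df$imdb_score)
  cor.test(df$duration[ok], df$imdb_score[ok],
           method = "spearman", exact = FALSE)
}

# Q4 sketch: mean IMDB score by country, keeping countries with enough movies.
country_scores <- function(df, min_movies = 30) {
  keep_rows <- !is.na(df$country) & df$country != "" & !is.na(df$imdb_score)
  df <- df[keep_rows, ]
  by_country <- split(df$imdb_score, df$country)
  by_country <- by_country[lengths(by_country) >= min_movies]
  sort(vapply(by_country, mean, numeric(1)), decreasing = TRUE)
}

# Usage, assuming movie_data was read in as above:
# duration_score_test(movie_data)
# head(country_scores(movie_data), 10)
```

A minimum group size keeps countries with only one or two highly rated films from topping the ranking.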

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Barlaya (2022, Jan. 8). Data Analytics and Computational Social Science: Homework 3 : Samhith Barlaya. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsbarlayahw3/

BibTeX citation

@misc{barlaya2022homework,
  author = {Barlaya, Samhith},
  title = {Data Analytics and Computational Social Science: Homework 3 : Samhith Barlaya},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsbarlayahw3/},
  year = {2022}
}