Submission for Homework 3
For the final project, I chose the IMDB 5000 dataset available on Kaggle (https://www.kaggle.com/carolzhangdc/imdb-5000-movie-dataset). This dataset contains data on roughly 5,000 movies (5,043 rows in this file) and their respective IMDB ratings. The following code demonstrates how to read in the dataset.
movie_data <- read.csv("C:/Users/gbsam/Desktop/movie_metadata.csv")
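The absolute Windows path above only resolves on the author's machine. A minimal sketch of a more portable read follows, assuming the CSV is UTF-8 encoded. The names `csv_path` and `movie_demo`, and the two-row stand-in file, are illustrative and not part of the original analysis; the stand-in only exists so the sketch runs without the Kaggle download.

```r
# Stand-in file so the sketch runs on its own.
# In practice, point csv_path at your local copy of movie_metadata.csv.
csv_path <- file.path(tempdir(), "movie_metadata.csv")
writeLines(
  c("movie_title,imdb_score",
    "Avatar,7.9",
    "Spectre,6.8"),
  csv_path
)

# stringsAsFactors = FALSE keeps text columns as character
# (this is already the default from R 4.0 onward).
# fileEncoding = "UTF-8" is an assumption about how the file was saved.
movie_demo <- read.csv(
  csv_path,
  stringsAsFactors = FALSE,
  fileEncoding = "UTF-8"
)

# Compact overview of the columns that were read
str(movie_demo)
```

Building the path with `file.path()` avoids hard-coding a user-specific directory, so the same script can run on another machine once `csv_path` points at the downloaded file.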
The following are the variables and their respective types (as given by column ‘Mode’):
summary.default(movie_data)
Length Class Mode
color 5043 -none- character
director_name 5043 -none- character
num_critic_for_reviews 5043 -none- numeric
duration 5043 -none- numeric
director_facebook_likes 5043 -none- numeric
actor_3_facebook_likes 5043 -none- numeric
actor_2_name 5043 -none- character
actor_1_facebook_likes 5043 -none- numeric
gross 5043 -none- numeric
genres 5043 -none- character
actor_1_name 5043 -none- character
movie_title 5043 -none- character
num_voted_users 5043 -none- numeric
cast_total_facebook_likes 5043 -none- numeric
actor_3_name 5043 -none- character
facenumber_in_poster 5043 -none- numeric
plot_keywords 5043 -none- character
movie_imdb_link 5043 -none- character
num_user_for_reviews 5043 -none- numeric
language 5043 -none- character
country 5043 -none- character
content_rating 5043 -none- character
budget 5043 -none- numeric
title_year 5043 -none- numeric
actor_2_facebook_likes 5043 -none- numeric
imdb_score 5043 -none- numeric
aspect_ratio 5043 -none- numeric
movie_facebook_likes 5043 -none- numeric
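As an alternative view of the same information, the column types can be collected into a named vector and tallied. This is a sketch on a tiny stand-in data frame (`movie_demo`, a hypothetical name) built from three of the columns listed above, so it runs without the full dataset; on the real data the same calls would be applied to `movie_data`.

```r
# Stand-in with three of the columns listed above
# (values taken from the first two rows of the dataset).
movie_demo <- data.frame(
  director_name = c("James Cameron", "Gore Verbinski"),
  duration      = c(178, 169),
  imdb_score    = c(7.9, 7.1),
  stringsAsFactors = FALSE
)

# Class of each column, as a named character vector
col_types <- sapply(movie_demo, class)
print(col_types)

# How many columns fall into each type
type_counts <- table(col_types)
print(type_counts)
```

Tallying the types this way makes it easy to see at a glance how many variables are numeric and how many are text.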
Here is a glimpse of the dataset:
head(movie_data)
color director_name num_critic_for_reviews duration
1 Color James Cameron 723 178
2 Color Gore Verbinski 302 169
3 Color Sam Mendes 602 148
4 Color Christopher Nolan 813 164
5 Doug Walker NA NA
6 Color Andrew Stanton 462 132
director_facebook_likes actor_3_facebook_likes actor_2_name
1 0 855 Joel David Moore
2 563 1000 Orlando Bloom
3 0 161 Rory Kinnear
4 22000 23000 Christian Bale
5 131 NA Rob Walker
6 475 530 Samantha Morton
actor_1_facebook_likes gross genres
1 1000 760505847 Action|Adventure|Fantasy|Sci-Fi
2 40000 309404152 Action|Adventure|Fantasy
3 11000 200074175 Action|Adventure|Thriller
4 27000 448130642 Action|Thriller
5 131 NA Documentary
6 640 73058679 Action|Adventure|Sci-Fi
actor_1_name
1 CCH Pounder
2 Johnny Depp
3 Christoph Waltz
4 Tom Hardy
5 Doug Walker
6 Daryl Sabara
movie_title
1 Avatar
2 Pirates of the Caribbean: At World's End
3 Spectre
4 The Dark Knight Rises
5 Star Wars: Episode VII - The Force Awakens
6 John Carter
num_voted_users cast_total_facebook_likes actor_3_name
1 886204 4834 Wes Studi
2 471220 48350 Jack Davenport
3 275868 11700 Stephanie Sigman
4 1144337 106759 Joseph Gordon-Levitt
5 8 143
6 212204 1873 Polly Walker
facenumber_in_poster
1 0
2 0
3 1
4 0
5 0
6 1
plot_keywords
1 avatar|future|marine|native|paraplegic
2 goddess|marriage ceremony|marriage proposal|pirate|singapore
3 bomb|espionage|sequel|spy|terrorist
4 deception|imprisonment|lawlessness|police officer|terrorist plot
5
6 alien|american civil war|male nipple|mars|princess
movie_imdb_link
1 http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1
2 http://www.imdb.com/title/tt0449088/?ref_=fn_tt_tt_1
3 http://www.imdb.com/title/tt2379713/?ref_=fn_tt_tt_1
4 http://www.imdb.com/title/tt1345836/?ref_=fn_tt_tt_1
5 http://www.imdb.com/title/tt5289954/?ref_=fn_tt_tt_1
6 http://www.imdb.com/title/tt0401729/?ref_=fn_tt_tt_1
num_user_for_reviews language country content_rating budget
1 3054 English USA PG-13 237000000
2 1238 English USA PG-13 300000000
3 994 English UK PG-13 245000000
4 2701 English USA PG-13 250000000
5 NA NA
6 738 English USA PG-13 263700000
title_year actor_2_facebook_likes imdb_score aspect_ratio
1 2009 936 7.9 1.78
2 2007 5000 7.1 2.35
3 2015 393 6.8 2.35
4 2012 23000 8.5 2.35
5 NA 12 7.1 NA
6 2012 632 6.6 2.35
movie_facebook_likes
1 33000
2 0
3 85000
4 164000
5 0
6 24000
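Row 5 in the output above has both `NA` values in numeric columns and empty strings in text columns. One way to quantify this is to count missing values per column after treating empty strings as missing. The sketch below uses a small stand-in data frame (`movie_demo`, a hypothetical name) built from rows 1, 5, and 6 of three columns; treating blanks as missing is an assumption about how the analysis would want to handle them.

```r
# Stand-in built from rows 1, 5, and 6 shown above
movie_demo <- data.frame(
  color    = c("Color", "", "Color"),
  duration = c(178, NA, 132),
  gross    = c(760505847, NA, 73058679),
  stringsAsFactors = FALSE
)

# Convert empty strings in character columns to NA
is_char <- sapply(movie_demo, is.character)
movie_demo[is_char] <- lapply(movie_demo[is_char], function(x) {
  x[x %in% ""] <- NA_character_
  x
})

# Number of missing values in each column
missing_per_column <- colSums(is.na(movie_demo))
print(missing_per_column)
```

Applied to the full `movie_data`, the same steps would show which variables are most affected by missing values before any modelling.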
Some of the research questions that this dataset can help answer:
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Barlaya (2022, Jan. 8). Data Analytics and Computational Social Science: Homework 3 : Samhith Barlaya. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsbarlayahw3/
BibTeX citation
@misc{barlaya2022homework,
  author = {Barlaya, Samhith},
  title = {Data Analytics and Computational Social Science: Homework 3 : Samhith Barlaya},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsbarlayahw3/},
  year = {2022}
}