Submission for Homework 6
library(dplyr)
library(ggplot2)
movie_data <- read.csv("C:/Users/gbsam/Desktop/movie_metadata.csv")
# Summarise ratings per country and keep the 10 countries with the highest median rating
country_summary <- movie_data %>%
  group_by(country) %>%
  summarise(median_rating = median(imdb_score),
            sd_rating = sd(imdb_score),
            n_ = n()) %>%
  top_n(10, median_rating)
# Bars show the median rating; error bars span one standard deviation either side
barPlot <- ggplot(country_summary, aes(reorder(country, -median_rating), median_rating)) +
  geom_col() +
  geom_errorbar(aes(ymin = median_rating - sd_rating, ymax = median_rating + sd_rating), width = 0.2)
barPlot + labs(y = "Movie IMDB Rating, with uncertainty", x = "Country")
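Ranking by median rating alone can favour countries with only one movie in the data, and `sd()` of a single value is `NA`, so those bars get no error bar. The sketch below shows one possible way to handle this, assuming a minimum movie count per country is acceptable. The `toy_movies` data frame and the `min_movies` threshold are hypothetical and only stand in for `movie_data`.

```r
library(dplyr)

# Hypothetical toy data standing in for movie_data
toy_movies <- data.frame(
  country = c("X", "Y", "Y", "Y", "Z", "Z", "Z"),
  imdb_score = c(9.0, 6.0, 7.0, 8.0, 5.0, 5.5, 6.5),
  stringsAsFactors = FALSE
)

# Assumed threshold: drop countries with fewer movies than this
min_movies <- 3

toy_summary <- toy_movies %>%
  group_by(country) %>%
  summarise(median_rating = median(imdb_score),
            sd_rating = sd(imdb_score),
            n_ = n()) %>%
  filter(n_ >= min_movies) %>%        # remove countries with too few movies
  slice_max(median_rating, n = 10)    # then keep the highest medians

print(toy_summary)
```

Country "X" has the highest score but only one movie, so it is dropped before ranking. The right threshold depends on the data and is a judgment call.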
# Same summary, but keep the 10 countries with the most movies
country_summary <- movie_data %>%
  group_by(country) %>%
  summarise(median_rating = median(imdb_score),
            sd_rating = sd(imdb_score),
            n_ = n()) %>%
  top_n(10, n_)
barPlot <- ggplot(country_summary, aes(reorder(country, -median_rating), median_rating)) +
  geom_col() +
  geom_errorbar(aes(ymin = median_rating - sd_rating, ymax = median_rating + sd_rating), width = 0.2)
barPlot + labs(y = "Movie IMDB Rating, with uncertainty", x = "Country")
ggplot(data=movie_data, aes(x=duration, y=imdb_score, group=1)) + geom_smooth()
There are too many points here to draw any conclusions, so I break the movies down by language.
ggplot(subset(movie_data, language %in% c('English', 'Cantonese', 'French', 'German',
                                          'Japanese', 'Italian', 'Mandarin', 'Spanish')),
       aes(x = duration, y = imdb_score, group = 1)) +
  geom_smooth() +
  facet_wrap(vars(language))
The first two plots below show an interestingly similar trend: the IMDB rating varies in much the same way as either Actor 1's or Actor 2's Facebook likes increase.
ggplot(data=movie_data, aes(x=actor_1_facebook_likes, y=imdb_score, group=1)) + geom_smooth()
ggplot(data=movie_data, aes(x=actor_2_facebook_likes, y=imdb_score, group=1)) + geom_smooth()
ggplot(data=movie_data, aes(x=actor_3_facebook_likes, y=imdb_score, group=1)) + geom_smooth()
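Comparing trends across separate plots is easier when the curves share one set of axes. The following is a minimal sketch of that idea, not the approach used above: it stacks the three likes columns into a long table and draws one smoothed curve per actor slot. The `toy_movies` values are hypothetical and stand in for `movie_data`; the names `likes_long` and `actor_slot` are introduced here for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for movie_data
toy_movies <- data.frame(
  imdb_score = c(5.5, 6.8, 6.1, 7.9),
  actor_1_facebook_likes = c(100, 1000, 500, 20000),
  actor_2_facebook_likes = c(50, 400, 300, 9000),
  actor_3_facebook_likes = c(10, 200, 100, 700)
)

like_columns <- c("actor_1_facebook_likes",
                  "actor_2_facebook_likes",
                  "actor_3_facebook_likes")

# Stack the three likes columns into one long table, one row per movie and actor slot
likes_long <- do.call(rbind, lapply(like_columns, function(col) {
  data.frame(actor_slot = col,
             facebook_likes = toy_movies[[col]],
             imdb_score = toy_movies$imdb_score,
             stringsAsFactors = FALSE)
}))

# One smoothed curve per actor slot on shared axes (built here, not rendered)
overlay_plot <- ggplot(likes_long,
                       aes(x = facebook_likes, y = imdb_score, colour = actor_slot)) +
  geom_smooth(se = FALSE)
```

With the real data, replacing `toy_movies` with `movie_data` and printing `overlay_plot` would draw the three curves together.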
To investigate this, I take the 15 actors who appear most often as Actor 1, Actor 2 or Actor 3 in a movie. Some names, such as Morgan Freeman, Steve Buscemi and Bruce Willis, appear on multiple plots, which seems to indicate that their presence in a movie has some effect on its rating.
# The 15 actors listed most often as Actor 1, with the median rating of their movies
actor_1_summary <- movie_data %>%
  group_by(actor_1_name) %>%
  summarise(median_rating = median(imdb_score),
            sd_rating = sd(imdb_score),
            n_ = n()) %>%
  top_n(15, n_)
ggplot(actor_1_summary, aes(reorder(actor_1_name, -median_rating), median_rating)) +
  geom_col() + theme(axis.text.x = element_text(angle = 90)) +
  geom_errorbar(aes(ymin = median_rating - sd_rating, ymax = median_rating + sd_rating), width = 0.2) +
  ggtitle("Median IMDB rating of the 15 most frequently listed actors (Actor 1)") +
  xlab("Actors") + ylab("IMDB Median Rating")
# The 15 actors listed most often as Actor 2, with the median rating of their movies
actor_2_summary <- movie_data %>%
  group_by(actor_2_name) %>%
  summarise(median_rating = median(imdb_score),
            sd_rating = sd(imdb_score),
            n_ = n()) %>%
  top_n(15, n_)
ggplot(actor_2_summary, aes(reorder(actor_2_name, -median_rating), median_rating)) +
  geom_col() + theme(axis.text.x = element_text(angle = 90)) +
  geom_errorbar(aes(ymin = median_rating - sd_rating, ymax = median_rating + sd_rating), width = 0.2) +
  ggtitle("Median IMDB rating of the 15 most frequently listed actors (Actor 2)") +
  xlab("Actors") + ylab("IMDB Median Rating")
# The 15 actors listed most often as Actor 3, with the median rating of their movies
actor_3_summary <- movie_data %>%
  group_by(actor_3_name) %>%
  summarise(median_rating = median(imdb_score),
            sd_rating = sd(imdb_score),
            n_ = n()) %>%
  top_n(15, n_)
ggplot(actor_3_summary, aes(reorder(actor_3_name, -median_rating), median_rating)) +
  geom_col() + theme(axis.text.x = element_text(angle = 90)) +
  geom_errorbar(aes(ymin = median_rating - sd_rating, ymax = median_rating + sd_rating), width = 0.2) +
  ggtitle("Median IMDB rating of the 15 most frequently listed actors (Actor 3)") +
  xlab("Actors") + ylab("IMDB Median Rating")
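The observation that some actors appear on more than one of these plots can also be checked programmatically rather than by eye. Below is a rough sketch under stated assumptions: `toy_movies` is hypothetical stand-in data, `top_names()` is a helper written for this illustration (not part of any package), and a list size of 2 is used instead of 15 to keep the example small.

```r
# Hypothetical toy data standing in for movie_data
toy_movies <- data.frame(
  actor_1_name = c("A", "A", "A", "B", "B", "C"),
  actor_2_name = c("B", "B", "B", "A", "A", "D"),
  actor_3_name = c("C", "C", "C", "B", "B", "E"),
  stringsAsFactors = FALSE
)

# Illustrative helper: the n most frequent names in one column
top_names <- function(data, column, n) {
  counts <- sort(table(data[[column]]), decreasing = TRUE)
  head(names(counts), n)
}

name_columns <- c("actor_1_name", "actor_2_name", "actor_3_name")

# Top names for each actor slot (top 2 here; the plots above use the top 15)
top_lists <- lapply(name_columns, function(col) top_names(toy_movies, col, 2))

# Count how many of the three lists each name appears in
list_counts <- table(unlist(top_lists))

# Names that appear in more than one list
repeated_actors <- names(list_counts[list_counts > 1])
print(repeated_actors)
```

In the toy data, "A" and "B" each appear in more than one list. Appearing in several lists only shows that an actor is frequently cast in different billing positions; it does not by itself establish an effect on ratings.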
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Barlaya (2022, Jan. 20). Data Analytics and Computational Social Science: Homework 6 : Samhith Barlaya. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsbarlayahw6/
BibTeX citation
@misc{barlaya2022homework,
  author = {Barlaya, Samhith},
  title = {Data Analytics and Computational Social Science: Homework 6 : Samhith Barlaya},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomsbarlayahw6/},
  year = {2022}
}