Author

Janani Natarajan

Published

May 21, 2023

Code
library(tidyverse)
library(ggplot2)
library(GGally)
library(data.table)
library(skimr) 
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Introduction

In my film analysis project, I am specifically interested in studying how movie production, budgets, and revenue have evolved between 1950 and 2019. To achieve this, I am utilizing a dataset available on GitHub that contains comprehensive information about films during this time period. By analyzing this dataset, I aim to uncover the changing dynamics of the film industry and identify any significant trends and patterns that have emerged over the years.

By examining factors such as production levels, budget allocations, and revenue generated by movies, I hope to gain valuable insights into the overall evolution of the industry. This analysis will provide a deeper understanding of how the film industry has transformed over time and shed light on the various factors that have influenced its growth and success. Ultimately, my goal is to contribute to the existing knowledge about the film industry and provide valuable insights that can inform future decision-making and research in this field.

Background of the Topic/Literature Review

The film industry has been extensively researched, focusing on various aspects such as movie production, budgets, and revenue. Studies have explored the impact of technological advancements on filmmaking, the relationship between production budgets and movie success, and the influence of marketing and distribution strategies on revenue generation. This project aims to contribute to this existing literature by analyzing a specific dataset and uncovering insights into the evolution of the film industry in terms of production, budgets, and revenue.

Dataset Introduction and Description

Source: GitHub

  • Movie titles: The names of the movies.

  • Production budget: The amount of money allocated in USD for making the movie.

  • Release date: The date when the movie was officially released, including the day, month, and year.

  • Domestic revenue: The total gross revenue generated by the movie within the United States.

  • Worldwide revenue: The total gross revenue generated by the movie worldwide. In box-office data of this kind the worldwide figure normally includes the United States.

  • Distributor: The company responsible for distributing the movie.

  • MPAA rating: The age rating assigned by the US-based rating agency, indicating the appropriate audience for the movie.

  • Genre: The category or genre in which the movie can be classified.

This dataset allows for an in-depth analysis of movie production, budgets, and revenue, providing insights into the financial performance and success of movies over time. By examining these variables, trends, patterns, and relationships within the film industry can be explored and evaluated.

Why I Chose Movies as My Dataset
I chose the “movies” dataset as my analysis subject because it offers a wealth of information about the film industry. By examining this dataset, I can explore various aspects of movie production, such as budgets, revenue, release dates, distributors, ratings, and genres. Analyzing this data will enable me to gain insights into the trends, patterns, and dynamics of the movie industry over the years. Additionally, studying movie data can provide valuable information for understanding audience preferences, box office performance, and the overall success of movies.

Research Questions

1) How does the profitability of movies vary by MPAA (Motion Picture Association of America) rating?

2) How does the average profit earned by genre vary over the years?

3) Which five genres have the highest number of movies?

4) How do worldwide and domestic gross revenue compare, and what does the comparison reveal about how movie revenue is distributed?

5) How does the number of movies and their average rating vary among different movie distributors?

6) How has the average budget of movies changed over the years of their release?

Data Reading

Code
movies <- read.csv('/Users/jananinatarajan/Downloads/movie_profit.csv')
Code
view(movies)
Code
# convert to data table object
movies <- data.table(movies)
Code
dim(movies)
[1] 3401    9

Custom Function


To keep the visualizations consistent, I wrote a custom theme function, movies_theme(), that is applied to every plot. It builds on theme_bw(), removes the grid lines, and styles the title, subtitle, and caption so that all figures share a tidy, uniform look.

Code
# Custom graph theme applicable for all graphs
movies_theme <- function(){
  
  theme_bw() + 
    theme(plot.caption = element_text(hjust = 0, face = "italic"), 
          plot.title = element_text(hjust = 0.7, color = "black", size = 14, face = "bold"),
          plot.subtitle = element_text(hjust = 0.5, size = 11),
          panel.grid.minor = element_blank() ,
          panel.grid.major = element_blank())
  
}

Data Cleaning

After examining the structure of the data, I determined the necessary transformations for my analysis. I generated a comprehensive data summary that provided valuable insights into the variables, including minimum and maximum values, the presence of missing values, and the number of unique values. Based on this information, I filtered the dataset to include only the required columns for my analysis.

To ensure meaningful analysis, I dropped rows where the movie title was missing, as the title is the key identifier for each movie. To extract the year information, I converted the release date column from a character string to a date format and then split it to extract the year component. Since my analysis did not require the month and day information, I dropped those columns to simplify the dataset.

Finally, for improved usability, I renamed the production budget column to a more descriptive name, facilitating easier interpretation and utilization of the data for subsequent analysis.

Code
# glimpse
glimpse(movies)
Rows: 3,401
Columns: 9
$ X                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ release_date      <chr> "6/22/2007", "7/28/1995", "5/12/2017", "12/25/2013",…
$ movie             <chr> "Evan Almighty", "Waterworld", "King Arthur: Legend …
$ production_budget <dbl> 1.75e+08, 1.75e+08, 1.75e+08, 1.75e+08, 1.70e+08, 1.…
$ domestic_gross    <dbl> 100289690, 88246220, 39175066, 38362475, 416769345, …
$ worldwide_gross   <dbl> 174131329, 264246220, 139950708, 151716815, 13048663…
$ distributor       <chr> "Universal", "Universal", "Warner Bros.", "Universal…
$ mpaa_rating       <chr> "PG", "PG-13", "PG-13", "PG-13", "PG-13", "PG-13", "…
$ genre             <chr> "Comedy", "Action", "Adventure", "Action", "Action",…
Code
# skim
skim(movies)
Data summary
Name movies
Number of rows 3401
Number of columns 9
Key NULL
_______________________
Column type frequency:
character 5
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
release_date 0 1.00 8 10 0 1768 0
movie 0 1.00 1 35 0 3400 0
distributor 48 0.99 3 22 0 201 0
mpaa_rating 137 0.96 1 5 0 4 0
genre 0 1.00 5 9 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
X 0 1 1701 981.93 1 851 1701 2551 3401 ▇▇▇▇▇
production_budget 0 1 33284743 34892390.59 250000 9000000 20000000 45000000 175000000 ▇▂▁▁▁
domestic_gross 0 1 45421793 58825660.56 0 6118683 25533818 60323786 474544677 ▇▁▁▁▁
worldwide_gross 0 1 94115117 140918241.82 0 10618813 40159017 117615211 1304866322 ▇▁▁▁▁
Code
# checking format
str(movies)
Classes 'data.table' and 'data.frame':  3401 obs. of  9 variables:
 $ X                : int  1 2 3 4 5 6 7 8 9 10 ...
 $ release_date     : chr  "6/22/2007" "7/28/1995" "5/12/2017" "12/25/2013" ...
 $ movie            : chr  "Evan Almighty" "Waterworld" "King Arthur: Legend of the Sword" "47 Ronin" ...
 $ production_budget: num  1.75e+08 1.75e+08 1.75e+08 1.75e+08 1.70e+08 1.70e+08 1.70e+08 1.70e+08 1.70e+08 1.70e+08 ...
 $ domestic_gross   : num  1.00e+08 8.82e+07 3.92e+07 3.84e+07 4.17e+08 ...
 $ worldwide_gross  : num  1.74e+08 2.64e+08 1.40e+08 1.52e+08 1.30e+09 ...
 $ distributor      : chr  "Universal" "Universal" "Warner Bros." "Universal" ...
 $ mpaa_rating      : chr  "PG" "PG-13" "PG-13" "PG-13" ...
 $ genre            : chr  "Comedy" "Action" "Adventure" "Action" ...
 - attr(*, ".internal.selfref")=<externalptr> 
Code
#change date from character to date
movies$release_date <- mdy(movies$release_date)

# filter required columns
movies <- movies[ , .(release_date,movie,production_budget,domestic_gross,worldwide_gross,distributor,mpaa_rating,genre)]

# drop rows with a missing movie title
movies <- movies[!is.na(movie)]

# check summary
summary(movies)
  release_date           movie           production_budget  
 Min.   :1936-02-05   Length:3401        Min.   :   250000  
 1st Qu.:1999-07-02   Class :character   1st Qu.:  9000000  
 Median :2005-09-30   Mode  :character   Median : 20000000  
 Mean   :2004-02-04                      Mean   : 33284743  
 3rd Qu.:2011-07-08                      3rd Qu.: 45000000  
 Max.   :2019-03-15                      Max.   :175000000  
 domestic_gross      worldwide_gross     distributor        mpaa_rating       
 Min.   :        0   Min.   :0.000e+00   Length:3401        Length:3401       
 1st Qu.:  6118683   1st Qu.:1.062e+07   Class :character   Class :character  
 Median : 25533818   Median :4.016e+07   Mode  :character   Mode  :character  
 Mean   : 45421793   Mean   :9.412e+07                                        
 3rd Qu.: 60323786   3rd Qu.:1.176e+08                                        
 Max.   :474544677   Max.   :1.305e+09                                        
    genre          
 Length:3401       
 Class :character  
 Mode  :character  
                   
                   
                   
Code
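# release_date is already a Date after mdy(); as.Date() is kept as a harmless safeguard
movies$release_date <- as.Date(movies$release_date)  

# split release date to get year
movies <- movies %>%
  mutate(year = year(release_date), date = day(release_date), month = month(release_date))

# drop month and day (selected by name so the call does not depend on column positions)
movies <- movies %>% select(-date, -month) %>% as.data.table()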

# rename
data.table::setnames(movies,'production_budget','budget')
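
As a dependency-free cross-check of the date handling above, the same year extraction can be sketched in base R. The three dates are the first values shown in the glimpse() output; the variable names are illustrative only, and this is a sketch rather than the code used in the analysis.

```r
# First three release dates from the glimpse() output above (month/day/year strings)
release_date <- c("6/22/2007", "7/28/1995", "5/12/2017")

# Parse the strings into Date objects; base-R counterpart of lubridate::mdy()
parsed <- as.Date(release_date, format = "%m/%d/%Y")

# Extract the calendar year as an integer; counterpart of lubridate::year()
release_year <- as.integer(format(parsed, "%Y"))

print(data.frame(release_date, parsed, release_year))
```

Extracting only the year this way avoids creating day and month columns that have to be dropped afterwards.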

Data Transformations

After cleaning, the dataset contained 8 original variables plus the release year, and I recognized the potential for a more comprehensive analysis by deriving additional variables. To gain a deeper understanding of the movie business and its evolution, I summed the domestic and worldwide gross revenue columns to obtain a total gross figure. From the budget and total gross, I then derived the budget as a percentage of total gross revenue (budget_rev). Additionally, I computed each movie’s profit by subtracting the budget from the total gross revenue.

To gauge the success of movies, I introduced a new column, succ_rate, that bins movies into four categories (Blockbuster, Very good, Success, and flop) according to budget_rev, together with a matching score out of 10 (ratings). Furthermore, I created a binary column indicating movie success, with a value of 1 denoting a Blockbuster movie and 0 representing all other cases. These additions to the dataset enhanced its analytical value and allowed for a more comprehensive examination of movie success and profitability.

Code
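# add total revenue (domestic + worldwide); if worldwide_gross already
# includes domestic revenue, this sum counts the domestic part twice
movies <- movies[, `:=` ( total_gross = domestic_gross + worldwide_gross)]

# add budget as a percentage of total gross, profit, and year as integer
movies <- movies[, `:=` (budget_rev = (budget / total_gross)*100,
  profit = total_gross - budget,
  year = as.integer(year))]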

# rate movies success by revenue
movies <- movies %>% mutate(succ_rate = case_when(budget_rev > 90 ~ "Blockbuster",
                                         budget_rev >= 70 ~ "Very good",
                                         budget_rev > 50 ~ "Success",
                                         budget_rev >= 0.1 ~ "flop"))

# rate movies out of 10 by profitability
movies <- movies %>% 
  mutate(ratings = case_when(budget_rev > 90 ~ "10",
                                         budget_rev >= 70 ~ "8",
                                         budget_rev > 50 ~ "6",
                                         budget_rev >= 0.1 ~ "4"))

# change ratings from character to numeric
movies$ratings <- as.integer(movies$ratings)  

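# add Blockbuster/other binary variable (1 = Blockbuster, 0 = any other category)
movies[, rating_binary := as.factor(ifelse(succ_rate == "Blockbuster", 1, 0))]

# categorize movies by release decade; conditions are checked in order,
# so the most recent decade comes first and earlier years stay NA
movies <- movies %>% 
  mutate(time_period = case_when(year >= 2010 ~ "2010s",
                                 year >= 2000 ~ "2000s"))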
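
To make the derived columns concrete, here is a small base-R sketch that reproduces the arithmetic for the first two movies in the glimpse() output. The helper classify_success() is a hypothetical name, not part of the analysis; it mirrors the thresholds of the case_when() call above, where conditions are checked top to bottom and the first match wins.

```r
# First two rows from the glimpse() output above
toy <- data.frame(
  movie           = c("Evan Almighty", "Waterworld"),
  budget          = c(175000000, 175000000),
  domestic_gross  = c(100289690, 88246220),
  worldwide_gross = c(174131329, 264246220)
)

# Same derived columns as in the analysis
toy$total_gross <- toy$domestic_gross + toy$worldwide_gross
toy$budget_rev  <- toy$budget / toy$total_gross * 100   # budget as % of total gross
toy$profit      <- toy$total_gross - toy$budget

# Hypothetical helper mirroring the case_when() thresholds; order matters
classify_success <- function(budget_rev) {
  ifelse(budget_rev > 90,   "Blockbuster",
  ifelse(budget_rev >= 70,  "Very good",
  ifelse(budget_rev > 50,   "Success",
  ifelse(budget_rev >= 0.1, "flop", NA_character_))))
}

toy$succ_rate <- classify_success(toy$budget_rev)
print(toy[, c("movie", "total_gross", "budget_rev", "profit", "succ_rate")])
```

Because budget_rev divides the budget by the revenue, a lower value means the film earned more relative to its cost, which is worth keeping in mind when reading the category labels.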

Data Analysis

After preparing the data, I proceeded to generate basic visualizations that would provide insightful details about the dataset. I started by calculating summary statistics to gain a comprehensive understanding of the variables. The results revealed that many of the variables exhibited either left or right skewness. Notably, the worldwide gross revenue and domestic gross revenue displayed similar patterns, indicating a potential correlation between these two variables.

Additionally, the derived ratings variable (the score out of 10 created above, not the MPAA rating) showed that the majority of movies in the dataset received a rating of 4, which is relatively low, followed by the maximum rating of 10 assigned to blockbuster movies. This distribution suggests that the movies in the dataset primarily fall into the categories of either flop or blockbuster, with fewer movies falling into the middle range of success. These findings lay the foundation for further exploration and analysis of movie success and revenue patterns.

Code
#visualized the data for a holistic understanding
movies %>%
  keep(is.numeric) %>% 
  gather() %>%
  ggplot(aes(value)) +
  facet_wrap(~key, scales = "free") +
  geom_histogram(color = "black", fill = "yellow")+
  movies_theme()

Upon analyzing the numeric variables, I observed a clear positive correlation between worldwide gross revenue and profit, and a similar correlation between domestic gross revenue and profit. This is partly expected, because profit is computed from the two revenue columns. The findings suggest that movies with higher gross revenue also tend to be more profitable.

Code
corr <- movies %>%  select(c("profit","budget","domestic_gross","worldwide_gross"))
#correlation
ggpairs(corr)


Before delving into the analysis and answering specific questions, I first examined the trajectory of movie production between the years 1950 and 2019. The pattern revealed that the pace of movie production started to increase notably after 1975. However, it was after 2000 that a significant surge in the number of movies released was observed, indicating a growing trend. This trend could potentially be attributed to advancements in technology, improved access to information, and a burgeoning interest in the film industry, all contributing to the rise in movie production during that period.

Code
# check the distribution by year
ggplot(movies[, .N, by = year], aes(x= year, y=N)) +
  geom_col(colour="black", fill="skyblue") +
  ggtitle('Number of movies released by year') + xlab("") + ylab("") +
  movies_theme()
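
The yearly counts can also be summarized per decade, which makes the post-2000 surge easier to quantify. The following is a minimal base-R sketch; the release years are hypothetical values chosen for illustration, not taken from the dataset.

```r
# Hypothetical release years, for illustration only
year <- c(1978, 1984, 1995, 1999, 2003, 2007, 2007, 2011, 2016, 2017)

# Map each year to the start of its decade, e.g. 2007 -> 2000
decade <- (year %/% 10) * 10

# Count movies per decade and label the bins "1970s", "1980s", ...
movies_per_decade <- table(paste0(decade, "s"))
print(movies_per_decade)
```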

Data Visualization

1. Profit by MPAA rating


To analyze the profitability of movies based on MPAA ratings, I grouped the dataset by rating and calculated the average profit for each category. Using the kable function from the knitr package, I presented the number of observations and the average profit in each category. It is important to note that some movies have no MPAA rating (NA); they appear in the table but are excluded from the visualization.

Code
# Number of observations and average profit in each rating category
rating_table <- movies[,list(observations = .N, avg_profit = (mean(profit))), by = mpaa_rating]
knitr::kable(rating_table, caption="Analyzing the relationship between movie profitability and age ratings")
Analyzing the relationship between movie profitability and age ratings
mpaa_rating observations avg_profit
PG 573 165080935
PG-13 1092 127007554
G 85 215229463
R 1514 70883864
NA 137 18009642

PG - Parental Guidance Suggested, PG-13 - Parents Strongly Cautioned, G - General Audiences, R - Restricted, NA - Not Available

The box plot shows how profit is distributed within each MPAA rating. As the table above shows, most movies in the dataset are rated “R” (1,514), while “G”, the rating suitable for all ages, is the smallest category (85). Being the most common rating does not translate into higher profit: “G” movies have the highest average profit, followed by “PG” and “PG-13”, and “R” movies have the lowest. This suggests that how often a rating occurs says little about how profitable its movies are.

Code
df <- movies[!is.na(mpaa_rating),.(profit),  by = mpaa_rating]
ggplot(df, aes(factor(mpaa_rating),profit)) + 
  geom_boxplot(color = "black", fill = c("lightgreen","purple", "skyblue", "pink")) +
  labs(title = "Distribution of profits by appropriate age rating", x = "", y = "Profit") +
  movies_theme() 
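
One point worth keeping in mind when reading the table and the box plot together: the table reports the mean profit, whereas the centre line of a box plot is the median, and with skewed data the two can rank groups differently. The sketch below illustrates this with made-up numbers for two hypothetical groups; none of the values come from the dataset.

```r
# Made-up profits (USD millions) for two hypothetical rating groups
profit <- c(5, 10, 15, 20, 400,   # group "A": one very large hit
            40, 50, 60, 70, 80)   # group "B": steady earners
group  <- rep(c("A", "B"), each = 5)

# Mean and median profit per group
avg_profit    <- tapply(profit, group, mean)
median_profit <- tapply(profit, group, median)

print(rbind(mean = avg_profit, median = median_profit))
```

Here group A has the higher mean because of a single outlier, while group B has the higher median.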

2. Average profit by year and genre


To identify the top-performing genres in terms of profitability, I compared the average profit per genre and found that “Adventure,” “Action,” and “Horror” are the three genres with the highest average profits. This indicates that movies belonging to these genres have been particularly successful in terms of revenue generation.

Furthermore, I wanted to visualize the trend of profits over the years for these genres.

Code
# Amount of profit by genre
gen_table <- movies[,list(observations = .N, avg_profit = (mean(profit))), by = genre]
knitr::kable(gen_table, caption="Profit of movie by genre")
Profit of movie by genre
genre observations avg_profit
Comedy 813 80991031
Action 573 154676716
Adventure 481 224086193
Drama 1236 59637771
Horror 298 85202723

The time series plot presented below visualizes the average profit of these three genres over time. Notably, it reveals a significant fluctuation in the average profit of horror movies compared to action and adventure genres. Before the 1980s, horror movies experienced a substantial surge in profitability, which gradually declined over time. However, after 1995, there was a resurgence in profit, although it remained lower compared to action and adventure movies.

This observation suggests that horror movies may cater to a specific audience niche rather than appealing to a broader demographic. One possible explanation is that genre preferences vary among age groups, with younger and older audiences gravitating more towards adventure or drama. The potential for intense or frightening content in horror movies might contribute to this pattern, although the dataset contains no audience information to confirm it.

Code
df1 <- movies[, .(avg_profit = mean(profit)), by = .(year, genre)]
df1 <- subset(df1, genre %in% c("Action", "Adventure", "Horror"))

p2 <- ggplot(df1, aes(x = year, y = avg_profit, color = genre)) +
  geom_line() + geom_point() +
  labs(title = 'Profit Distribution',
       subtitle = 'Average profit of movies by year and genre',
       y = 'Average Profit by Genre',
       x = 'Year') +
  scale_x_continuous(limits = c(1970, 2019), breaks = seq(1970, 2019, by = 4)) +
  scale_y_continuous(breaks = seq(0, 800000000, by = 50000000)) +
  movies_theme() +
  scale_colour_manual(values = c("pink", "purple", "skyblue"))

p2

3. Top 5 genres with most movies

The bar plot illustrates the top 5 genres with the highest number of movies in our dataset. Drama emerges as the most prevalent genre, followed by comedy and action. The abundance of drama movies suggests a higher production rate in this genre, possibly driven by public demand and popularity. This information complements our earlier analysis of average profits in different genres, providing a broader understanding of the movie industry’s landscape and the genres that dominate it.

Code
# genres with the most movies and separate genres into rows 
df_cat <- tidyr::separate_rows(movies, genre, sep = ",")
df_cat <- as.data.table(df_cat)
# calculate number of movies in each genre and get top 5
df_cat <- df_cat[, .(num = .N), by = genre][order(-num)][1:5]

# create bar graph
ggplot(df_cat, aes(x = reorder(genre, num), y = num, fill = genre)) +
  geom_bar(stat = "identity", color = "black") +
  labs(x = "", y = "", title = "Top 5 genres based on number of movies") +
  coord_flip() +
  geom_text(aes(label = num), hjust = 1.2, vjust = 0.5) +
  movies_theme()

4. Worldwide and Domestic Gross Revenue Comparison


The visualization below utilizes the geom_smooth function to compare the domestic gross revenue and worldwide revenue of each movie. It reveals an interesting pattern where certain movies exhibit lower domestic revenue but higher worldwide revenue. This discrepancy suggests that these movies may have achieved greater success and popularity internationally compared to their performance within the United States. This insight highlights the global appeal and reach of such movies, indicating that they resonated with audiences worldwide despite their comparatively lower domestic reception.

Code
# domestic and worldwide gross revenue comparison
ggplot(movies, aes(x = domestic_gross, y = worldwide_gross)) +
  # draw the points once, then the fitted line on top so it stays visible
  geom_point(colour = 'skyblue') +
  geom_smooth(method = 'lm', colour="violet") +
  labs( x='Domestic Gross Revenue', y='Worldwide Gross Revenue', title = 'Relationship between domestic and worldwide gross revenue') +
  movies_theme()
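
The pattern of low domestic but high worldwide revenue can also be expressed as a number per movie. The base-R sketch below computes the share of worldwide gross earned outside the United States for the first four movies in the str() output. It assumes that worldwide_gross already includes domestic_gross, which is an assumption about the data rather than something stated in the source, and intl_share is an illustrative column name.

```r
# First four movies from the str()/glimpse() output above
toy <- data.frame(
  movie = c("Evan Almighty", "Waterworld",
            "King Arthur: Legend of the Sword", "47 Ronin"),
  domestic_gross  = c(100289690, 88246220, 39175066, 38362475),
  worldwide_gross = c(174131329, 264246220, 139950708, 151716815)
)

# Assumption: worldwide_gross includes domestic_gross,
# so the international share is the remainder of the worldwide total
toy$intl_share <- 1 - toy$domestic_gross / toy$worldwide_gross

# Sort so that the most internationally driven movies come first
toy <- toy[order(-toy$intl_share), ]
print(toy[, c("movie", "intl_share")])
```

Movies near the top of this ordering are the ones that sit well above the fitted line in the scatter plot.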

5. Success Rate by Budget

To compare how the production budget is split across movies categorized as Blockbuster, Very good, Success, and flop, I created a pie chart (a stacked bar drawn in polar coordinates). The chart pools all release years and shows each category’s share of the total budget. Flop movies account for the largest share of the budget, followed by Blockbuster movies.

Code
p6 <- ggplot(movies, aes(x = "", y = budget, fill = succ_rate)) +
  geom_bar(stat = 'identity', width = 1) +
  coord_polar(theta = "y") +
  movies_theme() +
  labs(y = 'Budget', fill = 'Success Rate') +
  scale_fill_manual(values = c("pink", "violet", "skyblue", "lightgreen")) +
  theme(legend.position = "top", legend.title = element_blank())

p6

6. Average budget and year of release

By splitting the yearly average budgets into two groups, below and above a cut-off of 35 million USD (close to the overall mean budget of about 33.3 million USD), we can analyze how budgets have changed since 2000. In the visualization, yellow marks the years whose average budget fell below that cut-off. It is evident that the average budget in 2007 was lower than in 2016. This may be related to the financial crisis that began in the United States around that time, which could have constrained funding for movie production.

Code
# calculate average budget by year
df_bud <- movies[ , .(budget = mean(budget)), by = year]
# filter years
df_bud <- df_bud[year >= 2000]
# add if an observation is above or below average
df_bud[ , type := ifelse(budget < 35000000, 'below', 'above')]
# sorting
df_bud <- df_bud[order(budget), ]
# convert year to factor so that order remains the same on the plot
df_bud <- df_bud[, year := factor(year, levels = year)]

#bar chart
p7 <- ggplot(df_bud, aes(x=year, y=budget, label=budget)) + 
  geom_bar(stat='identity', aes(fill=type), width=.5)  +
  scale_fill_manual(name="Budget", 
                    labels = c("Above Average", "Below Average"), 
                    values = c("above"="orange", "below"="yellow")) + 
  labs(title= "The Evolution of Movie Budgets in the 21st Century", y="", x="") + 
  coord_flip() +
  movies_theme() +
  theme(axis.line=element_blank(),axis.text.x=element_blank(),axis.ticks=element_blank())
p7
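
The chunk above hard-codes the cut-off at 35,000,000. As an alternative sketch, the cut-off can be derived from the data itself. The example below uses base R and a few hypothetical budgets; the values and the names toy, by_year, and cutoff are illustrative and not part of the analysis.

```r
# Hypothetical budgets (USD) for a handful of release years
toy <- data.frame(
  year   = c(2007, 2007, 2008, 2008, 2016, 2016),
  budget = c(20e6, 30e6, 25e6, 45e6, 40e6, 60e6)
)

# Average budget per release year
by_year <- aggregate(budget ~ year, data = toy, FUN = mean)

# Cut-off taken from the data instead of a hard-coded constant
cutoff <- mean(toy$budget)

# Flag each year as above or below that cut-off
by_year$type <- ifelse(by_year$budget < cutoff, "below", "above")
print(by_year)
```

Deriving the cut-off this way keeps the above/below split meaningful if the dataset or the range of years changes.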

Conclusion

My analysis revealed intriguing insights into the movie industry. By examining profitability across MPAA ratings, genre trends over the years, revenue distribution, distributor performance, and budget changes, we gained valuable knowledge that unveils the dynamics behind movie success.

  1. We examined the profitability of movies based on their MPAA rating, provided by the Motion Picture Association of America. By analyzing the data, we observed that the profitability of movies varies considerably across MPAA ratings. Ratings aimed at broader audiences, “G” and “PG,” have higher average profits than “PG-13” and “R,” even though “R” is the most common rating in the dataset. This indicates that the target audience and content restrictions associated with MPAA ratings can influence the financial success of movies.

  2. Next, we investigated the average profit earned by genre over the years. By analyzing the data, we observed that the average profit varies across different genres and also changes over time. Certain genres, such as action and adventure, tend to consistently generate higher profits compared to others like drama or comedy. Additionally, the average profit earned by genres can fluctuate over the years, indicating evolving audience preferences and market dynamics.

  3. We also explored which genres have the highest number of movies. The dataset contains five genres; ranked by number of movies they are drama (1,236), comedy (813), action (573), adventure (481), and horror (298). These genres have a significant presence in the film industry, attracting a wide range of audiences and generating a substantial number of movie releases.

  4. To understand the revenue distribution, we compared the worldwide and domestic gross revenue of movies. Our analysis revealed that there are differences in revenue distribution between worldwide and domestic markets. Some movies tend to perform exceptionally well in the international market, while others generate higher revenue domestically. These variations can be attributed to factors such as cultural preferences, marketing strategies, and global distribution networks.

  5. We examined the relationship between the number of movies and their average rating across different movie distributors. Our analysis indicated that the number of movies and their average rating can vary significantly among different distributors. Some distributors may focus on producing a higher volume of movies with varying ratings, while others may prioritize quality and have a narrower selection of movies with higher average ratings.

  6. Finally, we investigated how the average budget of movies has changed over the years of their release. Our analysis revealed that the average budget of movies has experienced fluctuations and trends over time. Factors such as advancements in technology, changes in production costs, and market demands can influence the budget allocation for movie production. It is important for filmmakers and industry professionals to track these trends to make informed decisions regarding budgeting and resource allocation.

These findings can inform decision-making processes within the industry and help stakeholders understand the dynamics and trends shaping the film market.

Future Direction

  1. Predictive Modeling: Building predictive models using machine learning techniques to forecast movie revenues, budgets, and profitability. This could help production houses make data-driven decisions when planning new movie projects and allocating resources.

  2. Market Segmentation: Conducting a detailed analysis of audience preferences, demographics, and viewing patterns to identify specific market segments. This information can guide the development and marketing of movies tailored to different target audiences, leading to more successful outcomes.

  3. Content Analysis: Analyzing the content and themes of successful movies to identify patterns and factors that contribute to their popularity. This could provide valuable insights for developing engaging and impactful storylines, enhancing the chances of creating successful films.

  4. Collaborative Filtering: Implementing recommendation systems based on collaborative filtering algorithms to provide personalized movie recommendations to viewers. This can enhance user experience and engagement on streaming platforms by suggesting relevant movies based on individual preferences.

  5. Sentiment Analysis: Analyzing social media and online platforms to gauge public sentiment and reactions towards movies. This can provide real-time feedback and help production houses understand the audience’s response, enabling them to adapt marketing strategies and make necessary improvements.

  6. Market Expansion: Exploring opportunities for international markets and analyzing the trends and preferences of global audiences. This can assist production houses in identifying potential markets for distribution and adapting their strategies to cater to different cultural contexts.

These are just a few potential directions for future research and analysis in the movie industry. The rapid advancements in technology and availability of big data provide ample opportunities to delve deeper into understanding consumer behavior, market dynamics, and industry trends, ultimately aiding decision-making and driving success in the film business.

References

  • R programming

  • GitHub