# Installing the required packages
# !pip install bertopic
# !pip install google-play-scraper
# !pip install keybert
Niyati Sharma
November 22, 2022
Nowadays, most organizations and businesses develop online services, which add value to their business and can even increase their customer base. Surveys and reviews have changed the dynamics of digital marketing. Feedback platforms give customers the power to post, share, and review content, and to interact directly with other customers and with companies. Many companies use public opinion to help achieve their business goals, and 20-30 companies in the United States of America offer sentiment analysis as a tool to support corporate decision-making. This research uses dynamic sentiment, which follows how sentiment changes over time, because it can uncover additional information and yield more precise and detailed results.
As the number of internet users, social media users, and mobile devices grows, so does the amount of data and user-generated content. Two simple ways to collect opinions are application reviews and the ratings people give based on their experience. With such massive information flows from social media, a highly effective approach is needed to summarize and retrieve information in real time. Several classification methods are suitable for analyzing this data, such as Support Vector Machine (SVM), Naïve Bayes (NB), Nearest Neighbors (NN), Logistic Regression, and Decision Tree. The model we use to summarize the data is the Naïve Bayes classification method.
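The following is a minimal, from-scratch sketch of a multinomial Naïve Bayes sentiment classifier with add-one (Laplace) smoothing, written in pure Python so it runs without extra dependencies. The class name `NaiveBayesSentiment`, the whitespace tokenizer, and the four training reviews are hypothetical illustrations, not the author's code or data; in practice a library implementation such as scikit-learn's `MultinomialNB` would typically be used instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would also strip punctuation and stop words
    return text.lower().split()


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # label -> number of training documents
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(word | class); log space avoids numeric underflow
            score = math.log(count / n_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                freq = self.word_counts[label][token]
                score += math.log((freq + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data
train_texts = [
    "great ride friendly driver",
    "very helpful and fast",
    "driver was late and rude",
    "app charged me twice terrible",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("friendly and fast driver"))  # positive
print(model.predict("late and rude driver"))      # negative
```

Working in log space keeps the product of many small word probabilities from underflowing to zero, and smoothing ensures a word seen in only one class does not zero out the other class's score.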
We use topic modeling to determine the topics contained in the data. Several approaches can produce a topic distribution, such as clustering, feature generation, and dimensionality reduction. Dimensionality reduction has an advantage over the others because each document's distribution over topics gives a summary of the document. Comparing documents in this reduced feature space can be more meaningful than comparing them in the original feature space.
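To illustrate why comparing documents in a reduced topic space can be more meaningful, the sketch below compares reviews that share no words. The topic-word weights in `TOPICS` are hand-written assumptions standing in for what a topic model would learn, and the helper names (`bag_of_words`, `cosine`, `topic_distribution`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical topic-word weights. A real topic model (e.g. LDA, NMF, or BERTopic)
# would learn these from the data; they are hand-written here only for illustration.
TOPICS = {
    "delays": {"late": 1.0, "delayed": 1.0, "waiting": 1.0},
    "payment": {"card": 1.0, "charged": 1.0, "refund": 1.0},
}


def bag_of_words(text):
    # Original feature space: one dimension per distinct word
    return Counter(text.lower().split())


def cosine(a, b):
    # a and b are sparse vectors stored as dicts mapping feature -> value
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_distribution(text):
    # Reduced feature space: project word counts onto each topic, then normalize to sum to 1
    counts = bag_of_words(text)
    scores = {t: sum(w * counts[word] for word, w in words.items()) for t, words in TOPICS.items()}
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total else scores


late, delayed, charged = "driver was late", "cab delayed again", "card charged twice"
print(cosine(bag_of_words(late), bag_of_words(delayed)))              # 0.0
print(cosine(topic_distribution(late), topic_distribution(delayed)))  # 1.0
print(cosine(topic_distribution(late), topic_distribution(charged)))  # 0.0
```

In word space the two delay complaints look unrelated because they share no tokens, while in topic space they coincide; the payment complaint stays distinct in both.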
Uber is one of the largest and most significant innovations in transport, and with its fast development, interaction among users on the platform is high. In this research, we aim to extract further information from the dataset. We examine topics from the perspective of public opinion expressed in application reviews and ratings, especially opinions about Uber. Because sentiment changes every day, we also identify which topics carry positive or negative sentiment. This makes processing the data faster and more effective.
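One simple way to track how sentiment per topic changes from day to day is to compute, for each (date, topic) pair, the share of reviews labelled positive. The sketch below does this with the standard library only; the records, dates, topic names, and the function name `daily_topic_sentiment` are hypothetical. In a full pipeline the topic would come from the topic model and the label from the sentiment classifier.

```python
from collections import defaultdict

# Hypothetical labelled reviews: (date, topic, sentiment)
labelled_reviews = [
    ("2022-08-11", "payment", "negative"),
    ("2022-08-11", "payment", "positive"),
    ("2022-08-11", "drivers", "positive"),
    ("2022-08-12", "payment", "negative"),
    ("2022-08-12", "drivers", "positive"),
    ("2022-08-12", "drivers", "negative"),
]


def daily_topic_sentiment(records):
    # (date, topic) -> [number of positive reviews, total number of reviews]
    counts = defaultdict(lambda: [0, 0])
    for date, topic, sentiment in records:
        counts[(date, topic)][1] += 1
        if sentiment == "positive":
            counts[(date, topic)][0] += 1
    # Share of positive reviews per (date, topic), in chronological order
    return {key: pos / total for key, (pos, total) in sorted(counts.items())}


for (date, topic), share in daily_topic_sentiment(labelled_reviews).items():
    print(date, topic, share)
```

A falling share for one topic across consecutive days (here, "payment") is the kind of signal dynamic sentiment is meant to surface.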
In this research, we analyze a large volume of user-generated content that organizations can use in their customer engagement strategies. The purpose of this study is to map public opinion towards certain topics by analyzing the sentiment of the text and building a topic model. Review feedback provides a lot of information about the product experience, any technical or operational gaps, and users' general sentiment towards the company behind the product. The analysis helps identify gaps in stakeholders' priorities. With the right customer engagement strategies, companies can benefit.
We pick Uber as the case study, as it is viewed as one of the most popular transportation services in most parts of the world, and use it to address the research aim stated above.
We collected the data from Google Play Store reviews, scraping the latest 50,000 reviews regardless of rating: https://play.google.com/store/apps/details?id=com.ubercab
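# Scraping reviews from the Google Play Store
from google_play_scraper import Sort, reviews

app_id = 'com.ubercab'  # package name taken from the Play Store URL

result, continuation_token = reviews(
    app_id,
    lang='en',         # language
    country='us',      # country
    sort=Sort.NEWEST,  # newest reviews first
    count=50000,       # number of reviews to fetch
    # filter_score_with=1  # defaults to None (all scores)
)

# Fetch the next batch, continuing after the reviews returned above.
# Note: this reassigns `result`, replacing the first batch; use result.extend(...) to keep both.
result, _ = reviews(
    app_id,
    continuation_token=continuation_token
)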
| | Unnamed: 0 | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | e6e1dcfb-b7b1-4e90-86cf-27d306340ba1 | Ionut Stoian | https://play-lh.googleusercontent.com/a-/ACNPE... | They basically became slaves of the taxi drive... | 1 | 0 | 4.433.10001 | 2022-08-12 13:40:44 | NaN | NaN |
1 | 1 | e3a302e2-6191-4b7f-9af3-524f6a2a4d14 | Dhananjay Kamble | https://play-lh.googleusercontent.com/a-/ACNPE... | Good | 5 | 0 | 4.425.10001 | 2022-08-12 13:35:29 | NaN | NaN |
2 | 2 | 87952d14-174e-4c7b-bf39-af67f104253f | Soumya Mishra | https://play-lh.googleusercontent.com/a-/ACNPE... | Great help in urgent time | 5 | 0 | 4.432.10000 | 2022-08-12 13:33:57 | NaN | NaN |
3 | 3 | cabb2ff9-3f73-4173-be69-f6c94112fc25 | Edwin Thapa | https://play-lh.googleusercontent.com/a/ALm5wu... | For the auto transportations this app is horri... | 1 | 0 | 4.433.10001 | 2022-08-12 13:33:28 | Hi Edwin, we're extremely sorry to hear about ... | 2022-09-02 07:18:41 |
4 | 4 | fbdfc361-066c-4867-9c6a-c281e684a61f | Jose Martinez | https://play-lh.googleusercontent.com/a-/ACNPE... | Better way than calling a regular taxi | 5 | 0 | 4.433.10001 | 2022-08-12 13:31:38 | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
49995 | 49995 | 6de199ee-050a-40b4-8bb1-81d8344b5890 | P S | https://play-lh.googleusercontent.com/a/ALm5wu... | Everything nice but my credit card payment opt... | 5 | 0 | 4.425.10001 | 2022-06-12 17:02:13 | NaN | NaN |
49996 | 49996 | b2c6f888-9c52-4b36-863f-5381a9135682 | kannan Nair | https://play-lh.googleusercontent.com/a-/ACNPE... | Good 👍 | 5 | 0 | 4.425.10001 | 2022-06-12 17:01:45 | NaN | NaN |
49997 | 49997 | 78b291ee-158a-4fa4-a3df-96d0e6db1c10 | Aditya UN | https://play-lh.googleusercontent.com/a-/ACNPE... | Most of my Mumbai - Pune Trips through kuber h... | 2 | 0 | NaN | 2022-06-12 17:00:33 | NaN | NaN |
49998 | 49998 | 18ce8bb1-a762-4226-905b-1e6b2837ef7a | Harrison Dozman | https://play-lh.googleusercontent.com/a/ALm5wu... | I think I love riding with uber | 5 | 0 | 4.425.10001 | 2022-06-12 17:00:00 | NaN | NaN |
49999 | 49999 | 1b4ce6ec-b972-48e3-91f1-a300574f6050 | Abhishek Ram | https://play-lh.googleusercontent.com/a/ALm5wu... | Very helpful | 5 | 0 | 4.425.10001 | 2022-06-12 16:57:51 | NaN | NaN |
50000 rows × 11 columns
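The `score` column holds the 1 to 5 star rating. If star ratings are used as weak sentiment labels for training a classifier, one common convention (an assumption here, not necessarily the author's choice) is to treat 4 to 5 stars as positive, 3 as neutral, and 1 to 2 as negative. The sketch applies that rule to three rows copied from the table above, with their truncated text left exactly as displayed; the function name `label_from_score` is hypothetical.

```python
from collections import Counter


def label_from_score(score):
    # Assumed thresholds: 4-5 stars positive, 3 neutral, 1-2 negative
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return "neutral"


# Three rows from the scraped table (content and score only)
sample_rows = [
    {"content": "Good", "score": 5},
    {"content": "Great help in urgent time", "score": 5},
    {"content": "Most of my Mumbai - Pune Trips through kuber h...", "score": 2},
]

labels = [label_from_score(row["score"]) for row in sample_rows]
print(labels)           # ['positive', 'positive', 'negative']
print(Counter(labels))  # Counter({'positive': 2, 'negative': 1})
```

Star ratings are a noisy proxy: a 5-star review can still mention a complaint (for example, the row about a credit card payment option above), so labels derived this way are best treated as approximate.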