Blog Post 2

Author

Niyati Sharma

Published

November 22, 2022

0.1 Introduction

Most organizations and businesses now develop online services, which add value to the business and can grow its customer base. Surveys and reviews have changed the dynamics of digital marketing: feedback platforms give customers the power to post, share, and review content, and to interact directly with other customers and with companies. Many companies use public opinion to help achieve their goals, and 20-30 companies in the United States of America offer sentiment analysis as a tool to support corporate decision-making. This research therefore uses dynamic sentiment, because it can yield more precise and detailed results and uncover additional information.

The growing number of internet users, social media users, and mobile devices increases the amount of user-generated content. The simplest forms in which opinions are collected are the application reviews and ratings that people give for their experience. With massive information flows from social media, a highly effective approach is needed to summarize and retrieve information in real time. Several classification methods are suitable for analyzing the data, such as Support Vector Machine (SVM), Naïve Bayes (NB), Nearest Neighbors (NN), Logistic Regression, and Decision Tree. The model used here to summarize the data is the Naïve Bayes classification method.
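As a rough illustration of how a Naïve Bayes classifier assigns a sentiment label, here is a minimal sketch assuming a bag-of-words model with add-one (Laplace) smoothing. The toy reviews, their labels, and the function names are made up for illustration; they are not taken from the dataset or from the analysis in this post.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # add-one smoothing keeps unseen (word, label) pairs from zeroing the score
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative toy data (assumption), not rows from the scraped reviews
toy_reviews = [
    ("great help in urgent time", "positive"),
    ("very helpful and good", "positive"),
    ("this app is horrible", "negative"),
    ("driver cancelled and payment failed", "negative"),
]
model = train_naive_bayes(toy_reviews)
print(predict("very good help", model))
print(predict("horrible app", model))
```

In practice a library implementation with a proper tokenizer would likely be preferable; the sketch only shows the prior-times-likelihood logic behind the method.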

We use topic modeling to determine the topics contained in the data. Topic distributions can be used in several ways, such as clustering, feature generation, and dimensionality reduction. Dimensionality reduction has an advantage over the others because each document’s distribution over topics gives a summary of the document. Comparing documents in this reduced feature space can be more meaningful than comparing them in the original feature space.

Uber is one of the largest and greatest innovations in transport, and with its fast development the interaction among users on the platform is high. In this research, we aim to dig further information out of the dataset. We look at the topics from the perspective of public opinion in application reviews and ratings, especially opinion regarding Uber: sentiment changes every day, so we ask which topics carry positive or negative sentiment. This helps make data processing more effective and faster.

0.2 What are the research questions?

In this research, we analyze a large volume of user-generated content, which organizations can use for their customer engagement strategies. The purpose of this study is to map public opinion towards certain topics by analyzing the sentiment of the text and creating a topic model. Review feedback provides a lot of information about the product experience, any technical or operational gaps, and even customers' general sentiment towards the company. The analysis will help identify gaps in the priorities of the stakeholders. With the right customer engagement strategies, companies can benefit.

We pick Uber, viewed as one of the most favored transportation methods in most parts of the world, as the case study for the following research questions.

  1. Analyse users’ sentiments towards Uber cabs.
  2. Identify the problems faced by customers.
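One simple way to get a first sentiment signal for these questions is to derive a coarse label from the star rating attached to each review. The sketch below assumes thresholds of 1-2 stars as negative, 3 as neutral, and 4-5 as positive; both the thresholds and the function name are assumptions for illustration, not the method used in this post.

```python
def score_to_sentiment(score):
    """Map a 1-5 star rating to a coarse sentiment label (assumed thresholds)."""
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


# Illustrative ratings only
sample_scores = [1, 5, 5, 1, 5, 2]
labels = [score_to_sentiment(s) for s in sample_scores]
print(labels)
```

Labels derived this way could serve as a baseline to compare against text-based sentiment, since a rating and the tone of the review text do not always agree.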

1 How are the data collected?

The data were collected from Google Play Store reviews. I scraped the latest 50,000 reviews from the Uber app page, irrespective of rating: https://play.google.com/store/apps/details?id=com.ubercab

Code
#installing the required packages
# !pip install bertopic
# !pip install google-play-scraper 
#!pip install keybert
Code
#importing the required packages
from google_play_scraper import Sort, reviews
import pandas as pd
from bertopic import BERTopic
from google.colab import files
Code
#app identifier
#go to the Google Play Store: https://play.google.com/store/apps
#open the app's page
#copy the app id from the URL (the part after id=)
appURL = 'com.ubercab' #holds the app id, not a full URL
Code
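#scraping reviews from google play store
result, continuation_token = reviews(
    appURL, #app id
    lang='en', #language
    country='us', #country
    sort=Sort.NEWEST, #newest reviews first
    count=50000, #number of reviews to fetch
    #filter_score_with=1 #defaults to None (means all scores)
)

#fetching the next batch with the continuation token;
#kept in a separate variable so the 50,000 reviews in result are not overwritten
next_result, _ = reviews(
    appURL,
    continuation_token=continuation_token
)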
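The continuation token makes it possible to resume a scrape where the previous call stopped. The sketch below shows one way batches could be accumulated, assuming a page-fetching function that returns a `(batch, next_token)` pair in the same shape as `reviews`. `collect_reviews` and `fake_fetch_page` are hypothetical names; the fake fetcher stands in for the real scraper so the example runs offline.

```python
def collect_reviews(fetch_page, target_count):
    """Accumulate batches until target_count items are gathered or pages run out.

    fetch_page(token) must return (batch, next_token); None requests the first page.
    """
    collected = []
    token = None
    while len(collected) < target_count:
        batch, token = fetch_page(token)
        if not batch:
            break  # no more data available
        collected.extend(batch)
    return collected[:target_count]


def fake_fetch_page(token, page_size=3, total=8):
    """Offline stand-in for a scraper call (assumption): serves 8 fake reviews."""
    start = token or 0
    batch = [{"reviewId": i} for i in range(start, min(start + page_size, total))]
    return batch, start + len(batch)


first_five = collect_reviews(fake_fetch_page, 5)
everything = collect_reviews(fake_fetch_page, 20)
print(len(first_five), len(everything))
```

With the real scraper, `fetch_page` would wrap a call to `reviews` and pass the token through; that wiring is not shown and has not been tested here.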
Code
#putting everything into a dataframe
df = pd.DataFrame(result)
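The post does not show how the scraped dataframe was written to disk before being read back from `_data/uberData.csv`. A minimal sketch of that round trip, assuming it was saved with `DataFrame.to_csv`, is below; the toy dataframe and the temporary path are illustrative. It also shows why a reloaded file can gain an `Unnamed: 0` column: `to_csv` writes the index by default, and `read_csv` treats it as an ordinary unnamed column unless told otherwise.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the scraped dataframe (assumption)
toy_df = pd.DataFrame({"content": ["Good", "Very helpful"], "score": [5, 5]})
csv_path = os.path.join(tempfile.mkdtemp(), "uberData.csv")

# The index is written as an unnamed first column by default
toy_df.to_csv(csv_path)

# Reading it back naively yields an extra "Unnamed: 0" column
with_extra_column = pd.read_csv(csv_path)

# Using the first column as the index avoids the extra column
clean = pd.read_csv(csv_path, index_col=0)

print(list(with_extra_column.columns))
print(list(clean.columns))
```

Passing `index=False` to `to_csv` when saving is another way to avoid the extra column.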
Code
import pandas as pd
Code
df1 = pd.read_csv('_data/uberData.csv')
df1
| | Unnamed: 0 | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | e6e1dcfb-b7b1-4e90-86cf-27d306340ba1 | Ionut Stoian | https://play-lh.googleusercontent.com/a-/ACNPE... | They basically became slaves of the taxi drive... | 1 | 0 | 4.433.10001 | 2022-08-12 13:40:44 | NaN | NaN |
| 1 | 1 | e3a302e2-6191-4b7f-9af3-524f6a2a4d14 | Dhananjay Kamble | https://play-lh.googleusercontent.com/a-/ACNPE... | Good | 5 | 0 | 4.425.10001 | 2022-08-12 13:35:29 | NaN | NaN |
| 2 | 2 | 87952d14-174e-4c7b-bf39-af67f104253f | Soumya Mishra | https://play-lh.googleusercontent.com/a-/ACNPE... | Great help in urgent time | 5 | 0 | 4.432.10000 | 2022-08-12 13:33:57 | NaN | NaN |
| 3 | 3 | cabb2ff9-3f73-4173-be69-f6c94112fc25 | Edwin Thapa | https://play-lh.googleusercontent.com/a/ALm5wu... | For the auto transportations this app is horri... | 1 | 0 | 4.433.10001 | 2022-08-12 13:33:28 | Hi Edwin, we're extremely sorry to hear about ... | 2022-09-02 07:18:41 |
| 4 | 4 | fbdfc361-066c-4867-9c6a-c281e684a61f | Jose Martinez | https://play-lh.googleusercontent.com/a-/ACNPE... | Better way than calling a regular taxi | 5 | 0 | 4.433.10001 | 2022-08-12 13:31:38 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49995 | 49995 | 6de199ee-050a-40b4-8bb1-81d8344b5890 | P S | https://play-lh.googleusercontent.com/a/ALm5wu... | Everything nice but my credit card payment opt... | 5 | 0 | 4.425.10001 | 2022-06-12 17:02:13 | NaN | NaN |
| 49996 | 49996 | b2c6f888-9c52-4b36-863f-5381a9135682 | kannan Nair | https://play-lh.googleusercontent.com/a-/ACNPE... | Good 👍 | 5 | 0 | 4.425.10001 | 2022-06-12 17:01:45 | NaN | NaN |
| 49997 | 49997 | 78b291ee-158a-4fa4-a3df-96d0e6db1c10 | Aditya UN | https://play-lh.googleusercontent.com/a-/ACNPE... | Most of my Mumbai - Pune Trips through kuber h... | 2 | 0 | NaN | 2022-06-12 17:00:33 | NaN | NaN |
| 49998 | 49998 | 18ce8bb1-a762-4226-905b-1e6b2837ef7a | Harrison Dozman | https://play-lh.googleusercontent.com/a/ALm5wu... | I think I love riding with uber | 5 | 0 | 4.425.10001 | 2022-06-12 17:00:00 | NaN | NaN |
| 49999 | 49999 | 1b4ce6ec-b972-48e3-91f1-a300574f6050 | Abhishek Ram | https://play-lh.googleusercontent.com/a/ALm5wu... | Very helpful | 5 | 0 | 4.425.10001 | 2022-06-12 16:57:51 | NaN | NaN |

50000 rows × 11 columns