Yan_Shi_blogpost6

Author

Yan Shi

Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

1 Research question: How is lockdown affect people’s perceptions for remote work?

2 Theoretical support: Lockdown create a big shock in people’s life that motivate them to express their feeling, while enforced stay at home order restricted their real social interaction, people tend to express their feelings or experiences online using social media. Hence, their perceptions slowly converge and affect other social media users, forming a collective perceptions.

2.1 First, let’s take a look at people’s perceptions towards remote work before and after lockdown.

Code
# load data
df = pd.read_csv('remote_work_with_emotions.csv').drop(columns = ['Unnamed: 0'])
df.head()
author_id username author_followers author_tweets author_description author_location text created_at geo_id retweets ... geo_name states_abbrev clean_text no_stopwords_text no_remotework_text lematize_text emotion scores compound_score sentiment
0 2729932651 TwelveRivers12 367 1862 We strive to raise the bar of what it means to... Austin, TX #WFH but make it fashion (Twelve Rivers fashio... 2020-12-19 20:00:14+00:00 c3f37afa9efcf94b 1 ... Austin, TX TX wfh but make it fashion twelve rivers fashion ... wfh make fashion twelve rivers fashion office ... make fashion twelve rivers fashion office big ... make fashion twelve river fashion office big g... neutral {'neg': 0.0, 'neu': 0.834, 'pos': 0.166, 'comp... 0.8674 Positive
1 389908361 JuanC611 214 12248 I'm a #BCB, craft beer drinkin #Kaskade listen... Oxnard, CA Step 2, in progress...\n#wfh #wfhlife @ Riverp... 2020-12-19 02:56:54+00:00 a3c0ae863771d69e 0 ... Oxnard, CA CA step in progress wfh wfhlife riverpark step progress wfh wfhlife riverpark step progress wfhlife riverpark step progress wfhlife riverpark neutral {'neg': 0.0, 'neu': 0.641, 'pos': 0.359, 'comp... 0.4215 Positive
2 737763400118198277 MissionTXperts 828 1618 Follow us on IG! @missiontxperts #FamousForExp... Mission, TX Congratulations on your graduation!!! Welcome ... 2020-12-18 22:35:35+00:00 77633125ba089dcb 1 ... Mission, TX TX congratulations on your graduation welcome to ... congratulations graduation welcome missiontxpe... congratulations graduation welcome missiontxpe... congratulation graduation welcome missiontxper... joy {'neg': 0.0, 'neu': 0.566, 'pos': 0.434, 'comp... 0.7845 Positive
3 522212036 FitnessFoundry 2693 14002 Award Winning Personal Trainer| EMT-B 🚑 NSCA-R... Boston and Malden, MA Part 2 #HomeWorkout \n\n#OldSchool Jumping Jac... 2020-12-18 19:07:33+00:00 75f5a403163f6f95 1 ... Malden, MA MA part homeworkout oldschool jumping jack variat... part homeworkout oldschool jumping jack variat... part homeworkout oldschool jumping jack variat... part homeworkout oldschool jumping jack variat... neutral {'neg': 0.0, 'neu': 0.81, 'pos': 0.19, 'compou... 0.7003 Positive
4 1931184464 fixyourmattress 1236 17479 The only comfortable solution to SAGGING mattr... USA SAGGING bed❓ 🛏〰️🛏FIRM it up with MATTRESS HELP... 2020-12-18 18:46:21+00:00 7df9a00dcf914d5e 0 ... Plantation, FL FL sagging bed firm it up with mattress helper un... sagging bed firm mattress helper mattress supp... sagging bed firm mattress helper mattress supp... sagging bed firm mattress helper mattress supp... neutral {'neg': 0.0, 'neu': 0.674, 'pos': 0.326, 'comp... 0.8020 Positive

5 rows × 23 columns

Code
import datetime
df['created_at']= df['created_at'].apply(lambda x: datetime.datetime.strptime(x,"%Y-%m-%d %H:%M:%S%z"))
df['date'] = df['created_at'].apply(lambda x: x.date())
df['year'] = df['created_at'].apply(lambda x: x.year)
df['month'] = df['created_at'].apply(lambda x: x.month)
df['tweet'] = 1
Code
df['lockdown'] = df['created_at'].apply(lambda x: 'before' if x < Timestamp('2020-3-01 00:00:00+0000', tz='UTC') else 'during' if 
                                       ((x >= Timestamp('2020-3-01 00:00:00+0000', tz='UTC'))&(x <= Timestamp('2020-7-01 00:00:00+0000', tz='UTC')))
                                        else 'after')
df['lockdown'].value_counts()
during    14693
after     10606
before     2579
Name: lockdown, dtype: int64
Code
df_agg_sen = df.groupby(['lockdown', 'sentiment']).sum().reset_index()
df_agg_sen = pd.concat([df_agg_sen.loc[3:], df_agg_sen.loc[:2]]).reset_index(drop = True)
df_agg_sen
lockdown sentiment author_id author_followers author_tweets retweets replies likes quote_count compound_score year month tweet
0 before Negative 3.769505e+19 1257112.0 5025147.0 57.0 71.0 415.0 7.0 -84.2040 498728.0 1607.0 247.0
1 before Neutral 8.697684e+19 3433022.0 16200139.0 242.0 118.0 1122.0 22.0 0.0000 1197352.0 3741.0 593.0
2 before Positive 2.759157e+20 9100106.0 47729277.0 711.0 345.0 3573.0 74.0 1030.6164 3511264.0 11402.0 1739.0
3 during Negative 2.068842e+20 21434633.0 84407564.0 850.0 1151.0 8755.0 173.0 -762.0527 4237960.0 8004.0 2098.0
4 during Neutral 5.063449e+20 53033140.0 217158097.0 2679.0 1769.0 20307.0 514.0 0.0000 9267760.0 17770.0 4588.0
5 during Positive 1.076227e+21 65728691.0 254783893.0 4694.0 3703.0 39131.0 746.0 4427.7021 16174140.0 31066.0 8007.0
6 after Negative 3.050080e+20 16982001.0 71593556.0 668.0 585.0 5889.0 81.0 -548.7395 2879550.0 10634.0 1425.0
7 after Neutral 6.510789e+20 50729142.0 192736481.0 2091.0 1078.0 13242.0 324.0 0.0000 6153092.0 22588.0 3045.0
8 after Positive 1.474055e+21 56700820.0 233804687.0 10092.0 2937.0 60528.0 2183.0 3495.4168 12399137.0 46103.0 6136.0
Code
# plot number of remote work tweets
sns.catplot(data=df_agg_sen, x="lockdown", y='tweet', hue="sentiment", kind="bar")
<seaborn.axisgrid.FacetGrid at 0x7fe926a27700>

Code
# plot aggregate number of retweets for different sentiment remote work tweet
sns.catplot(data=df_agg_sen, x="lockdown", y='likes', hue="sentiment", kind="bar")
<seaborn.axisgrid.FacetGrid at 0x7fe8a0f2be80>

Code
# compare % of positive retweets in different lockdown time slot
345/(345+118+71), 4694/(4694+2679+850), 10092/(10092+2091+668)
(0.6460674157303371, 0.570837893712757, 0.785308536300677)

From the plot we can tell that all sentiment expression increased, but the number of positive tweets increased significantly during lockdown, the high volume of positive remote work tweets has a strong impact in forming a collective emotions and swing others perceptions towards remote work. This can be demonstrated in number of retweets and number of likes, people clearly prefer positive remote work tweets and tend to spread them out after lockdown, which indicates the impact of collective emotions.

2.2 Now let’s dive in the emerging process of collective emotion, what changes their perception, how can they benefit from remote work

Code
# aggregate date in month and explore ingranied emotion
per = df.date.dt.to_period('M')
df_agg_m = df.groupby(per).sum()
df_agg_m['date'] = df_agg_m.index
df_agg_m = df_agg_m.reset_index(drop = True).drop(columns = ['year', 'month', 'author_id', 'author_followers', 'author_tweets'])
df2 = pd.melt(df_agg_m, id_vars = ['date'], value_vars = ['emotion_joy', 'emotion_sadness', 'emotion_anger', 'emotion_fear'])
df2
date variable value
0 2019-03 emotion_joy 46.0
1 2019-04 emotion_joy 35.0
2 2019-05 emotion_joy 45.0
3 2019-06 emotion_joy 45.0
4 2019-07 emotion_joy 47.0
... ... ... ...
163 2022-06 emotion_fear 31.0
164 2022-07 emotion_fear 23.0
165 2022-08 emotion_fear 15.0
166 2022-09 emotion_fear 23.0
167 2022-10 emotion_fear 6.0

168 rows × 3 columns

Code
# plot monthly number of emotions
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(20,5))
sns.pointplot(x='date',y='value',data=df2, hue='variable')
plt.tick_params(axis='x', labelrotation = 45)
plt.title("Monthly remote working emotion")
Text(0.5, 1.0, 'Monthly remote working emotion')

Interesting, looks like both joy and fear increased during lockdown, why people are enjoy and why people are fear

Code
# explore remote work tweet in joy and fear during lockdown
data_lockdown = df[(df['created_at'] >= Timestamp('2020-3-01 00:00:00+0000', tz='UTC'))&(df['created_at'] <= Timestamp('2020-7-01 00:00:00+0000', tz='UTC'))].reset_index(drop = True)
data_joy = data_lockdown[data_lockdown['emotion']=='joy'].reset_index(drop = True)
data_fear = data_lockdown[data_lockdown['emotion']=='fear'].reset_index(drop = True)
Code
#bi_gram
def get_gram(text):
    return [' '.join(x) for x in text]
from nltk.util import ngrams
data_joy['bi_gram'] = data_joy['lematize_text'].apply(lambda x: (list(ngrams(x.split(), 2))))
data_joy['bi_gram'] = data_joy['bi_gram'].apply(lambda x: get_gram(x))
data_fear['bi_gram'] = data_fear['lematize_text'].apply(lambda x: (list(ngrams(x.split(), 2))))
data_fear['bi_gram'] = data_fear['bi_gram'].apply(lambda x: get_gram(x))
Code
#Tri_gram
data_joy['Tri_gram'] = data_joy['lematize_text'].apply(lambda x: (list(ngrams(x.split(), 3))))
data_joy['Tri_gram'] = data_joy['Tri_gram'].apply(lambda x: get_gram(x))
data_fear['Tri_gram'] = data_fear['lematize_text'].apply(lambda x: (list(ngrams(x.split(), 3))))
data_fear['Tri_gram'] = data_fear['Tri_gram'].apply(lambda x: get_gram(x))
Code
#top 20 bi_gram for joy
from collections import Counter
list_joy = []
for bi in data_joy['bi_gram']:
    list_joy.extend(bi) 
counts_joy = Counter(list_joy)
counts_joy.most_common()[:20]
[('working home', 110),
 ('work home', 83),
 ('good morning', 72),
 ('happy hour', 64),
 ('look like', 51),
 ('zoom meeting', 48),
 ('home office', 40),
 ('feel like', 40),
 ('new normal', 33),
 ('happy friday', 33),
 ('work remotely', 31),
 ('zoom u', 30),
 ('co worker', 28),
 ('womeninbusiness womenintech', 28),
 ('covid coronavirus', 26),
 ('ability work', 26),
 ('happy monday', 24),
 ('conference call', 24),
 ('team meeting', 22),
 ('good thing', 22)]
Code
#top 20 Tri_gram for joy
list2 = []
for bi in data_joy['Tri_gram']:
    list2.extend(bi) 
counts1 = Counter(list2)
counts1.most_common()[:20]
[('ability work remotely', 25),
 ('womeninbusiness womenintech workingmom', 20),
 ('new webex work', 19),
 ('webex work bundle', 19),
 ('work bundle make', 19),
 ('bundle make much', 19),
 ('make much affordable', 19),
 ('much affordable everyone', 19),
 ('affordable everyone use', 19),
 ('everyone use webex', 19),
 ('use webex enterprise', 19),
 ('webex enterprise conferencing', 19),
 ('enterprise conferencing calling', 19),
 ('conferencing calling amp', 19),
 ('calling amp collaboration', 19),
 ('amp collaboration smb', 19),
 ('collaboration smb priceswebex', 19),
 ('smb priceswebex meeting', 19),
 ('priceswebex meeting webex', 19),
 ('meeting webex calling', 19)]
Code
#top 20 bi_gram for fear
list_fear = []
for bi in data_fear['bi_gram']:
    list_fear.extend(bi) 
counts_fear = Counter(list_fear)
counts_fear.most_common()[:20]
[('working home', 65),
 ('work home', 49),
 ('home office', 42),
 ('zoom u', 42),
 ('conference call', 39),
 ('look like', 35),
 ('zoom call', 26),
 ('covid coronavirus', 23),
 ('thomas capone', 22),
 ('watch u', 20),
 ('year old', 20),
 ('collaboration remotelearning', 18),
 ('standing desk', 16),
 ('remotejobs remoteworklife', 16),
 ('w briancurtisnbc', 15),
 ('north carolina', 15),
 ('nbcdfw w', 14),
 ('remotelearning remotejobs', 14),
 ('coronavirus covid', 14),
 ('new york', 13)]
Code
#top 20 Tri_gram for joy
list2 = []
for bi in data_fear['Tri_gram']:
    list2.extend(bi) 
counts1 = Counter(list2)
counts1.most_common()[:20]
[('remotelearning remotejobs remoteworklife', 14),
 ('collaboration remotelearning remotejobs', 13),
 ('thomas capone collaboration', 11),
 ('capone collaboration remotelearning', 11),
 ('remotejobs remoteworklife remoteemployees', 11),
 ('nbcdfw w briancurtisnbc', 9),
 ('lose weight make', 9),
 ('pm news nbcdfw', 7),
 ('selfie selfisolation wfhdailyheadgear', 7),
 ('selfisolation wfhdailyheadgear chapel', 7),
 ('wfhdailyheadgear chapel hill', 7),
 ('perry hall maryland', 7),
 ('zoom u call', 7),
 ('waltsable gmail com', 7),
 ('work home life', 7),
 ('news nbcdfw w', 6),
 ('chapel hill north', 6),
 ('hill north carolina', 6),
 ('coronavirus perry hall', 6),
 ('pm nbcdfw w', 5)]
Code
# topic model
import gensim
import gensim.corpora as corpora
from pprint import pprint
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
import gensim.corpora as corpora
def sent_to_words(sentences):
    '''
    tokenize words
    '''
    for sentence in sentences:
        yield(gensim.utils.simple_preprocess(str(sentence), deacc=True))            #deacc=True removes punctuations
data_words = list(sent_to_words(data_joy['lematize_text'].values.tolist()))
# Create Dictionary
id2word = corpora.Dictionary(data_words)
# Create Corpus
texts = data_words
# Term Document Frequency
corpus = [id2word.doc2bow(text) for text in texts]
# number of topics
num_topics = 5
# Build LDA model
lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                       id2word=id2word,
                                       num_topics=num_topics, 
                                       random_state=100,
                                      update_every=1,
                                      chunksize=100,
                                      passes=10,
                                      alpha='auto')
pyLDAvis.enable_notebook()
vis = gensimvis.prepare(lda_model, corpus, id2word, mds='mmds')
vis
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/pyLDAvis/_prepare.py:246: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only
  default_term_info = default_term_info.sort_values(
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/past/builtins/misc.py:45: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  from imp import reload
Code
# topic model
import gensim
import gensim.corpora as corpora
from pprint import pprint
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis
import gensim.corpora as corpora
def sent_to_words(sentences):
    '''
    tokenize words
    '''
    for sentence in sentences:
        yield(gensim.utils.simple_preprocess(str(sentence), deacc=True))            #deacc=True removes punctuations
data_words = list(sent_to_words(data_fear['lematize_text'].values.tolist()))
# Create Dictionary
id2word = corpora.Dictionary(data_words)
# Create Corpus
texts = data_words
# Term Document Frequency
corpus = [id2word.doc2bow(text) for text in texts]
# number of topics
num_topics = 3
# Build LDA model
lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                       id2word=id2word,
                                       num_topics=num_topics, 
                                       random_state=100,
                                      update_every=1,
                                      chunksize=100,
                                      passes=10,
                                      alpha='auto')
pyLDAvis.enable_notebook()
vis = gensimvis.prepare(lda_model, corpus, id2word, mds='mmds')
vis
/Users/yanshi/opt/anaconda3/lib/python3.9/site-packages/pyLDAvis/_prepare.py:246: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only
  default_term_info = default_term_info.sort_values(
Code
dog = data_fear[data_fear['lematize_text'].str.contains('call') == True].reset_index()
dog['lematize_text']
0      daughter interrupted conference call amp loudl...
1                         wanna call already living room
2      anyone used effinbirds image video call backgr...
3      called macys today disconnected message statin...
4      avtweeps avchallenges daychallenge fail far ho...
                             ...                        
159    working home year conference call video become...
160    moment kid running behind work video call beco...
161           every time jump call pet decide time fight
162    neighbor walking hall get exercise break desk ...
163    uforiascience mavie called oneyou u come dna d...
Name: lematize_text, Length: 164, dtype: object

First topic is about the concern about quartinelife and stay at home order, second topic is concern about interruption of conference call like your pet or family members, third topic is concern about financial situation for remote work (income, money)