Final Project checkin 5

Final project DACSS 603
Author

Diana Rinker

Published

May 9, 2023

DACSS 603, spring 2023


Introduction. Online engagement

Online engagement with a web resource is widely regarded as a valuable metric because it contributes to site revenue. This research project explores which factors contribute to users’ online engagement. To do that, I will use data from an online blog hosted on a news website. The author of this blog posts articles about interpersonal relationships every workday (Monday through Friday). Each post is formulated as a letter from a reader describing a situation and asking a question about relationships, followed by the author’s advice. Website readers are free to comment under each post but cannot make their own posts.

All post metadata and comments are public. They are saved by the website and available for analysis. Using this data set, I will explore how readers’ engagement is connected with the blog author’s engagement, comment activity on the site, the web source of readers, and negative behaviors online.

Research Question and hypothesis.

RQ: Which factors influence users’ engagement in an online blog?

DV: My dependent construct is “user engagement”. I will measure it at the level of the individual post, using the following metrics:

1. Exit rate, or “bounces”: a visitor comes to the page and then leaves without opening other pages on this website. This metric represents all readers.

2. Number of comments: an engagement metric representing only loyal readers, who have created an account and signed in.

IV: My main independent variables are

  1. Post popularity: unique users, i.e. the number of unique people who viewed the post.
  2. Source of the readers: which web page the reader came from. This is represented by 6 different variables, each of continuous type.
  3. Mood of the conversation: a derived continuous variable calculated as the share of “thumbs up” among all reactions (the sum of thumbs up and thumbs down). It is stored as a percentage (0-100) in pct.positive.
  4. Blocked and flagged comments.
  5. Number of author’s comments.
  6. Weekday of the post.

Packages attached for this analysis (per the startup messages): dplyr, tidyverse 1.3.2 (tibble 3.1.8, tidyr 1.3.0, readr 2.1.4, purrr 1.0.1, stringr 1.5.0, forcats 1.0.0), gridExtra, stargazer 5.2.3, and lubridate.

Data source and description.

To answer my research question I will use two data sets. The first contains all comments associated with each post, keyed by post ID. The second is analytics data for the web page: it contains one post per row, with variables describing each post as a whole rather than at the comment level.

In this project I will analyze posts from January 2022 through March 2023. Here is the list of variables in each data set:

Comments data:

Code
getwd()
[1] "C:/Users/Diana/OneDrive - University Of Massachusetts Medical School/Documents/R/R working directory/DACSS/603/603_Spring_2023/posts"
Code
# First, I will load the data set with the comment level data:
raw <- as_tibble (read_csv("C:\\Users\\Diana\\OneDrive - University Of Massachusetts Medical School\\Documents\\R\\R working directory\\DACSS\\603\\my study files for dacss603\\globe\\ data.2021.plus.csv"))
Rows: 105136 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): content, user_name, display_name, image_url, email, approved
dbl  (7): message_id, post_id, user_id, parent, absolute_likes, absolute_dis...
lgl  (3): email_verified, created_at, private_profile
dttm (1): written_at

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
comments.data<-raw 
colnames (comments.data)
 [1] "content"           "message_id"        "post_id"          
 [4] "user_id"           "user_name"         "display_name"     
 [7] "image_url"         "email"             "email_verified"   
[10] "created_at"        "private_profile"   "approved"         
[13] "written_at"        "parent"            "absolute_likes"   
[16] "absolute_dislikes" "comment.year"     
Code
head(comments.data$written_at)
[1] "2021-01-01 08:28:44 UTC" "2021-01-01 08:57:08 UTC"
[3] "2021-01-01 10:36:58 UTC" "2021-01-01 11:13:13 UTC"
[5] "2021-01-01 12:27:44 UTC" "2021-01-01 12:51:48 UTC"
Code
comments.data <-comments.data%>%
              mutate(com.year = format(written_at,format = "%Y" ))
# range(comments.data$com.year)
# dim(comments.data)
str(comments.data)
tibble [105,136 × 18] (S3: tbl_df/tbl/data.frame)
 $ content          : chr [1:105136] "08:15  Dawn of the 21st year of the 21st century<br/>&quot;21st Century Schizoid Man&quot;  on the BOSE" "Liz:  😄" "Happy New Year to all y'all. And Ms. G, thank you for letting me play for another year playing in your sandbox." "Blog needs a new name - Hate Letters" ...
 $ message_id       : num [1:105136] 1.15e+08 1.15e+08 1.15e+08 1.15e+08 1.15e+08 ...
 $ post_id          : num [1:105136] 27071015 27071015 27071009 27071009 27071009 ...
 $ user_id          : num [1:105136] 2889100 1855822 5133556 5156343 5560421 ...
 $ user_name        : chr [1:105136] "Lefty49" "JacquiSmith" "Bzznlike-crazyman" "--SnowMan--" ...
 $ display_name     : chr [1:105136] "Lefty49" "JacquiSmith" "Bzznlike-crazyman" "--SnowMan--" ...
 $ image_url        : chr [1:105136] "https://u.o0bc.com/avatars/48/121/77/5077296.png?35" "https://u.o0bc.com/avatars/41/194/84/5554729.png?47" "https://u.o0bc.com/avatars/85/175/84/5549909.png?9" "https://u.o0bc.com/avatars/64/191/84/5553984.png?5" ...
 $ email            : chr [1:105136] "danwalker1949@aol.com" "amcr1124@comcast.net" "eflynn105@yahoo.com" "markedoc@yahoo.com" ...
 $ email_verified   : logi [1:105136] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ created_at       : logi [1:105136] NA NA NA NA NA NA ...
 $ private_profile  : logi [1:105136] FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ approved         : chr [1:105136] "approved" "approved" "approved" "approved" ...
 $ written_at       : POSIXct[1:105136], format: "2021-01-01 08:28:44" "2021-01-01 08:57:08" ...
 $ parent           : num [1:105136] NA 1.15e+08 NA NA 1.15e+08 ...
 $ absolute_likes   : num [1:105136] 2 0 14 7 7 7 4 3 2 4 ...
 $ absolute_dislikes: num [1:105136] 1 0 1 0 1 1 1 1 1 0 ...
 $ comment.year     : num [1:105136] 2021 2021 2021 2021 2021 ...
 $ com.year         : chr [1:105136] "2021" "2021" "2021" "2021" ...
Code
dim(comments.data)
[1] 105136     18
Code
comments.data <- comments.data%>%
  filter(written_at  >= "2022-01-01")

Post data:

Code
# Second, loading post-level data:
merged <- as_tibble (read_csv("C:\\Users\\Diana\\OneDrive - University Of Massachusetts Medical School\\Documents\\R\\R working directory\\DACSS\\603\\my study files for dacss603\\globe\\data.merged.csv"))
Rows: 535 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (3): Letter, Exit rate, post.month
dbl  (18): Page views, Search + amp referral visits, Direct (non-email) refe...
num   (1): Visits when post was on LL HP
date  (1): post.date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
# colnames(merged)
str(merged)
tibble [535 × 23] (S3: tbl_df/tbl/data.frame)
 $ post.date                         : Date[1:535], format: "2021-01-04" "2021-01-05" ...
 $ Letter                            : chr [1:535] "love letters | blog | I don&'t want him to let me go" "love letters | blog | Should I be working to get her back?" "love letters | blog | I&'m sick of thinking about the breakup" "love letters | blog | I don&'t want to be selfish about 2020 Christmas" ...
 $ Page views                        : num [1:535] 14830 12067 11921 12817 12866 ...
 $ Search + amp referral visits      : num [1:535] 1005 822 793 765 934 ...
 $ Direct (non-email) referral visits: num [1:535] 10005 7948 7997 8746 8400 ...
 $ Visits                            : num [1:535] 12998 10391 10331 11040 10818 ...
 $ Uniques                           : num [1:535] 11453 8985 8917 9662 8564 ...
 $ Other website referral visits     : num [1:535] 129 232 106 152 139 153 165 157 85 151 ...
 $ Social referral visits            : num [1:535] 457 94 156 106 113 368 90 171 84 129 ...
 $ BDC referral visits               : num [1:535] 7087 5312 5185 5901 4570 ...
 $ Visits when post was on LL HP     : num [1:535] 3167 3323 2929 3056 4795 ...
 $ Exits                             : num [1:535] 9679 7549 7613 8188 7900 ...
 $ Exit rate                         : chr [1:535] "74%" "73%" "74%" "74%" ...
 $ dup                               : num [1:535] 0 0 0 0 0 0 0 0 0 0 ...
 $ post_id                           : num [1:535] 27071003 27070997 27070991 27070985 27070979 ...
 $ n.comments                        : num [1:535] 267 207 266 372 319 267 337 154 179 375 ...
 $ post.year                         : num [1:535] 2021 2021 2021 2021 2021 ...
 $ post.month                        : chr [1:535] "01" "01" "01" "01" ...
 $ post.likes                        : num [1:535] 1440 864 936 1497 1145 ...
 $ post.dislikes                     : num [1:535] 72 106 96 520 154 188 150 62 106 150 ...
 $ post.total.likes                  : num [1:535] 1512 970 1032 2017 1299 ...
 $ blocked.sum                       : num [1:535] 1 2 3 9 2 3 4 0 3 3 ...
 $ pct.positive                      : num [1:535] 95.2 89.1 90.7 74.2 88.1 ...
Code
dim(merged)
[1] 535  23
Code
# Limiting the dataset to posts from 2022 onward:
merged <- merged%>%
  filter(post.date  >=  "2022-01-01")
dim(merged)
[1] 287  23
Code
# The comments data set covers more posts than the post data set, so I keep only posts that have comment-derived data:
merged <- merged %>%
  filter(!is.na(pct.positive))

To begin, I will review the available variables and identify a metric for each construct in my study.

1.DV: engagement.

Exit rate

This variable measures how many visitors viewed the page and then left the website. I compute it as Exits divided by page views (e.rate). It is the best available measure of engagement for all users, as it captures the first step after being exposed to the post: either quitting the site or remaining on it.

Here we can see the distribution of this variable:

Code
# str(merged)
merged <- merged %>%
  mutate(e.rate = Exits/`Page views`)
select (merged, `Exit rate`, e.rate)
# A tibble: 287 × 2
   `Exit rate` e.rate
   <chr>        <dbl>
 1 76%          0.675
 2 79%          0.703
 3 86%          0.791
 4 68%          0.600
 5 75%          0.665
 6 76%          0.678
 7 83%          0.751
 8 88%          0.804
 9 91%          0.836
10 72%          0.620
# … with 277 more rows
Code
# merged$`Exit rate` <- as.numeric(sub("%", "", merged$`Exit rate`)) / 100

ggplot(data=merged, mapping=aes(x=e.rate))+
    geom_histogram(binwidth = 0.01, fill = "sandybrown", alpha = 0.7)+
  labs(title="Exit rate histogram")
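
As a sanity check on the two exit-rate columns printed above, the sketch below recomputes both ratios from the first four 2021 rows visible in the str(merged) output earlier (before the 2022 filter). For those rows the site-provided `Exit rate` string appears to match Exits divided by Visits, whereas e.rate divides by page views, which would explain why e.rate is consistently lower. This is an inference from four rows, not documented behaviour of the analytics export, and the vector names are illustrative.

```r
# First four rows of the post data, copied from the str(merged) output above.
page_views <- c(14830, 12067, 11921, 12817)
visits     <- c(12998, 10391, 10331, 11040)
exits      <- c(9679, 7549, 7613, 8188)
site_rate  <- c("74%", "73%", "74%", "74%")   # the `Exit rate` column as exported

# Parse the exported percentage strings into proportions.
site_rate_num <- as.numeric(sub("%", "", site_rate)) / 100

per_visit <- exits / visits       # candidate definition used by the site (assumption)
per_view  <- exits / page_views   # definition used for e.rate in this project

comparison <- data.frame(site_rate_num,
                         per_visit = round(per_visit, 2),
                         per_view  = round(per_view, 2))
print(comparison)
```

If the same pattern holds for the whole sample, the two columns are different metrics rather than the same one at different precision, which is worth keeping in mind when interpreting e.rate.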

Distribution of exit rate over time:

Code
# creating  year_month  variable 
merged$post.month <-as.numeric(merged$post.month)
merged$year_month <- paste0(merged$post.year, "-", sprintf("%02d", merged$post.month))

ggplot(merged, mapping = aes(x=year_month , y=`e.rate`, fill=year_month ))+
  geom_boxplot() +
  labs(title = "distribution of `Exit rate` per post ", y = "Exit rate" , x="Month")+ 
     theme(axis.text.x = element_text(angle = 45, hjust = 1))

Number of comments:

An engagement metric for loyal readers.

Commenting requires the user to log in, which indicates greater engagement by an individual user. This variable therefore represents the engagement of a subset of users: loyal readers who have created an account.

Code
# merged$year_month 
ggplot(data=merged, mapping=aes(x=n.comments))+
  geom_histogram(fill = "seagreen4")+
  labs(title="Number of comments histogram")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

And change in the distribution over time:

Code
ggplot(merged, mapping = aes(x=year_month , y=n.comments, fill=year_month ))+
  geom_boxplot() +
  labs(title = "distribution of comments per post ", y = "Number of comments" )+
  scale_y_continuous(breaks = seq (from=0, to= 10000, by= 100)) + 
     theme(axis.text.x = element_text(angle = 45, hjust = 1))

2.IV. Independent variables:

2.1.Popularity

The “Uniques” variable represents the number of unique people who came to the page and viewed it at least once. This metric represents the popularity of the post. It is a right-skewed count variable, which I treat as approximately Poisson-distributed:

Code
# colnames(merged)
ggplot(data=merged, mapping=aes(x=Uniques))+
    geom_histogram(fill = "purple2")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

We can see a long tail on the right, showing that a number of posts are far more popular than the majority of the sample.

The distribution of popularity over time also shows substantial variance:

Code
ggplot( data=merged, mapping=aes(y=Uniques, x=year_month))+
          geom_boxplot()+
  labs(title="Number of unique viewers per month", x="Month", y="Number of unique viewers")+ 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The next plot illustrates the connection between popularity and the engagement metrics:

Code
pairs(subset (merged, select=c(Uniques, e.rate, n.comments )))

We can see that popularity is strongly correlated with exit rate: an increase in popularity is associated with a decrease in engagement. Popularity is not as strongly correlated with the number of comments. Exit rate and number of comments don’t show an obvious pattern between each other.
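
The visual impression from the pairs plot can be backed by a correlation matrix. The sketch below shows the call on a synthetic stand-in data frame (random values generated only for illustration, not the blog data); on the real data the same call would be applied to the Uniques, e.rate and n.comments columns of merged. Spearman rank correlation is my choice here, assumed to be reasonable given the long right tail of Uniques.

```r
# Synthetic stand-in for three columns of `merged`; values are random, not real data.
set.seed(1)
toy <- data.frame(
  Uniques    = rpois(50, lambda = 9000),
  e.rate     = runif(50, min = 0.6, max = 0.9),
  n.comments = rpois(50, lambda = 250)
)

# Rank-based correlation is less sensitive to a few very popular posts than Pearson.
cor_matrix <- cor(toy[, c("Uniques", "e.rate", "n.comments")],
                  method = "spearman", use = "complete.obs")
print(round(cor_matrix, 2))
```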

2.2. Referral sources.

The website analytics provides information on where viewers are coming from when they reach the blog page. For example, if people clicked on a blog link posted on Facebook, that would be a referral from social media. If people clicked on a blog link within the BDC website, that would be a “BDC referral visit”.

There are 6 sources of referrals, each corresponding to a variable in the data set. Each variable’s value is the number of visits from that referral source.

"Search + amp referral visits"
"Direct (non-email) referral visits"
"Other website referral visits"
"Social referral visits"
"BDC referral visits"
"Visits when post was on LL HP" 
Code
#renaming variables for convenience: 
merged<-merged%>%
  rename(google ="Search + amp referral visits",
         direct ="Direct (non-email) referral visits",
         other.web = "Other website referral visits",
          social= "Social referral visits",
          bdc= "BDC referral visits",
          ll= "Visits when post was on LL HP" )

merged %>%
  pivot_longer(c(google, direct, other.web, social, bdc, ll),
               names_to = "source", values_to = "visits") %>%
  # mapping color inside aes() lets scale_color_manual() build the legend
  ggplot(mapping=aes(x=post.date, y=visits, color=source))+
  geom_point()+
  labs(title = "Referral sources per post ", y = "Number of referrals" , x="Post", color="Source")+ 
  scale_color_manual(breaks = c("google", "direct", "other.web", "social", "bdc", "ll"),
                     values = c("red", "green", "yellow", "purple", "blue", "pink"),
                     labels = c("Google", "Direct", "Other Web", "Social", "Bdc", "LL"))

This graph shows that “Search + amp referral visits” has high variability. The range of the other referral sources is much smaller.

Code
ggplot(merged, mapping=aes(x=direct , y=e.rate))+
  geom_point()

2.3. Post weekdays

Calculating weekdays variable:

Code
# colnames(comments.data)
# merged <- merge( merged, post.date , by = "post_id", all = TRUE)

merged <-merged %>%
  mutate (weekday = wday(post.date, label = TRUE))
table(merged$weekday)

Sun Mon Tue Wed Thu Fri Sat 
  0  49  61  60  59  58   0 

The posts are published Monday through Friday (with rare exceptions). No posts are published on the weekend. Below are plots of the distributions of exit rate and number of comments per weekday:

Code
ggplot (merged,  mapping=aes(x=weekday, y=e.rate) )+
  geom_boxplot()+
  geom_point()+
  labs(title="Exit rate per weekday")

Code
ggplot (merged,  mapping=aes(x=weekday, y=n.comments) )+
  geom_boxplot()+
  geom_point()+
  labs(title="Number of comments per weekday")

We can see that exit rate is overall lower on Tuesdays, and the number of comments is overall highest on Fridays.
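
One detail matters for the regression models later in this paper: wday(label = TRUE) returns an ordered factor, and lm() encodes ordered factors with polynomial contrasts. That is why the regression tables report weekday.L, weekday.Q, weekday.C and weekday4 rather than one coefficient per day. The sketch below reproduces this on a small stand-in vector (illustrative values, base R only) and shows one possible alternative, an unordered factor, which yields per-day indicators relative to Monday.

```r
# Stand-in for the weekday column: an ordered factor with seven levels,
# of which only Monday-Friday occur (as in the table above).
day_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
weekday <- factor(c("Mon", "Tue", "Wed", "Thu", "Fri", "Mon", "Tue"),
                  levels = day_levels, ordered = TRUE)

# Ordered factors get polynomial contrasts: linear, quadratic, cubic, ...
poly_cols <- colnames(model.matrix(~ weekday))
print(poly_cols)

# Converting to an unordered factor and dropping unused levels gives
# one indicator per weekday, with Monday as the reference level.
weekday_plain <- droplevels(factor(weekday, ordered = FALSE))
dummy_cols <- colnames(model.matrix(~ weekday_plain))
print(dummy_cols)
```

Per-day indicators are usually easier to interpret for a question like "is Tuesday different from Monday", while the polynomial terms describe a trend across the week.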

2.4. Authors comments

To identify how much the author of the blog is engaged in each post, I will create an additional variable derived from the user_name field and review the distribution of the author’s comments:

Code
# str(merged)
comments.data$user_name<-  ifelse (is.na(comments.data$user_name), 0, comments.data$user_name)
comments.data$author<-  ifelse (comments.data$user_name=="MeredithGoldstein", 1, 0)

comments.grouped <-comments.data %>%
  group_by(post_id)%>%
  summarize(n.comments=n(),
            author.sum = sum(author))

# The comments data contains rows that don't actually represent posts; they were created by the web support team for troubleshooting and typically have very few comments.
comments.grouped <-comments.grouped %>%
filter(n.comments >100)  # removing invalid posts created by the website management team.

comments.grouped <-comments.grouped %>%
  select(post_id, author.sum)
 dim(comments.grouped)
[1] 287   2
Code
#adding author.sum to main data set: 
merged <- merge( merged , comments.grouped, by = "post_id", all = TRUE)

 ggplot (merged,  mapping=aes(x=author.sum) )+
  geom_histogram()+
  labs(title="Authors comments distribution")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Authors comments per engagement metrics:

Code
ggplot (merged,  mapping=aes(x=author.sum, y=e.rate ))+
  geom_point()+
  labs(title="Exit rate per author.sum")

Code
ggplot (merged,  mapping=aes(x=author.sum, y=n.comments) )+
  geom_point()+
  labs(title="Number of comments per author.sum")

This graph shows that the majority of posts have no author’s comments.

2.5. Mood of the post.

This is a numerical variable, calculated as the percentage of “thumbs up” among all reactions (both “thumbs up” and “thumbs down”).
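
As a check on this definition, the sketch below recomputes the metric from the first five post.likes and post.dislikes values shown in the str(merged) output earlier and rounds to one decimal, the precision at which pct.positive is printed there. The vector names and the guard against posts with no reactions are my own additions for illustration.

```r
# First five posts, copied from the str(merged) output above.
likes    <- c(1440, 864, 936, 1497, 1145)
dislikes <- c(72, 106, 96, 520, 154)

# Percentage of thumbs-up among all reactions; NA when a post has no reactions.
total <- likes + dislikes
pct_positive <- ifelse(total > 0, 100 * likes / total, NA_real_)
print(round(pct_positive, 1))
```

The rounded values agree with the pct.positive column for those rows (95.2, 89.1, 90.7, 74.2, 88.1), so the variable is on a 0-100 scale.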

Code
# colnames(merged)
 ggplot (merged,  mapping=aes(x=pct.positive) )+
  geom_histogram(fill = "springgreen3")+
  labs(title="Mood of the post (pct.positive) distribution")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Plotting mood of the post against engagement metrics:

Code
 ggplot (merged,  mapping=aes(x=pct.positive, y=e.rate ))+
  geom_point()+
  labs(title="Exit rate per pct.positive")

Code
ggplot (merged,  mapping=aes(x=pct.positive, y=n.comments) )+
  geom_point()+
  labs(title="Number of comments per pct.positive")

Distribution of mood over time:

Code
 ggplot(merged, mapping = aes(x=year_month , y=pct.positive, fill=year_month ))+
  geom_boxplot() +
  labs(title = "distribution of Mood per post ", y = "Mood" , x="Month")+ 
     theme(axis.text.x = element_text(angle = 45, hjust = 1))

2.6. Blocked comments per post.

Now I will visualize the number of blocked comments per post. It is also a right-skewed count variable, which I treat as approximately Poisson-distributed.

Code
 ggplot (merged,  mapping=aes(x=blocked.sum) )+
  geom_histogram(fill = "brown")+
  labs(title="blocked.sum distribution")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
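
A quick way to probe the Poisson assumption is to compare the mean and the variance, which are equal for a Poisson variable. The sketch below does this for the first ten blocked.sum values visible in the str(merged) output earlier (these are 2021 rows, so the numbers only illustrate the check and say nothing definitive about the 2022-2023 sample).

```r
# First ten blocked.sum values, copied from the str(merged) output above (2021 rows).
blocked <- c(1, 2, 3, 9, 2, 3, 4, 0, 3, 3)

m <- mean(blocked)
v <- var(blocked)
dispersion <- v / m   # about 1 for Poisson data; well above 1 suggests overdispersion
cat("mean:", m, " variance:", round(v, 2), " ratio:", round(dispersion, 2), "\n")
```

Running the same three lines on merged$blocked.sum would show whether the full sample is closer to Poisson or overdispersed.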

Plotting blocked comments against engagement metrics:

Code
 ggplot (merged,  mapping=aes(x=blocked.sum, y=e.rate ))+
  geom_point()+
  labs(title="Exit rate per blocked.sum")

Code
ggplot (merged,  mapping=aes(x=blocked.sum, y=n.comments) )+
  geom_point()+
  labs(title="Number of comments per blocked.sum")

Distribution of blocked comments over time:

Code
ggplot(merged, mapping = aes(x=year_month , y=blocked.sum, fill=year_month ))+
  geom_boxplot() +
  labs(title = "Number Blocked comments  per post ", y = "Number of blocked comments per post" , x="Month")+ 
     theme(axis.text.x = element_text(angle = 45, hjust = 1))

3. Creating a model for exit rate - all users engagement.

3.1 Basic model.

I will start by entering all of the independent variables into the model and then eliminate the ones that are not significant.

Code
# colnames(merged)
model.1 <- lm(e.rate ~ log(Uniques) +google +direct +other.web +social +bdc + ll + author.sum +n.comments +blocked.sum + pct.positive +weekday , data = merged)

merged<- merged %>%
     mutate (author.sum =author.sum +0.001,
          blocked.sum =blocked.sum +0.001)

# Logging the Poisson-like variables (after adding a small constant to avoid log(0)) to see if that increases significance.
model.2 <- lm(e.rate ~ log(Uniques) +google +direct +other.web +social +bdc + ll + log(author.sum) +n.comments + log(blocked.sum) + pct.positive +weekday , data = merged)

model.3 <- lm(e.rate ~ log(Uniques)+ google +direct +other.web +social +bdc  +weekday , data = merged)

stargazer( model.1, model.2, model.3,  type = 'text')

=================================================================================================
                                                 Dependent variable:                             
                    -----------------------------------------------------------------------------
                                                       e.rate                                    
                               (1)                       (2)                       (3)           
-------------------------------------------------------------------------------------------------
log(Uniques)                0.189***                  0.190***                  0.191***         
                             (0.009)                   (0.009)                   (0.009)         
                                                                                                 
google                     -0.00000***               -0.00000***               -0.00000***       
                            (0.00000)                 (0.00000)                 (0.00000)        
                                                                                                 
direct                     -0.00003***               -0.00003***               -0.00002***       
                            (0.00000)                 (0.00000)                 (0.00000)        
                                                                                                 
other.web                  -0.00001***               -0.00001***               -0.00001***       
                            (0.00000)                 (0.00000)                 (0.00000)        
                                                                                                 
social                     -0.00004**                -0.00004**                -0.00004**        
                            (0.00002)                 (0.00002)                 (0.00002)        
                                                                                                 
bdc                        0.00002***                0.00002***                0.00002***        
                            (0.00000)                 (0.00000)                 (0.00000)        
                                                                                                 
ll                           0.00000                   0.00000                                   
                            (0.00000)                 (0.00000)                                  
                                                                                                 
author.sum                   -0.002                                                              
                             (0.002)                                                             
                                                                                                 
log(author.sum)                                        -0.001                                    
                                                      (0.0005)                                   
                                                                                                 
n.comments                  -0.00002                  -0.00002                                   
                            (0.00003)                 (0.00003)                                  
                                                                                                 
blocked.sum                  -0.0001                                                             
                            (0.0002)                                                             
                                                                                                 
log(blocked.sum)                                       0.0002                                    
                                                      (0.0004)                                   
                                                                                                 
pct.positive                -0.00001                   0.00001                                   
                            (0.0002)                  (0.0002)                                   
                                                                                                 
weekday.L                    0.006*                    0.006*                    0.007**         
                             (0.003)                   (0.003)                   (0.003)         
                                                                                                 
weekday.Q                   0.012***                  0.012***                  0.014***         
                             (0.003)                   (0.003)                   (0.003)         
                                                                                                 
weekday.C                     0.003                     0.003                     0.004          
                             (0.003)                   (0.003)                   (0.003)         
                                                                                                 
weekday4                     0.006**                   0.007**                  0.007***         
                             (0.003)                   (0.003)                   (0.003)         
                                                                                                 
Constant                    -1.030***                 -1.038***                 -1.047***        
                             (0.086)                   (0.086)                   (0.082)         
                                                                                                 
-------------------------------------------------------------------------------------------------
Observations                   287                       287                       287           
R2                            0.918                     0.918                     0.917          
Adjusted R2                   0.914                     0.914                     0.914          
Residual Std. Error     0.020 (df = 271)          0.020 (df = 271)          0.020 (df = 276)     
F Statistic         203.411*** (df = 15; 271) 203.257*** (df = 15; 271) 306.564*** (df = 10; 276)
=================================================================================================
Note:                                                                 *p<0.1; **p<0.05; ***p<0.01

Eliminating the non-significant factors slightly changed the coefficients of the significant variables and barely changed R^2 (0.918 vs. 0.917) or adjusted R^2 (0.914 in all three models).
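
A note on the constant added before logging in model.2: with a shift of 0.001, a post with zero author comments lands far from a post with one comment on the log scale, so the zeros can dominate the fitted slope. The sketch below compares that shift with log1p(), a common alternative; the counts are made up for illustration, and which transformation suits this data is a judgment call rather than something the output above settles.

```r
counts <- c(0, 1, 2, 5)   # illustrative counts, e.g. author comments per post

shifted  <- log(counts + 0.001)  # approach used above: zeros map to about -6.9
plus_one <- log1p(counts)        # log(1 + x): zeros map to exactly 0

print(round(data.frame(counts, shifted, plus_one), 3))
```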

My next step is to diagnose this model:

3.2. Diagnostic of linear models:

Code
model.3 <- lm(e.rate ~ log(Uniques)+ google +direct +other.web +social +bdc  +weekday , data = merged)

plot(model.3, which = 1:6)

We can see an issue with the distribution of residuals in the “Residuals vs Fitted” plot. This suggests a non-linear relationship between the variables.

I will transform some of the referral variables to see if that gives me a better model:

Code
#original model
model.3 <- lm(e.rate ~ log(Uniques)+ google +direct +other.web +social +bdc  +weekday , data = merged)
# Logging Google referrals
model.4 <- lm(e.rate ~ log(Uniques)+log(google) +direct +other.web +social +bdc  +weekday , data = merged)

stargazer( model.3, model.4,  type = 'text')

Logging the google variable leaves the overall fit almost unchanged (R^2 of 0.917 for model.3 in the table above versus 0.913 for model.4 in the table below). The coefficient of log(Uniques), however, changes substantially, from 0.191 to 0.106, once the google variable is modified. That might indicate an interaction between them, which I will explore later in this paper.

Now I will log the rest of the referral variables.

Code
model.4 <- lm(e.rate ~ log(Uniques)+log(google) +direct +other.web +social +bdc  +weekday , data = merged)
model.5 <- lm(e.rate ~ log(Uniques)+log(google) +log(direct) +log(other.web) +log(social) +log(bdc)  +weekday , data = merged)
model.6 <- lm(e.rate ~ log(Uniques)+log(google) +log(direct) +other.web +social +log(bdc)  +weekday , data = merged)

stargazer( model.4,model.5,model.6,  type = 'text')

=================================================================
                                      Dependent variable:        
                               ----------------------------------
                                             e.rate              
                                   (1)        (2)         (3)    
-----------------------------------------------------------------
log(Uniques)                    0.106***    0.096***   0.112***  
                                 (0.012)    (0.013)     (0.013)  
                                                                 
log(google)                     0.016***    0.019***   0.014***  
                                 (0.004)    (0.005)     (0.005)  
                                                                 
direct                         -0.00003***                       
                                (0.00000)                        
                                                                 
other.web                      -0.00001***            -0.00001***
                                (0.00000)              (0.00000) 
                                                                 
social                         -0.00003**              -0.00003  
                                (0.00002)              (0.00002) 
                                                                 
bdc                            0.00003***                        
                                (0.00000)                        
                                                                 
log(direct)                                -0.298***   -0.307*** 
                                            (0.029)     (0.027)  
                                                                 
log(other.web)                               -0.001              
                                            (0.002)              
                                                                 
log(social)                                  -0.005              
                                            (0.003)              
                                                                 
log(bdc)                                    0.218***   0.220***  
                                            (0.018)     (0.017)  
                                                                 
weekday.L                        0.008**    0.009***   0.010***  
                                 (0.003)    (0.003)     (0.003)  
                                                                 
weekday.Q                       0.016***    0.018***   0.019***  
                                 (0.003)    (0.003)     (0.003)  
                                                                 
weekday.C                        0.005*     0.006**     0.006**  
                                 (0.003)    (0.003)     (0.003)  
                                                                 
weekday4                        0.008***    0.009***   0.008***  
                                 (0.003)    (0.003)     (0.003)  
                                                                 
Constant                        -0.384***   0.453***   0.396***  
                                 (0.073)    (0.094)     (0.093)  
                                                                 
-----------------------------------------------------------------
Observations                       287        287         287    
R2                                0.913      0.901       0.906   
Adjusted R2                       0.910      0.898       0.903   
Residual Std. Error (df = 276)    0.020      0.021       0.021   
F Statistic (df = 10; 276)     291.392***  252.106*** 266.805*** 
=================================================================
Note:                                 *p<0.1; **p<0.05; ***p<0.01

All referral sources except “other.web” and “social” show a better fit when logged. Moving forward, I will keep these two variables unlogged:

Code
model.7 <- lm(e.rate ~ log(Uniques)+log(google) +log(direct) +other.web+social +log(bdc)  +weekday , data = merged)

stargazer(model.7,  type = 'text')

===============================================
                        Dependent variable:    
                    ---------------------------
                              e.rate           
-----------------------------------------------
log(Uniques)                 0.112***          
                              (0.013)          
                                               
log(google)                  0.014***          
                              (0.005)          
                                               
log(direct)                  -0.307***         
                              (0.027)          
                                               
other.web                   -0.00001***        
                             (0.00000)         
                                               
social                       -0.00003          
                             (0.00002)         
                                               
log(bdc)                     0.220***          
                              (0.017)          
                                               
weekday.L                    0.010***          
                              (0.003)          
                                               
weekday.Q                    0.019***          
                              (0.003)          
                                               
weekday.C                     0.006**          
                              (0.003)          
                                               
weekday4                     0.008***          
                              (0.003)          
                                               
Constant                     0.396***          
                              (0.093)          
                                               
-----------------------------------------------
Observations                    287            
R2                             0.906           
Adjusted R2                    0.903           
Residual Std. Error      0.021 (df = 276)      
F Statistic          266.805*** (df = 10; 276) 
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
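
As a reading aid for the logged predictors in model.7: e.rate is in levels and the predictor is on the natural-log scale, so a coefficient gives the change in e.rate per one-unit change in the log of the predictor, holding the other terms fixed. The worked example below uses the log(direct) coefficient from the table above; the 10% increase is an arbitrary illustration, not a value taken from the data.

```latex
\Delta\,\text{e.rate}
  = \hat\beta_{\log(\text{direct})} \cdot \ln(1.10)
  = -0.307 \times 0.0953
  \approx -0.029
```

Under these assumptions, a post with 10% more direct referrals would be expected to have an exit rate about 0.029 lower.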

Diagnostics of model.7:

Code
par(mfrow = c(2,3))
plot(model.7, which = 1:6)

We still observe a curvilinear pattern in the residuals.

Now I will explore an interaction with popularity (variable “Uniques”).

Code
# last model
model.7 <- lm(e.rate ~ log(Uniques)+log(google) +log(direct) +other.web +log(bdc)+social  +weekday , data = merged)

# adding interaction: 
model.8 <- lm(e.rate ~ log(Uniques)*log(google) +log(direct) +other.web +social +log(bdc)  +weekday , data = merged)
stargazer( model.7,model.8, type = 'text')

============================================================================
                                         Dependent variable:                
                         ---------------------------------------------------
                                               e.rate                       
                                    (1)                       (2)           
----------------------------------------------------------------------------
log(Uniques)                     0.112***                  0.372***         
                                  (0.013)                   (0.042)         
                                                                            
log(google)                      0.014***                  0.200***         
                                  (0.005)                   (0.029)         
                                                                            
log(direct)                      -0.307***                 -0.293***        
                                  (0.027)                   (0.026)         
                                                                            
other.web                       -0.00001***               -0.00001***       
                                 (0.00000)                 (0.00000)        
                                                                            
log(bdc)                         0.220***                  0.181***         
                                  (0.017)                   (0.017)         
                                                                            
social                           -0.00003                 -0.00003**        
                                 (0.00002)                 (0.00002)        
                                                                            
weekday.L                        0.010***                  0.009***         
                                  (0.003)                   (0.003)         
                                                                            
weekday.Q                        0.019***                  0.017***         
                                  (0.003)                   (0.003)         
                                                                            
weekday.C                         0.006**                   0.006**         
                                  (0.003)                   (0.003)         
                                                                            
weekday4                         0.008***                  0.007***         
                                  (0.003)                   (0.003)         
                                                                            
log(Uniques):log(google)                                   -0.021***        
                                                            (0.003)         
                                                                            
Constant                         0.396***                  -1.734***        
                                  (0.093)                   (0.345)         
                                                                            
----------------------------------------------------------------------------
Observations                        287                       287           
R2                                 0.906                     0.918          
Adjusted R2                        0.903                     0.915          
Residual Std. Error          0.021 (df = 276)          0.020 (df = 275)     
F Statistic              266.805*** (df = 10; 276) 281.232*** (df = 11; 275)
============================================================================
Note:                                            *p<0.1; **p<0.05; ***p<0.01

Adding the interaction between popularity and Google referrals improved the model: the interaction term has a significant negative coefficient (-0.021), “social” referrals became significant at the 5% level, and R^2 and adjusted R^2 rose from 0.906 and 0.903 to 0.918 and 0.915.

It also changed the distribution of residuals: they are now spread evenly around 0.

Code
par(mfrow = c(1,2))
plot(model.7, which = 1)
plot(model.8, which = 1)
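
One way to read the interaction in model.8 is through the marginal effect of log(Uniques), which is no longer a single number but depends on log(google). The derivation below uses the model.8 coefficients from the table above; the two values of log(google) are picked only for illustration and are not taken from the data.

```latex
\frac{\partial\,\text{e.rate}}{\partial \log(\text{Uniques})}
  = 0.372 - 0.021\,\log(\text{google})

\begin{aligned}
\log(\text{google}) = 5:  &\quad 0.372 - 0.021 \times 5  = 0.267 \\
\log(\text{google}) = 10: &\quad 0.372 - 0.021 \times 10 = 0.162
\end{aligned}
```

This also explains why the log(Uniques) coefficient jumps from 0.112 to 0.372 between the two models: in model.8 it is the slope at log(google) = 0, not an average slope.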

Let's review the other diagnostic plots:

Code
par(mfrow = c(1,2))
plot(model.7, which = 2)
plot(model.8, which = 2)

Code
par(mfrow = c(1,2))
plot(model.7, which = 3)
plot(model.8, which = 3)

Code
par(mfrow = c(2,2))
plot(model.7, which = 4)
plot(model.8, which = 4)

par(mfrow = c(2,2))

Code
plot(model.7, which = 5)
plot(model.8, which = 5)

par(mfrow = c(2,2))

Code
plot(model.7, which = 6)
plot(model.8, which = 6)

We can see from both models that observation 231 is an outlier that significantly impacts the model. I will remove that observation and re-evaluate the model:

Code
merged.old <- merged
dim(merged)
[1] 287  27
Code
dim(merged.old)
[1] 287  27
Code
merged <- merged[-c(235), ]
model.8 <- lm(e.rate ~ log(Uniques)*log(google) +log(direct) +other.web +social +log(bdc)  +weekday , data = merged.old)
model.9 <- lm(e.rate ~ log(Uniques)*log(google) +log(direct) +other.web +social +log(bdc)  +weekday , data = merged)

stargazer( model.8,model.9, type = 'text')

============================================================================
                                         Dependent variable:                
                         ---------------------------------------------------
                                               e.rate                       
                                    (1)                       (2)           
----------------------------------------------------------------------------
log(Uniques)                     0.372***                  0.374***         
                                  (0.042)                   (0.043)         
                                                                            
log(google)                      0.200***                  0.201***         
                                  (0.029)                   (0.029)         
                                                                            
log(direct)                      -0.293***                 -0.293***        
                                  (0.026)                   (0.026)         
                                                                            
other.web                       -0.00001***               -0.00001***       
                                 (0.00000)                 (0.00000)        
                                                                            
social                          -0.00003**                -0.00003**        
                                 (0.00002)                 (0.00002)        
                                                                            
log(bdc)                         0.181***                  0.181***         
                                  (0.017)                   (0.017)         
                                                                            
weekday.L                        0.009***                  0.009***         
                                  (0.003)                   (0.003)         
                                                                            
weekday.Q                        0.017***                  0.017***         
                                  (0.003)                   (0.003)         
                                                                            
weekday.C                         0.006**                   0.006**         
                                  (0.003)                   (0.003)         
                                                                            
weekday4                         0.007***                  0.007***         
                                  (0.003)                   (0.003)         
                                                                            
log(Uniques):log(google)         -0.021***                 -0.022***        
                                  (0.003)                   (0.003)         
                                                                            
Constant                         -1.734***                 -1.751***        
                                  (0.345)                   (0.345)         
                                                                            
----------------------------------------------------------------------------
Observations                        287                       286           
R2                                 0.918                     0.919          
Adjusted R2                        0.915                     0.915          
Residual Std. Error          0.020 (df = 275)          0.020 (df = 274)     
F Statistic              281.232*** (df = 11; 275) 281.142*** (df = 11; 274)
============================================================================
Note:                                            *p<0.1; **p<0.05; ***p<0.01
Code
par(mfrow = c(1,2))
plot(model.8, which = 1)
plot(model.9, which = 1)

Code
par(mfrow = c(1,2))
plot(model.8, which = 2)
plot(model.9, which = 2)

Code
par(mfrow = c(1,2))
plot(model.8, which = 3)
plot(model.9, which = 3)

Code
par(mfrow = c(2,2))
plot(model.8, which = 4)
plot(model.9, which = 4)

par(mfrow = c(2,2))

Code
plot(model.8, which = 5)
plot(model.9, which = 5)

par(mfrow = c(2,2))

Code
plot(model.8, which = 6)
plot(model.9, which = 6)

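Now, with the outlier removed, the coefficients and their significance levels are essentially unchanged (“other.web” remains significant), and the fit is nearly identical: R^2 moves from 0.918 to 0.919, while adjusted R^2 stays at 0.915.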
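
The outlier step above can also be done programmatically. The sketch below is an illustration on simulated data (the data frame `toy` and its planted outlier are hypothetical, not the blog data): it finds the most influential observation by Cook's distance and drops it by row name. The labels printed on `plot.lm` diagnostics are row names, while a negative index such as `merged[-c(235), ]` removes by position, and the two only coincide while row names still match row positions.

```r
# Simulated stand-in for the blog data (hypothetical values, not the real `merged`)
set.seed(1)
n <- 100
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + rnorm(n, sd = 0.5)
toy$y[40] <- toy$y[40] + 15          # plant one gross outlier in row "40"

fit_all <- lm(y ~ x, data = toy)

# Cook's distance for every observation; names are the row names of `toy`
cd <- cooks.distance(fit_all)
worst <- names(which.max(cd))        # row name of the most influential point

# Drop by row name rather than by position, then refit
toy_trim <- toy[rownames(toy) != worst, ]
fit_trim <- lm(y ~ x, data = toy_trim)

print(worst)
print(rbind(all = coef(fit_all), trimmed = coef(fit_trim)))
```

Comparing the two coefficient rows side by side plays the same role as the stargazer comparison of model.8 and model.9.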

3.3. Improved model.

I will check whether “direct” referral sources have an interaction with popularity:

Code
#last model: 
model.9 <- lm(e.rate ~ log(Uniques)*log(google) +log(direct) +other.web +social +log(bdc)  +weekday , data = merged)
# adding interaction to direct.
model.10 <- lm(e.rate ~ log(google) +log(Uniques)* log(direct) +other.web +social +log(bdc)  +weekday , data = merged)

stargazer( model.9, model.10, type = 'text' )

===========================================================
                                   Dependent variable:     
                               ----------------------------
                                          e.rate           
                                    (1)            (2)     
-----------------------------------------------------------
log(Uniques)                      0.374***      0.356***   
                                  (0.043)        (0.103)   
                                                           
log(google)                       0.201***      0.013***   
                                  (0.029)        (0.005)   
                                                           
log(direct)                      -0.293***       -0.037    
                                  (0.026)        (0.117)   
                                                           
other.web                       -0.00001***    -0.00001*** 
                                 (0.00000)      (0.00000)  
                                                           
social                           -0.00003**     -0.00003*  
                                 (0.00002)      (0.00002)  
                                                           
log(bdc)                          0.181***      0.212***   
                                  (0.017)        (0.017)   
                                                           
weekday.L                         0.009***      0.010***   
                                  (0.003)        (0.003)   
                                                           
weekday.Q                         0.017***      0.018***   
                                  (0.003)        (0.003)   
                                                           
weekday.C                         0.006**        0.005*    
                                  (0.003)        (0.003)   
                                                           
weekday4                          0.007***      0.007***   
                                  (0.003)        (0.003)   
                                                           
log(Uniques):log(google)         -0.022***                 
                                  (0.003)                  
                                                           
log(Uniques):log(direct)                        -0.026**   
                                                 (0.011)   
                                                           
Constant                         -1.751***      -2.085**   
                                  (0.345)        (1.044)   
                                                           
-----------------------------------------------------------
Observations                        286            286     
R2                                 0.919          0.908    
Adjusted R2                        0.915          0.905    
Residual Std. Error (df = 274)     0.020          0.021    
F Statistic (df = 11; 274)       281.142***    246.519***  
===========================================================
Note:                           *p<0.1; **p<0.05; ***p<0.01

Diagnostics:

Code
par(mfrow = c(1,3))
plot(model.9, which = 1)
plot(model.10, which = 1)

We can see that model 9 (with the interaction between google and Uniques) shows a better residuals vs. fitted plot.

Code
par(mfrow = c(1,3))
plot(model.9, which = 2)
plot(model.10, which = 2)

par(mfrow = c(1,3))

Code
plot(model.9, which = 3)
plot(model.10, which = 3)

par(mfrow = c(1,3))

Code
plot(model.9, which = 4)
plot(model.10, which = 4)

par(mfrow = c(1,3))

Code
plot(model.9, which = 5)
plot(model.10, which = 5)
par(mfrow = c(1,3))

Code
plot(model.9, which = 6)
plot(model.10, which = 6)

Model 9 has slightly higher R^2 (0.919 vs. 0.908) and adjusted R^2 (0.915 vs. 0.905) than model 10.

Calculation of AIC and BIC:

Code
AIC(model.9) 
[1] -1425.685
Code
AIC(model.10) 
[1] -1391.349
Code
BIC(model.9) 
[1] -1378.158
Code
BIC(model.10) 
[1] -1343.821

AIC and BIC are both about 34 points lower for model 9 than for model 10, which favors model 9.

To summarize, models 9 and 10 both fit well, with high R^2 values and acceptable diagnostic plots. Each of them includes an interaction between popularity and one referral variable.
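
To make the AIC and BIC comparison concrete, here is a minimal sketch on R's built-in `mtcars` data (a stand-in for illustration only; the models `m_a` and `m_b` and the helper functions are hypothetical names). It recomputes both criteria from the log-likelihood and takes the difference between two models with the same number of parameters, which is the situation of models 9 and 10.

```r
# Built-in mtcars data stands in for the blog data (illustration only)
m_a <- lm(mpg ~ wt * hp + qsec, data = mtcars)   # interaction wt:hp
m_b <- lm(mpg ~ wt * qsec + hp, data = mtcars)   # interaction wt:qsec instead

# AIC by hand: -2 * logLik + 2 * k, where k counts coefficients plus sigma
aic_by_hand <- function(m) {
  ll <- logLik(m)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

# BIC by hand: same log-likelihood, penalty log(n) * k
bic_by_hand <- function(m) {
  ll <- logLik(m)
  -2 * as.numeric(ll) + log(nobs(m)) * attr(ll, "df")
}

# Lower is better; a negative difference favors m_a
delta_aic <- AIC(m_a) - AIC(m_b)
delta_bic <- BIC(m_a) - BIC(m_b)
print(c(delta_aic = delta_aic, delta_bic = delta_bic))
```

Because both models have the same number of parameters and observations, the penalty terms cancel and the AIC and BIC differences are identical, which matches the pattern in the values printed above for models 9 and 10.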

3.4. Visualizing interaction:

Moderating impact of referral source on relationship between popularity and engagement:

Code
interact_plot(model.9, pred =Uniques , modx =  google, plot.points = TRUE, data=merged )

This graph shows that at higher levels of Google referrals, the impact of popularity on exit rate is lower (i.e., better engagement for the same level of popularity).

Moderating impact of direct referrals on relationship between popularity and engagement:

Code
interact_plot(model.10, pred = Uniques, modx = direct, plot.points = TRUE, data = merged)

Popularity generally decreases engagement (it raises the exit rate). However, at higher levels of direct referrals, the impact of popularity on exit rate is lower.
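
The same kind of moderation plot can be approximated in base R, which makes it explicit what is being drawn. The sketch below uses simulated data (the data frame `toy`, its column names, and the coefficients used to generate it are all assumptions for illustration). It predicts the outcome over a grid of the focal predictor at three moderator levels, here the mean and plus or minus one standard deviation on the log scale, and then computes the simple slope at each level.

```r
# Simulated stand-in data (hypothetical; not the real `merged`)
set.seed(2)
n <- 200
toy <- data.frame(
  uniques = exp(rnorm(n, mean = 9, sd = 0.5)),
  google  = exp(rnorm(n, mean = 6, sd = 1))
)
# Outcome built with a negative interaction, mimicking the sign found above
toy$e_rate <- 0.2 + 0.10 * log(toy$uniques) + 0.05 * log(toy$google) -
  0.02 * log(toy$uniques) * log(toy$google) + rnorm(n, sd = 0.01)

fit <- lm(e_rate ~ log(uniques) * log(google), data = toy)

# Moderator held at its mean and +/- 1 SD on the log scale
lg <- log(toy$google)
mod_levels <- exp(mean(lg) + c(-1, 0, 1) * sd(lg))

# Prediction grid: 50 values of the focal predictor at each moderator level
grid <- expand.grid(
  uniques = seq(min(toy$uniques), max(toy$uniques), length.out = 50),
  google  = mod_levels
)
grid$pred <- predict(fit, newdata = grid)

# One fitted line per moderator level
plot(grid$uniques, grid$pred, type = "n",
     xlab = "uniques", ylab = "predicted e_rate")
for (g in mod_levels) {
  rows <- grid$google == g
  lines(grid$uniques[rows], grid$pred[rows])
}

# Simple slope of log(uniques) at each moderator level
b <- coef(fit)
slopes <- b["log(uniques)"] + b["log(uniques):log(google)"] * log(mod_levels)
print(slopes)
```

With a negative interaction coefficient, the simple slopes shrink as the moderator grows, which is the pattern described in the two interaction plots above.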

4. Model for N.comments - loyal readers engagement:

4.1. Basic model for n.comments

Now that I have created a model for exit rate, I will do the same for n.comments as the dependent variable, which measures the engagement of loyal readers.

Code
colnames(merged)
 [1] "post_id"          "post.date"        "Letter"           "Page views"      
 [5] "google"           "direct"           "Visits"           "Uniques"         
 [9] "other.web"        "social"           "bdc"              "ll"              
[13] "Exits"            "Exit rate"        "dup"              "n.comments"      
[17] "post.year"        "post.month"       "post.likes"       "post.dislikes"   
[21] "post.total.likes" "blocked.sum"      "pct.positive"     "e.rate"          
[25] "year_month"       "weekday"          "author.sum"      
Code
summary(lm(n.comments ~ log(Uniques)+ author.sum + google +direct +other.web +social +bdc + ll + blocked.sum + pct.positive +weekday , data = merged))

Call:
lm(formula = n.comments ~ log(Uniques) + author.sum + google + 
    direct + other.web + social + bdc + ll + blocked.sum + pct.positive + 
    weekday, data = merged)

Residuals:
    Min      1Q  Median      3Q     Max 
-80.884 -23.999  -5.505  22.392 151.869 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
(Intercept)   8.515e+01  1.626e+02   0.524  0.60083   
log(Uniques) -9.224e+00  1.672e+01  -0.552  0.58172   
author.sum    4.589e-01  3.273e+00   0.140  0.88859   
google        3.254e-04  6.779e-04   0.480  0.63163   
direct        9.845e-03  5.072e-03   1.941  0.05328 . 
other.web     2.802e-03  2.421e-03   1.157  0.24810   
social        6.634e-02  2.923e-02   2.270  0.02399 * 
bdc          -6.122e-03  5.051e-03  -1.212  0.22658   
ll            9.226e-03  3.777e-03   2.443  0.01521 * 
blocked.sum   4.526e-01  3.551e-01   1.275  0.20355   
pct.positive  9.601e-01  3.806e-01   2.523  0.01221 * 
weekday.L     3.527e-01  6.042e+00   0.058  0.95349   
weekday.Q    -1.850e+01  6.075e+00  -3.045  0.00255 **
weekday.C    -3.375e+00  5.180e+00  -0.651  0.51530   
weekday^4    -2.998e+00  4.942e+00  -0.607  0.54460   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 37.18 on 271 degrees of freedom
Multiple R-squared:  0.2262,    Adjusted R-squared:  0.1862 
F-statistic: 5.658 on 14 and 271 DF,  p-value: 1.277e-09
Code
model.1 <- lm(n.comments ~ log(Uniques)+ author.sum + google +direct +other.web +social +bdc + ll + blocked.sum + pct.positive +weekday , data = merged)
model.2 <- lm(n.comments ~ Uniques+ author.sum + google +direct +other.web +social +bdc + ll + blocked.sum + pct.positive +weekday , data = merged)
# removing non-significant variables:
model.3 <- lm(n.comments ~  direct  +social + ll  + pct.positive +weekday , data = merged)
stargazer( model.1,model.2, model.3, type = 'text')

==========================================================================================
                                             Dependent variable:                          
                    ----------------------------------------------------------------------
                                                  n.comments                              
                              (1)                     (2)                    (3)          
------------------------------------------------------------------------------------------
log(Uniques)                -9.224                                                        
                           (16.724)                                                       
                                                                                          
Uniques                                              0.004                                
                                                    (0.004)                               
                                                                                          
author.sum                   0.459                   0.277                                
                            (3.273)                 (3.282)                               
                                                                                          
google                      0.0003                  -0.004                                
                            (0.001)                 (0.004)                               
                                                                                          
direct                      0.010*                   0.008                 0.004***       
                            (0.005)                 (0.006)                (0.001)        
                                                                                          
other.web                    0.003                  -0.0004                               
                            (0.002)                 (0.004)                               
                                                                                          
social                      0.066**                 0.063**                0.068**        
                            (0.029)                 (0.029)                (0.029)        
                                                                                          
bdc                         -0.006                  -0.009*                               
                            (0.005)                 (0.005)                               
                                                                                          
ll                          0.009**                 0.010**                0.012***       
                            (0.004)                 (0.004)                (0.003)        
                                                                                          
blocked.sum                  0.453                   0.478                                
                            (0.355)                 (0.356)                               
                                                                                          
pct.positive                0.960**                0.995***                1.008***       
                            (0.381)                 (0.380)                (0.377)        
                                                                                          
weekday.L                    0.353                  -0.239                  1.961         
                            (6.042)                 (6.074)                (5.967)        
                                                                                          
weekday.Q                 -18.499***              -19.001***              -20.251***      
                            (6.075)                 (6.072)                (5.961)        
                                                                                          
weekday.C                   -3.375                  -3.231                  -5.207        
                            (5.180)                 (5.160)                (5.087)        
                                                                                          
weekday4                    -2.998                  -3.203                  -3.907        
                            (4.942)                 (4.945)                (4.901)        
                                                                                          
Constant                    85.149                  -10.846                 6.193         
                           (162.552)               (38.121)                (35.107)       
                                                                                          
------------------------------------------------------------------------------------------
Observations                  286                     286                    286          
R2                           0.226                   0.227                  0.206         
Adjusted R2                  0.186                   0.187                  0.183         
Residual Std. Error    37.182 (df = 271)       37.155 (df = 271)      37.257 (df = 277)   
F Statistic         5.658*** (df = 14; 271) 5.694*** (df = 14; 271) 8.976*** (df = 8; 277)
==========================================================================================
Note:                                                          *p<0.1; **p<0.05; ***p<0.01
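
Dropping the non-significant predictors can be checked with a nested-model F-test, which asks whether the removed terms jointly improve the fit. The sketch below illustrates the idea on R's built-in `mtcars` data (a stand-in only; `full` and `reduced` are hypothetical names). The same `anova()` call would apply to a reduced and a full model fitted on the same rows.

```r
# Built-in mtcars data stands in for the blog data (illustration only)
full    <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)
reduced <- lm(mpg ~ wt + hp, data = mtcars)   # drops qsec, drat, disp

# F-test of H0: the three dropped coefficients are all zero
cmp <- anova(reduced, full)
print(cmp)

# p-value for the joint test sits in the second row of the table
p_value <- cmp[["Pr(>F)"]][2]
print(p_value)
```

A large p-value would mean the dropped variables add little jointly, which supports using the smaller model; a small one would argue for keeping at least some of them.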

4.2. Diagnostics:

Code
model.3 <- lm(n.comments ~  direct  +social + ll  + pct.positive +weekday , data = merged)

par(mfrow = c(2,3))
plot(model.3, which = 1:6)

Observation #48 looks like an outlier impacting the model. I will remove this observation to see if that gives us any different results:

Code
merged.old <- merged
merged <- merged[-c(48), ]
model.3 <- lm(n.comments ~  direct  +social + ll  + pct.positive +weekday , data = merged.old )
model.4 <- lm(n.comments ~  direct  +social + ll  + pct.positive +weekday , data = merged)
stargazer( model.3, model.4, type = 'text')

==================================================================
                                 Dependent variable:              
                    ----------------------------------------------
                                      n.comments                  
                             (1)                     (2)          
------------------------------------------------------------------
direct                     0.004***               0.004***        
                           (0.001)                 (0.001)        
                                                                  
social                     0.068**                 0.067**        
                           (0.029)                 (0.029)        
                                                                  
ll                         0.012***               0.018***        
                           (0.003)                 (0.003)        
                                                                  
pct.positive               1.008***                0.923**        
                           (0.377)                 (0.371)        
                                                                  
weekday.L                   1.961                  -5.138         
                           (5.967)                 (6.250)        
                                                                  
weekday.Q                 -20.251***             -24.223***       
                           (5.961)                 (5.982)        
                                                                  
weekday.C                   -5.207                 -8.503*        
                           (5.087)                 (5.099)        
                                                                  
weekday4                    -3.907                 -4.654         
                           (4.901)                 (4.822)        
                                                                  
Constant                    6.193                  -0.466         
                           (35.107)               (34.562)        
                                                                  
------------------------------------------------------------------
Observations                 286                     285          
R2                          0.206                   0.232         
Adjusted R2                 0.183                   0.210         
Residual Std. Error   37.257 (df = 277)       36.615 (df = 276)   
F Statistic         8.976*** (df = 8; 277) 10.432*** (df = 8; 276)
==================================================================
Note:                                  *p<0.1; **p<0.05; ***p<0.01

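We can see that removing #48 changed some coefficients (for example, “ll” rose from 0.012 to 0.018) but did not change which variables are significant at the 5% level; only “weekday.C” became marginally significant (p<0.1). It also improved R^2 (0.206 to 0.232) and adjusted R^2 (0.183 to 0.210).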

Code
par(mfrow = c(2,3))
plot(model.4, which = 1:6)

In the first plot (Residuals vs Fitted) and the third plot (Scale-Location) the points are spread unevenly across the fitted values, which suggests heteroscedasticity of the residuals.
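The visual impression of heteroscedasticity can be backed up with a formal test. Below is a minimal sketch of the studentized (Koenker) Breusch-Pagan test written in base R. The helper `bp_test` is a hypothetical function defined here for illustration, and the data are simulated with an error variance that grows with the predictor, so `fit` only stands in for a model such as model.4.

```r
# Illustrative only: simulated data stand in for the blog data set,
# and `fit` stands in for a fitted model such as model.4.
set.seed(603)
n <- 285
x <- runif(n, 1, 10)
# Error variance grows with x, so the residuals are heteroscedastic by design.
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)

# Studentized (Koenker) Breusch-Pagan test in base R:
# regress the squared residuals on the model's predictors and use
# LM = n * R^2, which is chi-squared with (number of predictors) df under H0.
bp_test <- function(model) {
  X   <- model.matrix(model)
  u2  <- residuals(model)^2
  aux <- lm.fit(X, u2)
  r2  <- 1 - sum(aux$residuals^2) / sum((u2 - mean(u2))^2)
  stat <- length(u2) * r2
  df   <- ncol(X) - 1
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

bp <- bp_test(fit)
print(bp)
```

A small p-value rejects constant residual variance. In that case the coefficient estimates remain unbiased, but the usual standard errors are unreliable, so heteroscedasticity-robust standard errors or a transformation of the dependent variable would be worth considering.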

4. Conclusion

We have built three models for different engagement metrics: engagement of all readers (models 9 and 10) and engagement of loyal readers (model 4). As the models show, different factors contribute to the engagement of these two groups of readers:

Code
stargazer(model.9, model.10, model.4, type = 'text')

====================================================================================================
                                                     Dependent variable:                            
                         ---------------------------------------------------------------------------
                                               e.rate                              n.comments       
                                    (1)                       (2)                      (3)          
----------------------------------------------------------------------------------------------------
log(Uniques)                     0.374***                  0.356***                                 
                                  (0.043)                   (0.103)                                 
                                                                                                    
log(google)                      0.201***                  0.013***                                 
                                  (0.029)                   (0.005)                                 
                                                                                                    
log(direct)                      -0.293***                  -0.037                                  
                                  (0.026)                   (0.117)                                 
                                                                                                    
other.web                       -0.00001***               -0.00001***                               
                                 (0.00000)                 (0.00000)                                
                                                                                                    
direct                                                                              0.004***        
                                                                                     (0.001)        
                                                                                                    
social                          -0.00003**                 -0.00003*                 0.067**        
                                 (0.00002)                 (0.00002)                 (0.029)        
                                                                                                    
log(bdc)                         0.181***                  0.212***                                 
                                  (0.017)                   (0.017)                                 
                                                                                                    
ll                                                                                  0.018***        
                                                                                     (0.003)        
                                                                                                    
pct.positive                                                                         0.923**        
                                                                                     (0.371)        
                                                                                                    
weekday.L                        0.009***                  0.010***                  -5.138         
                                  (0.003)                   (0.003)                  (6.250)        
                                                                                                    
weekday.Q                        0.017***                  0.018***                -24.223***       
                                  (0.003)                   (0.003)                  (5.982)        
                                                                                                    
weekday.C                         0.006**                   0.005*                   -8.503*        
                                  (0.003)                   (0.003)                  (5.099)        
                                                                                                    
weekday4                         0.007***                  0.007***                  -4.654         
                                  (0.003)                   (0.003)                  (4.822)        
                                                                                                    
log(Uniques):log(google)         -0.022***                                                          
                                  (0.003)                                                           
                                                                                                    
log(Uniques):log(direct)                                   -0.026**                                 
                                                            (0.011)                                 
                                                                                                    
Constant                         -1.751***                 -2.085**                  -0.466         
                                  (0.345)                   (1.044)                 (34.562)        
                                                                                                    
----------------------------------------------------------------------------------------------------
Observations                        286                       286                      285          
R2                                 0.919                     0.908                    0.232         
Adjusted R2                        0.915                     0.905                    0.210         
Residual Std. Error          0.020 (df = 274)          0.021 (df = 274)         36.615 (df = 276)   
F Statistic              281.142*** (df = 11; 274) 246.519*** (df = 11; 274) 10.432*** (df = 8; 276)
====================================================================================================
Note:                                                                    *p<0.1; **p<0.05; ***p<0.01

We can see that different referral sources contribute to users’ engagement. The engagement of all readers (measured with exit rate) is influenced by the interaction of popularity with Google and Direct referrals, by internal bdc referrals, and to a lesser extent by referrals from other web sources. Loyal readers’ engagement, in contrast, is strongly affected by the mood of the post, internal page referrals, and social referrals. Direct referrals also have a significant effect on loyal readers, but it is very small compared with their influence on all users’ engagement.

We also see a significant contribution of the day of the week to readers’ engagement. Specifically, Monday’s engagement of all readers is significantly lower than on any other day. For loyal readers, Wednesday shows the lowest engagement.
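The weekday terms in the table (weekday.L, weekday.Q, weekday.C, weekday4) look like orthogonal polynomial contrasts, which R uses by default for ordered factors, so they are not per-day effects and cannot be read as such directly. The sketch below shows one way to translate them into per-day deviations from the average day. It assumes that weekday is an ordered factor with five levels in the order Monday to Friday and that the default `contr.poly` coding was used; if either assumption does not hold for this data set, the output is not meaningful.

```r
# Coefficients copied from column (3) of the summary table (model 4).
# Assumption: they correspond to contr.poly(5) on an ordered weekday factor
# with levels Mon < Tue < Wed < Thu < Fri (not confirmed in this section).
poly_coefs <- c(L = -5.138, Q = -24.223, C = -8.503, fourth = -4.654)

# Each row of the contrast matrix describes one day in terms of the
# linear, quadratic, cubic and 4th-degree components.
contrast_matrix <- contr.poly(5)

# Per-day deviation from the mean over days, in number of comments.
day_effect <- drop(contrast_matrix %*% poly_coefs)
names(day_effect) <- c("Mon", "Tue", "Wed", "Thu", "Fri")

print(round(day_effect, 2))
```

The deviations sum to zero by construction, so each value is read relative to the average day rather than to a reference day. Refitting the model with an unordered weekday factor would give treatment contrasts with their own significance tests, which is a more direct way to compare specific days.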

It is also important to point out that the popularity of the post (Uniques) only plays a role in the engagement of all users. Loyal readers do not appear to be affected by popularity. The impact of popularity (Uniques) is moderated by Google and Direct referrals, where the posts with higher amounts of referrals show overall better engagement.
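To make the moderation concrete: in model 9 the slope of the exit rate with respect to log(Uniques) is not a single number but depends on Google referrals, as 0.374 - 0.022 * log(google). The short sketch below evaluates that slope using the coefficients from column (1) of the table. The Google referral counts are hypothetical values chosen only for illustration, not values taken from the data.

```r
# Coefficients copied from column (1) of the summary table (model 9).
b_uniques     <- 0.374   # log(Uniques)
b_interaction <- -0.022  # log(Uniques):log(google)

# Hypothetical Google referral counts, chosen only for illustration.
google <- c(10, 100, 1000, 10000)

# Marginal effect of log(Uniques) on e.rate at each level of Google referrals.
slope_uniques <- b_uniques + b_interaction * log(google)

print(data.frame(google, slope_uniques = round(slope_uniques, 3)))
```

Because the interaction coefficient is negative, the slope for log(Uniques) shrinks as Google referrals grow. The same calculation applies to model 10 with the log(Uniques):log(direct) coefficient.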

To conclude, we found a significant connection between the referral source and readers’ engagement. Most metrics related to the post discussion appeared not to be relevant, with the exception of post mood, which positively affects loyal readers’ engagement.

Limitations:

The role of Google referrals needs to be explored further, to identify how posts with a high level of Google referrals contribute to this model. Also, finding out why some posts have a high Google referral rate would be beneficial for the development of the blog.