Data Analytics and Computational Social Science: Final Project

Angela Smith

Introduction

In summer 2021, staff at Parallax Advanced Research were asked to complete an anonymous survey about their perceptions of innovation within the organization. The survey included 20 core questions covering topics such as organizational risk tolerance, creative expression, ease of idea sharing, and employee self-actualization. A variety of demographic information was also collected.

This analysis will examine the differences in participant responses to innovation prompts based on management roles within Parallax. The question “do managers perceive the innovation environment differently from those in non-management roles?” will guide the data exploration and analysis.

Data

All data manipulation and analyses were completed using R 4.1.2 (R Core Team, 2021), readxl 1.3.1 (Wickham and Bryan, 2021), rmarkdown 2.11 (Allaire et al., 2021), tidyr 1.3.1 (Wickham, 2021), dplyr 1.0.7 (Wickham et al., 2021), and ggplot2 3.3.5 (Wickham, 2021).

The data were collected via an online survey using Microsoft Forms. After the survey closed, the data were exported to Excel and then read into R using readxl.

Below are the variables included in the survey data, along with data types and descriptions.

knitr::opts_chunk$set(echo = TRUE)

library(readxl)

innovation_description <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/Final Project/Innovation Culture Survey 2021 - for DACSS final.xlsx", range="Description!A1:C36")

library(rmarkdown)
paged_table(innovation_description)

library(readxl)
library(dplyr)

#Reading in the excel sheet
innovation <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/Final Project/Innovation Culture Survey 2021 - for DACSS final.xlsx", range="Numerical!A1:AC53")

#mutating employed months and years to a months-only variable
innovation <- mutate(innovation,totalmonthsEmployed = ((yearsEmployed*12)+monthsEmployed))

#mutating management role to a two-category variable
innovation <- innovation %>% 
  mutate(managementRoleSimple = recode(managementRole, "Non-management"="Non-management", "Middle-management"="Management", "Upper-management"="Management"))

#making highestDegree ordinal
innovation <- innovation %>%
  mutate(highestDegree =factor(highestDegree, order = TRUE, levels = c("High School Diploma","Associate's Degree","Bachelor's Degree","Master's Degree","Professonal Degree","Doctoral Degree")))


#using paged_table to show the entire innovation table without cutoffs
library(rmarkdown)
paged_table(innovation)

Demographics

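Of the 101 individuals invited to participate, 52 responded, a response rate of about 51%. About three-fifths of respondents (31) were in non-management roles and two-fifths (21) were in management. Most respondents (44) have a bachelor’s degree or higher. About half of respondents (27, counting Technical and Both) have a technical educational background, and about two-thirds (35, counting Non-Technical and Both) have a non-technical background (or no degree); the two groups overlap because 10 respondents reported both.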

Management Role

###Employees in a management role
table(innovation$managementRoleSimple)


    Management Non-management 
            21             31

Degree level

###Employee highest degree
table(innovation$highestDegree)


High School Diploma  Associate's Degree   Bachelor's Degree 
                  6                   1                  15 
    Master's Degree  Professonal Degree     Doctoral Degree 
                 20                   0                   9

Degree type

###Employee educational background
table(innovation$educationBackground)


         Both Non-Technical     Technical 
           10            25            17
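The shares quoted in the Demographics paragraph can be reproduced from these counts. A minimal base-R sketch, with the counts typed in by hand from the tables above (in the project they would come from `table(innovation$...)`); the variable names are illustrative.

```r
# Counts copied from the tables above
invited    <- 101
responded  <- 52
role       <- c(Management = 21, `Non-management` = 31)
background <- c(Both = 10, `Non-Technical` = 25, Technical = 17)

# Response rate, in percent
round(100 * responded / invited)           # → 51

# Share of respondents in each role, in percent
round(100 * prop.table(role))              # → Management 40, Non-management 60

# "Both" counts toward technical AND non-technical, so these shares overlap
technical     <- background[["Technical"]] + background[["Both"]]       # 27
non_technical <- background[["Non-Technical"]] + background[["Both"]]   # 35
round(100 * c(technical, non_technical) / responded)                    # → 52 67
```

Because "Both" is counted twice, the technical and non-technical shares add to more than 100%, which is why the paragraph reports them separately.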

Survey responses

Survey responses were based on a 7-point Likert scale:
-3 strongly disagree
-2 disagree
-1 slightly disagree
0 neither agree nor disagree
1 slightly agree
2 agree
3 strongly agree
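Because the codes -3 through 3 map one-to-one onto these labels, the numeric-to-ordered-label conversion can also be written as a single factor() call with `levels` and `labels`, instead of case_when() followed by factor(). A minimal base-R sketch; `likert_labels`, `to_likert`, and the sample responses are illustrative names and values, not part of the project code.

```r
likert_labels <- c("Strongly disagree", "Disagree", "Slightly disagree",
                   "Neither agree nor disagree", "Slightly agree", "Agree",
                   "Strongly agree")

# Map numeric codes -3..3 to ordered labels in one step;
# any code outside -3..3 becomes NA
to_likert <- function(x) {
  factor(x, levels = -3:3, labels = likert_labels, ordered = TRUE)
}

responses <- c(-1, 0, 2, 3, -3)   # illustrative values
likert <- to_likert(responses)
as.character(likert)
# → "Slightly disagree" "Neither agree nor disagree" "Agree" "Strongly agree" "Strongly disagree"

# The ordering is kept, so comparisons between responses still work
likert[3] > likert[1]             # → TRUE
```

Applied to the survey, this would be used inside `mutate(across(...))` in place of the two separate steps; whether that is clearer than the explicit case_when() is a matter of taste.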

The heat map below summarizes responses by management role and survey variable, with tile color showing the percentage of each group giving each response.

library(tidyr)
library(dplyr)

#mutating survey responses to text
innovation_txt <- innovation %>% 
  mutate(across(c(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol),~case_when(.==-3~"Strongly disagree",.==-2~"Disagree",.==-1~"Slightly disagree",.==0~"Neither agree nor disagree",.==1~"Slightly agree",.==2~"Agree",.==3~"Strongly agree")))

#making survey responses ordinal
innovation_txt <- innovation_txt %>%
  mutate(across(c(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), ~factor(., order=TRUE, levels=(c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree")))))

#pivots the dataframe for geom_tile()
innovation_pivot <- pivot_longer(innovation_txt,
  cols = utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol,
  names_to = "surveyColumns",
  values_to = "surveyResponse")

#creates percentage values for the geom_tile viz
innovation_pivot2 <- innovation_pivot %>% 
  select(id,managementRoleSimple,surveyColumns,surveyResponse) %>%
  count(managementRoleSimple,surveyColumns,surveyResponse) %>%
  group_by(managementRoleSimple,surveyColumns) %>%
  mutate(percent = (n/sum(n))*100)

library(ggplot2)

#creates geom_tile summarizing responses of all variables within the survey
ggplot(innovation_pivot2, aes(x=surveyResponse, y= surveyColumns, fill=percent)) +
  geom_tile() +
  facet_wrap(vars(managementRoleSimple)) +
  scale_fill_gradient(low = "blue", high = "red") +
  #geom_text(aes(label=round(percent, digits = 1))) ##labels each tile as a percentage
  labs(x='Staff Response',y='Survey Variable',fill='Percent of Group',title="Survey Responses Disaggregated by \nManagement Level") +
  theme(axis.text.x = element_text(angle = 90))

Visually, most variables have similar distributions across management and non-management roles. A few, however, diverge between the two groups: riskCareerImpactRisktol, rewardsRiskRisktol, readilyHeardEase, ideasFilteredEase, exploreOpportunitiesCreative, and encouragedShareIdeasEase differ enough between management and non-management responses to warrant further investigation.
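One way to make "diverge enough" concrete is to score each variable by the total variation distance between the two groups' response shares (half the summed absolute difference across the seven levels; 0 means identical shares, 1 means no overlap) and sort the scores. This is a swapped-in technique offered as a sketch, not the method used in this project, which relied on visual inspection; the function `tv_distance` and the toy data are hypothetical.

```r
# Total variation distance between two groups' response distributions
tv_distance <- function(responses, group, levels = -3:3) {
  tab <- table(factor(group), factor(responses, levels = levels))
  shares <- prop.table(tab, margin = 1)   # shares within each group
  stopifnot(nrow(shares) == 2)            # defined here for exactly two groups
  0.5 * sum(abs(shares[1, ] - shares[2, ]))
}

# Illustrative data only (not the survey responses)
group  <- rep(c("Management", "Non-management"), times = c(4, 6))
item_a <- c(-1, -1, -2, 0,   0, 0, 0, -1, 0, 1)
item_b <- c( 2,  2,  1, 2,   2, 1, 2,  2, 1, 2)

scores <- c(item_a = tv_distance(item_a, group),
            item_b = tv_distance(item_b, group))
sort(scores, decreasing = TRUE)   # item_a (7/12) diverges more than item_b (1/12)
```

With the project data, the same function could be applied to each survey column in turn, giving a ranked shortlist to check against the heat map.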

Analysis and visualizations

The median response to the prompt “Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax” is -1, or “slightly disagree.” This indicates that, on the whole, staff do not perceive they would be penalized for taking a risk that might lead to failure. Disaggregating the data, however, reveals that management and non-management perceptions diverge. Management appears more comfortable taking risks without fear of career reprisal, while non-management staff had a median response of 0, or “neither agree nor disagree.” This difference might indicate that non-management staff are less clear on Parallax’s rules and boundaries for risk-taking; they may also be more risk averse in general. This is an opportunity for Parallax to clarify its organizational stance on risk-taking, particularly for staff in non-management roles.
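Because the two groups differ in size (21 managers, 31 non-managers), their bar charts are only comparable when each group's responses are expressed as shares of that group rather than of all respondents combined. A minimal base-R sketch of the difference, using toy data rather than the survey responses. (In ggplot2, the usual way to get per-panel shares in geom_bar() is `aes(y = after_stat(prop), group = 1)`.)

```r
# Toy data: 2 managers and 3 non-managers (illustrative only)
group    <- c("Management", "Management",
              "Non-management", "Non-management", "Non-management")
response <- c(-1, -1, 0, 0, -1)

counts <- table(group, response)

# Shares of ALL respondents: each row sums to that group's fraction of the sample
share_of_total <- prop.table(counts)

# Shares WITHIN each group: every row sums to 1, so groups are comparable
share_within <- prop.table(counts, margin = 1)

rowSums(share_of_total)   # → Management 0.4, Non-management 0.6
rowSums(share_within)     # → Management 1, Non-management 1
```

With shares of the total, the larger group's bars look taller simply because it has more members; within-group shares remove that effect.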


library(ggplot2)
library(dplyr)

#used to calculate median for line in bar plot below
innovation <- innovation %>%
  group_by(managementRoleSimple) %>%
  mutate(median_riskCareerImpactRisktol =median(riskCareerImpactRisktol))

#bar plot disaggregated by management level 
ggplot(innovation, aes(riskCareerImpactRisktol)) +
  geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +  #share within each facet, so groups of different size are comparable
  scale_y_continuous(labels=scales::percent) +
  geom_vline(aes(xintercept = median_riskCareerImpactRisktol), color="blue") +
  labs(x='Staff Response',y='Relative Frequency',title="riskCareerImpactRisktol Disaggregated by Management Level",subtitle='Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax.') +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(managementRoleSimple))

  #scale_x_discrete(labels=c("-3"="Strongly disagree","-2"="Disagree","-1"="Slightly disagree","0"="Neither agree nor disagree","1"="Slightly agree","2"="Agree","3"="Strongly agree"))

I considered whether managers might feel free to take risks without fear of reprisal because of their tenure in the organization. It would stand to reason that the longer you are at an organization, the more likely you might be to take risks, because you know which risks you can take without getting fired. To examine this, I plotted responses to the risk question against the number of months employed at Parallax, using color to distinguish management levels. There does not appear to be any relationship between months employed at Parallax and perception of risk-taking. In fact, most non-management staff with longer tenure (more than 30 months) were, at best, unsure of the impact risk-taking would have on their careers.
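To put a number on the visual impression that tenure and risk perception are unrelated, a rank correlation such as Spearman's rho, which suits ordinal Likert codes, could be computed overall and within each management group. This is an added check rather than part of the original analysis, and with roughly 20 to 30 people per group any such estimate would be noisy; the toy vectors below are illustrative, not the survey data.

```r
# Toy data (illustrative only): months employed, Likert responses, and role
months   <- c(6, 12, 18, 24, 30, 36, 42, 48)
response <- c(-1, 0, -2, 1, 0, -1, 0, 0)
group    <- rep(c("Management", "Non-management"), each = 4)

# Spearman's rho correlates ranks; tied responses receive average ranks
rho_overall <- cor(months, response, method = "spearman")

# The same statistic computed separately within each management group
rho_by_group <- sapply(split(data.frame(months, response), group),
                       function(d) cor(d$months, d$response, method = "spearman"))

rho_overall
rho_by_group
```

Values near 0 would be consistent with the scatterplot's lack of pattern; values near -1 or 1 would suggest a monotonic relationship worth a closer look.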

#scatterplot disaggregated by management level
ggplot(innovation, aes(riskCareerImpactRisktol,totalmonthsEmployed, color = managementRoleSimple)) + 
  geom_point()+
  labs(x='Response',y='Months Employed',title="riskCareerImpactRisktol by Months Employed, Disaggregated by Management Level",subtitle='Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax.', color="Management Level") +
  scale_color_manual(values=(c("Management" = "red", "Non-management" = "blue"))) +
  theme(axis.text.x = element_text(angle = 90))

  #scale_x_discrete(breaks = c("-3","-2","-1","0","1","2","3"), labels=c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree"))

The median response to the prompt “Parallax rewards people for participating in potentially risky opportunities, irrespective of the outcome” is 0, or “neither agree nor disagree.” This indicates that, on the whole, staff neither agree nor disagree that the organization rewards people for taking risks. Disaggregating the data, however, reveals that management and non-management perceptions diverge. Management appears more confident that taking on potentially risky opportunities can be rewarded, as they slightly agree with the statement, while non-management staff had a median response of 0, or “neither agree nor disagree.” This difference might indicate that non-management staff are less clear on how Parallax rewards or penalizes risk-takers within the organization.

library(ggplot2)
library(dplyr)

#used to calculate median for line in bar plot below
innovation <- innovation %>%
  group_by(managementRoleSimple) %>%
  mutate(median_rewardsRiskRisktol =median(rewardsRiskRisktol))

#bar plot disaggregated by management level
ggplot(innovation, aes(rewardsRiskRisktol)) +
  geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +  #share within each facet, so groups of different size are comparable
  scale_y_continuous(labels=scales::percent) +
  geom_vline(aes(xintercept = median_rewardsRiskRisktol), color="blue") +
  labs(x='Staff Response',y='Relative Frequency',subtitle="Parallax rewards people for participating in potentially risky opportunities, irrespective of the \noutcome.",title="rewardsRiskRisktol Disaggregated by Management Level") +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(managementRoleSimple))

  #scale_x_discrete(breaks = -3:3, labels=c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree"))

The median response to the prompt “Parallax’s processes allow ideas to be readily heard up the chain of command” is 2, or “agree.” This indicates that, on the whole, staff agree that ideas from the lowest levels of the organization can reach the highest rungs. Disaggregating the data reveals that management and non-management perceptions diverge. Management appears more confident that Parallax’s processes allow ideas to move up the chain of command, with a higher median response than non-management staff, whose median response was 1, or “slightly agree.” While both groups agree with the statement, non-management staff may be less confident in Parallax’s processes. This may indicate a true lack of confidence in those processes, or a lack of awareness of how ideas move up the chain of command.

library(ggplot2)
library(dplyr)

#used to calculate median for line in bar plot below
innovation <- innovation %>%
  group_by(managementRoleSimple) %>%
  mutate(median_readilyHeardEase =median(readilyHeardEase))

#bar plot disaggregated by management level 
ggplot(innovation, aes(readilyHeardEase)) +
  geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +  #share within each facet, so groups of different size are comparable
  scale_y_continuous(labels=scales::percent) +
  geom_vline(aes(xintercept = median_readilyHeardEase), color="blue") +
  labs(x='Staff Response',y='Relative Frequency',subtitle="Parallax's processes allow ideas to be readily heard up the chain of command.",title="readilyHeardEase Disaggregated by Management Level") +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(managementRoleSimple))

  #scale_x_discrete(breaks = c("-3","-2","-1","0","1","2","3"), labels=c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree"))

Finally, the variables ideasFilteredEase, exploreOpportunitiesCreative, and encouragedShareIdeasEase were analyzed. These differ from the variables examined above: management and non-management responses are distributed differently across the response levels, but the two groups share the same median.

For example, the prompt “Ideas are filtered so that only the most promising opportunities make it to the surface” has a median of 0 (“neither agree nor disagree”) for both management and non-management groups. However, the distribution of responses differs substantially in the bar chart. Management leans toward disagreeing with the statement, while the non-management group overwhelmingly responded “neither agree nor disagree.” This result supports two separate conclusions. First, many in management seem to disagree that only the most promising opportunities make it to the surface, which suggests the organization could filter ideas more. Second, the overwhelmingly neutral response of non-management may indicate that this group is not sure how ideas are filtered, leaving an opportunity to improve transparency in decision-making about organizational opportunities.

The prompt “Parallax has a strong desire to explore opportunities and to create new things” elicited similar responses from management and non-management, with a median of 2, or “agree.” In other words, there is broad agreement that the organization is open to exploration and new ideas. Notably, the non-management distribution skews negatively (further to the left) compared to the management distribution and has a longer tail. While this is not enough to pull the median lower, and so is probably not a major concern for Parallax, it may be a variable to watch in future surveys for a possible decline.

Similarly, the prompt “I am encouraged to share ideas out of the scope of current work” has a median of 2 (“agree”) for both management and non-management groups. The non-management distribution again skews negatively, though not enough to pull that group’s median downward. This is another variable worth monitoring to ensure non-management staff feel comfortable sharing their ideas at Parallax, even when those ideas fall outside the current scope of their own or the organization’s work.
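The pattern described in these three paragraphs, equal medians hiding different distributions, can be shown with a small made-up example. Reporting the share of each group at each response level alongside the median makes the difference visible; the data below are illustrative only.

```r
# Illustrative responses (not the survey data)
mgmt     <- c(-2, -1, -1, 0, 0, 1, 2)
non_mgmt <- c(-1, 0, 0, 0, 0, 0, 1)

median(mgmt)       # → 0
median(non_mgmt)   # → 0

# Share of each group answering 0 ("neither agree nor disagree")
mean(mgmt == 0)       # 2/7, about 0.29
mean(non_mgmt == 0)   # 5/7, about 0.71

# Full within-group distributions over the seven response levels
likert_levels <- -3:3
rbind(Management     = prop.table(table(factor(mgmt, levels = likert_levels))),
      Non_management = prop.table(table(factor(non_mgmt, levels = likert_levels))))
```

The medians are identical, yet the second group is concentrated at the neutral point while the first is spread across the scale, which is the kind of difference a median alone cannot show.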


library(dplyr)
library(ggplot2)

#used to calculate median for line in bar plot below
innovation <- innovation %>%
  group_by(managementRoleSimple) %>%
  mutate(median_ideasFilteredEase =median(ideasFilteredEase))
  
#bar plot disaggregated by management level 
ggplot(innovation, aes(ideasFilteredEase)) +
  geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +  #share within each facet, so groups of different size are comparable
  scale_y_continuous(labels=scales::percent) +
  geom_vline(aes(xintercept = median_ideasFilteredEase), color="blue") +
  labs(x='Staff Response',y='Relative Frequency',subtitle="Ideas are filtered so that only the most promising opportunities make it to the surface.",title="ideasFilteredEase Disaggregated by Management Level") +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(managementRoleSimple))

  #scale_x_discrete(breaks = c("-3","-2","-1","0","1","2","3"), labels=c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree"))

#used to calculate median for line in bar plot below
innovation <- innovation %>%
  group_by(managementRoleSimple) %>%
  mutate(median_exploreOpportunitiesCreative =median(exploreOpportunitiesCreative))
  
#bar plot disaggregated by management level
ggplot(innovation, aes(exploreOpportunitiesCreative)) +
  geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +  #share within each facet, so groups of different size are comparable
  scale_y_continuous(labels=scales::percent) +
  geom_vline(aes(xintercept = median_exploreOpportunitiesCreative), color="blue") +
  labs(x='Staff Response',y='Relative Frequency',subtitle="Parallax has a strong desire to explore opportunities and to create new things.",title="exploreOpportunitiesCreative Disaggregated by Management Level") +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(managementRoleSimple))

  #scale_x_discrete(breaks = c("-3","-2","-1","0","1","2","3"), labels=c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree"))

#used to calculate median for line in bar plot below
innovation <- innovation %>%
  group_by(managementRoleSimple) %>%
  mutate(median_encouragedShareIdeasEase =median(encouragedShareIdeasEase))

#bar plot disaggregated by management level 
ggplot(innovation, aes(encouragedShareIdeasEase)) +
  geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +  #share within each facet, so groups of different size are comparable
  scale_y_continuous(labels=scales::percent) +
  geom_vline(aes(xintercept = median_encouragedShareIdeasEase), color="blue") +
  labs(x='Staff Response',y='Relative Frequency',subtitle="I am encouraged to share ideas out of the scope of current work.",title="encouragedShareIdeasEase Disaggregated by Management Level") +
  theme(axis.text.x = element_text(angle = 90)) +
  facet_wrap(vars(managementRoleSimple))

  #scale_x_discrete(breaks = c("-3","-2","-1","0","1","2","3"), labels=c("Strongly disagree","Disagree","Slightly disagree","Neither agree nor disagree","Slightly agree","Agree","Strongly agree"))

Reflection

When beginning this project, I wanted to be intentional about the dataset I used. As a full-time analyst, I strongly preferred a dataset relevant to my role, ideally one I already had some familiarity with. I happened to inherit this survey from a colleague I collaborate with on innovation-related projects at Parallax, and as its new owner, I saw this project as a perfect opportunity to dig into the data a bit more.

With this dataset, as with any new dataset I am working with, I try to create one or two summative visualizations that will help me quickly locate points of interest for analysis, especially if there are not specific points of interest already determined by someone else. I created the geom_tile visualization as a first step. I also ran some quick summaries on the demographic variables to get a sense of the response group. Finally, in an earlier version of this project, I created tables that included the medians for all variables, and the same tables disaggregated by the various demographic variables. I then triangulated this data to get a sense of the most interesting parts of the dataset and refined my research question.
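Median tables like the ones described above can be produced in one call with base R's aggregate(), which applies a summary function to every column for each group. A minimal sketch, with a hypothetical three-item data frame standing in for the survey (the column names reuse ones from this project, but the values are invented); the earlier version of this project may have built its tables differently.

```r
# Hypothetical stand-in for the survey data (values invented)
survey <- data.frame(
  managementRoleSimple    = c("Management", "Management",
                              "Non-management", "Non-management", "Non-management"),
  riskCareerImpactRisktol = c(-2, -1, 0, 0, -1),
  rewardsRiskRisktol      = c(1, 1, 0, 0, 1),
  readilyHeardEase        = c(2, 3, 1, 1, 2)
)

# Median of every other column, one row per management group.
# Note: the formula interface drops any row with an NA in any column.
medians <- aggregate(. ~ managementRoleSimple, data = survey, FUN = median)
medians
```

Swapping `managementRoleSimple` for another demographic column (for example, educational background) gives the same table disaggregated a different way.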

As with any analytical project, updating code and analyzing data is an iterative process. Each step informs the others, and I usually make several passes between organizing, visualizing, and analyzing the data. As a decision analyst, I ask “how is this actionable?” throughout an analysis, and that question framed what I chose to follow up on: a finding may be interesting but not especially actionable. Here, focusing on management versus non-management perceptions met the criteria of being both interesting and actionable.

I came into this project with a background in analytics and visualization, so I felt very comfortable with the conceptual side of analyzing and visualizing data. I also code (though not in R before this course), so I felt confident handling most of the technical aspects of the project. Once I understand a dataset and have a sense of where I want to go, my biggest challenge shifts to figuring out how to build the visualizations I have conceived in my head. For me, the hardest part of this project was translation: first translating what I wanted into code, and then translating that into a Google search when I couldn’t figure it out. Both get easier over time, but my limited intuition about how R “thinks” made the process more time-consuming than I initially expected. I am grateful to have had Sean and Larri as support for these kinds of technical questions.

Building intuition for how R thinks has made me appreciate the design of the platform. I love that I can use one platform to organize, visualize, and analyze my data; I typically use at least two pieces of software (usually a SQL developer and a visualization or modelling tool, sometimes more). I also greatly appreciate rmarkdown. I consider myself a fairly organized analyst, and I’m generally good about keeping code and documentation with my analytical work for future reference and reproducibility, but having it all in one place is very convenient.

I wish I had known R’s limitations going in. R is exceedingly flexible, but it can’t do everything I want (yet). My limited technical knowledge sometimes prevents me from creating on-the-fly workarounds or elegant solutions, but I know that will come with time.

With more time, I do expect to continue analyzing this dataset. The number of variables is so large that it really warrants further analysis, but it was well beyond the scope of this project to analyze all variables from every angle. One benefit of working on a dataset related to my role is that I will likely have future opportunities to expand on this work.

Conclusion

On the whole, I am very impressed that the survey responses regarding innovation at Parallax were so positive. I am still fairly new to the organization, and our innovation efforts are just ramping up. I didn’t have any preconceived ideas of how staff would respond to this survey, so it was certainly a pleasant surprise to see so many positive responses. Frankly, I was also relieved that the discrepancies between management and non-management weren’t necessarily negative. If anything, they showed that Parallax has an opportunity to improve in communicating values and policies, but there isn’t any evidence suggesting we have a morale issue or a larger culture issue (which is great, because those are the hardest issues to solve in an organization).

For me, there are some fairly clear takeaways related to communicating Parallax’s values and policies. There are obvious, low-hanging-fruit improvements we can make to communicate to staff that:
1) We want people to feel like they can take calculated risks. Our culture rewards people for this behavior – we don’t punish people for it.
2) To be on the cutting edge of science and technology, we want to explore big ideas and give people room to think outside of the box.
3) We want great ideas to be heard across the organization, whether the staff member sharing it is working at a tactical or strategic level.

These points are an entry point to many larger, organization-wide conversations. How do we reward people? What is our decision framework for taking calculated risks? How do we encourage people to share innovative ideas? How do we decide which ideas are resourced so they’re successful? This dataset won’t answer these questions, but it provides necessary context to pose them to the broader organization.

While there is still quite a lot to explore in this dataset, the opportunity I most look forward to is seeing how the responses to this survey change in the future. At a time when Parallax is growing faster than ever before, and as we aspire to be a premier research institute, there is real opportunity to use this data to create a sustainable innovation infrastructure for the organization. I look forward to being part of the conversation.

Bibliography

Allaire, J.J., et al. (2021). rmarkdown: dynamic documents for R. R package version 2.11. https://cran.r-project.org/web/packages/rmarkdown/index.html

Frame, M. (2021). Unpublished raw survey data on innovation at Parallax Advanced Research. Parallax Advanced Research.

R Core Team (2021). R: A language and environment for statistical computing. R version 4.1.2. https://www.R-project.org/.

Wickham, H. (2021). ggplot2: elegant graphics for data analysis. R package version 3.3.5. https://cran.r-project.org/web/packages/ggplot2/index.html.

Wickham, H. (2021). Welcome to the tidyverse. R package version 1.3.1. https://cran.r-project.org/web/packages/tidyverse/index.html.

Wickham, H., and Bryan, J. (2021). readxl: read excel files. R package version 1.3.1. https://cran.r-project.org/web/packages/readxl/index.html.

Wickham, H., François, R., Henry, L., and Müller, K. (2021). dplyr: a grammar of data manipulation. R package version 1.0.7. https://cran.r-project.org/web/packages/dplyr/index.html.
