Customizing ggplot
For HW05, I am reading in data from a recent survey administered by my employer, Parallax Advanced Research, to better understand the current state of innovation culture inside the organization.
Below are the variables included in the survey data, along with data types and descriptions.
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
innovation_description <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/Final Project/Innovation Culture Survey 2021 - for DACSS final.xlsx", range="Description!A1:C36")
library(rmarkdown)
paged_table(innovation_description)
## Tidy Dataset

Many variables in this dataset have been precoded for calculation. Most survey responses are on a 7-point Likert scale, ranging from strongly disagree to strongly agree. The coding is as follows:

- Strongly disagree = -3
- Disagree = -2
- Slightly disagree = -1
- Neither agree nor disagree = 0
- Slightly agree = 1
- Agree = 2
- Strongly agree = 3
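Because the responses are stored as codes from -3 to 3, it can help to turn them back into readable labels when tabulating or plotting. The base R sketch below assumes the codes are stored as whole numbers. The helper name `to_likert()` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): map -3..3 codes to labels
likert_labels <- c("Strongly Disagree", "Disagree", "Slightly Disagree",
                   "Neither Agree Nor Disagree", "Slightly Agree", "Agree",
                   "Strongly Agree")

to_likert <- function(code) {
  # levels = -3:3 sets both the code-to-label matching and the display order;
  # any code outside -3..3 becomes NA
  factor(code, levels = -3:3, labels = likert_labels, ordered = TRUE)
}

responses <- to_likert(c(-1, 0, 3, -3))
responses
```

Since the result is an ordered factor, comparisons such as `responses[3] > responses[2]` follow the scale order rather than alphabetical order.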
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
library(dplyr)
#Reading in the excel sheet
innovation <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/Final Project/Innovation Culture Survey 2021 - for DACSS final.xlsx", range="Numerical!A1:AC53")
#mutating employed months and years to a months-only variable
innovation <- mutate(innovation,totalmonthsEmployed = ((yearsEmployed*12)+monthsEmployed))
#mutating management role to a two-category variable
innovation <- innovation %>%
mutate(managementRoleSimple = recode(managementRole, "Non-management"="Non-management", "Middle-management"="Management", "Upper-management"="Management"))
#making highestDegree ordinal (level names must match the data exactly, e.g. "Professional Degree")
innovation <- innovation %>%
mutate(highestDegreeFactor = factor(highestDegree, ordered = TRUE, levels = c("High School Diploma","Associate's Degree","Bachelor's Degree","Master's Degree","Professional Degree","Doctoral Degree")))
library(rmarkdown)
paged_table(innovation)
Several demographic questions were asked during the survey. These describe the people who participated in it.
knitr::opts_chunk$set(echo = TRUE)
#TO DO: move these to different chunks, add titles, figure out how to change order.
###Employees in a management role
table(innovation$managementRoleSimple)
Management Non-management
21 31
###Employee highest degree
table(innovation$highestDegree)
Associate's Degree Bachelor's Degree Doctoral Degree
1 15 9
High School Diploma Master's Degree Professional Degree
6 20 1
###Employee educational background
table(innovation$educationBackground)
Both Non-Technical Technical
10 25 17
Questions: How do I add headers or titles to these tables? How can I sort the table data in a particular order (e.g., putting “Both” at the end of the last table, vs. at the beginning)?
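One possible answer to both questions, sketched in base R with a handful of made-up responses: `table()` lists categories in the order of the factor's levels, so setting `levels` explicitly controls the order, and naming the argument to `table()` labels the counts. The vector `edu` is hypothetical; in the analysis it would be `innovation$educationBackground`.

```r
# Hypothetical responses standing in for innovation$educationBackground
edu <- c("Both", "Technical", "Non-Technical", "Technical", "Both")

# table() follows the factor's level order, so "Both" is listed last
edu_f <- factor(edu, levels = c("Technical", "Non-Technical", "Both"))

# Naming the argument sets the label printed above the categories
counts <- table(Background = edu_f)
counts

# Alternatively, order categories by frequency, largest first
sort(counts, decreasing = TRUE)

# Data frame version (columns Background and Freq), convenient for captioned tables
counts_df <- as.data.frame(counts)
```

In an R Markdown chunk, passing `counts_df` to `knitr::kable()` with its `caption` argument is one common way to give the table a title.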
Survey responses were based on a 7-point Likert scale. They have been converted to numbers for computation. Below, the medians for each variable have been calculated.
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
descriptive1 <- summarise_at(innovation, vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive1)
#TO DO: look into whether I can turn this table into a boxplot or heatmap for reading ease.
The medians were again computed, with the data disaggregated by management role.
knitr::opts_chunk$set(echo = TRUE)
#TO DO: look into whether I can turn these tables into boxplots or mosaic visualizations for reading ease.
###Responses by management role
descriptive2 <- innovation %>%
group_by(managementRoleSimple) %>%
summarise_at(vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive2)
## Visualizations
The median response to the prompt “Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax” was -1, which translates to “slightly disagree.” This indicates that, on the whole, staff do not perceive they would be penalized for taking a risk that might lead to failure. However, disaggregating the data reveals that management and non-management perceptions diverge. Management appears more comfortable taking risks without fear of career reprisal, while non-management staff had a median response of 0, which translates to “neither agree nor disagree.” This difference might indicate that non-management staff are less clear on Parallax’s rules and boundaries around risk-taking. They may also be more risk-averse in general. This is an opportunity for Parallax to clarify its organizational stance on risk-taking, particularly for staff in non-management roles.
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
##Original Plot
#ggplot(innovation, aes(ideasFilteredEase)) + geom_bar(fill='gray',col='black') +
# labs(x='Response',y='Frequency',title='Ideas are filtered so that only the most promising opportunities make it to the surface') +
#scale_x_discrete(breaks = -3:3, labels=c("Strongly Disagree","Disagree","Slightly Disagree","Neither Agree Nor Disagree","Slightly Agree","Agree","Strongly Agree")) +
# theme(axis.text.x = element_text(angle = 90))
#innovation <- innovation %>%
# group_by(managementRoleSimple) %>%
# mutate(median_ideasFilteredEase =median(ideasFilteredEase))
#ggplot(innovation, aes(ideasFilteredEase)) +
# geom_bar(aes(y = (..count..)/sum(..count..)),fill='gray',col='black') +
# scale_y_continuous(labels=scales::percent) +
# geom_vline(aes(xintercept = median_ideasFilteredEase), color="blue") +
# labs(x='Staff Response',y='Relative Frequency',title='Ideas are filtered so that only the most promising opportunities make it to the surface') +
# scale_x_discrete(breaks = -3:3, labels=c("Strongly Disagree","Disagree","Slightly Disagree","Neither Agree Nor Disagree","Slightly Agree","Agree","Strongly Agree")) +
# theme(axis.text.x = element_text(angle = 90))+
# facet_wrap(vars(managementRoleSimple))
innovation <- innovation %>%
mutate(median_riskCareerImpactRisktol = median(riskCareerImpactRisktol),
sd_riskCareerImpactRisktol = sd(riskCareerImpactRisktol))
##Original Plot
ggplot(innovation, aes(riskCareerImpactRisktol)) +
geom_bar(fill='gray',col='black') +
geom_errorbar(aes(ymin=median_riskCareerImpactRisktol-sd_riskCareerImpactRisktol,ymax=median_riskCareerImpactRisktol+sd_riskCareerImpactRisktol),color="orange") +
labs(x='Response',y='Frequency',title='Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax') +
geom_vline(aes(xintercept = median(riskCareerImpactRisktol)), color="blue") +
#scale_x_discrete(breaks = -3:3, labels=c("Strongly Disagree","Disagree","Slightly Disagree","Neither Agree Nor Disagree","Slightly Agree","Agree","Strongly Agree")) +
theme(axis.text.x = element_text(angle = 90))
innovation <- innovation %>%
group_by(managementRoleSimple) %>%
mutate(median_riskCareerImpactRisktol =median(riskCareerImpactRisktol))
ggplot(innovation, aes(riskCareerImpactRisktol)) +
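# after_stat(prop) with group = 1 gives each response's share within its own facet;
# sum(..count..) divides by the total across both facets combined
geom_bar(aes(y = after_stat(prop), group = 1),fill='gray',col='black') +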
scale_y_continuous(labels=scales::percent) +
geom_vline(aes(xintercept = median_riskCareerImpactRisktol), color="blue") +
labs(x='Staff Response',y='Relative Frequency',title='Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax') +
geom_errorbar(aes(ymin=median_riskCareerImpactRisktol-sd_riskCareerImpactRisktol,ymax=median_riskCareerImpactRisktol+sd_riskCareerImpactRisktol),color="orange") +
# scale_x_discrete(breaks = -3:3, labels=c("Strongly Disagree","Disagree","Slightly Disagree","Neither Agree Nor Disagree","Slightly Agree","Agree","Strongly Agree")) +
theme(axis.text.x = element_text(angle = 90))+
facet_wrap(vars(managementRoleSimple))
I considered that members of management might be willing to take risks without fear of reprisal because of their tenure in the organization. It would stand to reason that the longer you’re at an organization, the more likely you might be to take risks, because you know which risks you can take without getting fired. To examine this, I plotted the responses to the risk question against the number of months employed at Parallax and used color to distinguish management level. There does not appear to be a clear relationship between months employed at Parallax and perception of risk-taking. In fact, most non-management staff with longer tenure (>30 months) were, at best, unsure of the impact risk-taking would have on their career.
knitr::opts_chunk$set(echo = TRUE)
ggplot(innovation, aes(riskCareerImpactRisktol,totalmonthsEmployed, color = managementRoleSimple)) +
geom_point()+
labs(x='Response',y='Months Employed',title='Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax')
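The visual impression of no relationship can be checked numerically with a rank correlation, which suits ordinal Likert responses. A minimal sketch computing Spearman's rho separately for each management group; the toy data frame is made up and stands in for the `innovation` columns `riskCareerImpactRisktol`, `totalmonthsEmployed`, and `managementRoleSimple`.

```r
# Hypothetical toy data (placeholder values, not survey results)
toy <- data.frame(
  risk   = c(-2, -1, 0, 1, -1, 0, 2, -3),
  months = c(12, 40, 36, 8, 60, 30, 15, 90),
  role   = c("Management", "Management", "Non-management", "Non-management",
             "Management", "Non-management", "Non-management", "Management")
)

# Spearman's rho within each role; values near 0 suggest no monotonic relationship
rho_by_role <- sapply(split(toy, toy$role), function(d)
  cor(d$risk, d$months, method = "spearman", use = "complete.obs"))
rho_by_role
```

With only 21 management and 31 non-management respondents, a weak correlation would be hard to distinguish from noise, so the coefficient is best read alongside the scatterplot rather than in place of it.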
## Elements to resolve
### What is missing (if anything) in your analysis process so far?

Two rather important elements are missing. First, I need to clarify my specific research question(s) for this dataset. Last week, I focused on exploring the data to understand where the interesting points were. But there’s a lot of noise, and frankly, it seems beyond the scope of this project to thoroughly examine everything interesting in the dataset. I am leaning towards examining the differences between management and non-management staff perceptions of innovation at Parallax. It’s substantive enough to be interesting without becoming overwhelming.
I am also missing a handful of useful visualizations that could more effectively summarize the bulk of the data. I have ideas of what I want but still need a bit more time to work through them. For instance, I’d like to create a heatmap with each of the Likert-based survey variables on the x-axis and management and non-management on the y-axis, with the fill representing the median response.
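The heatmap described above could be sketched with ggplot2's `geom_tile()`, assuming the medians are first reshaped to long format (one row per group and item). The item names and median values below are placeholders, not survey results. With the real data, the wide `descriptive2` table could instead be reshaped with `tidyr::pivot_longer(cols = -managementRoleSimple, names_to = "item", values_to = "median")`; base R is used here to keep the sketch self-contained.

```r
library(ggplot2)

# Placeholder medians in the same wide shape as descriptive2
# (one row per role, one column per Likert item); item names are hypothetical
medians_wide <- data.frame(
  managementRoleSimple = c("Management", "Non-management"),
  item_a = c(1, 0),
  item_b = c(2, 1),
  item_c = c(-1, 0)
)

# Reshape to long format: one row per (role, item) pair
items <- setdiff(names(medians_wide), "managementRoleSimple")
medians_long <- data.frame(
  managementRoleSimple = rep(medians_wide$managementRoleSimple, times = length(items)),
  item = rep(items, each = nrow(medians_wide)),
  median = unlist(medians_wide[items], use.names = FALSE)
)

p <- ggplot(medians_long, aes(x = item, y = managementRoleSimple, fill = median)) +
  geom_tile(color = "white") +
  geom_text(aes(label = median)) +
  # Diverging colours centred on 0 ("neither agree nor disagree"), fixed to the -3..3 range
  scale_fill_gradient2(low = "firebrick", mid = "white", high = "steelblue",
                       midpoint = 0, limits = c(-3, 3)) +
  labs(x = "Survey item", y = NULL, fill = "Median") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

Fixing the fill limits to the full -3 to 3 scale keeps colours comparable if the heatmap is later redrawn for a subset of items.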
### What conclusions can you make about your research questions at this point?

There seem to be clear differences in the ways in which management and non-management perceive innovation systems at Parallax. Above, I examined risk tolerance. Other variables that appear interesting at first glance include readilyHeardEase and rewardsRiskRisktol.
### What do you think a naive reader would need to fully understand your graphs?

Most immediately, I need to figure out how to resize or wrap the titles so they are not cut off.
I figured out how to change the numerical labels at the bottom of the graph to text to make the graphs easier to interpret, but they still need troubleshooting. Because the responses are stored as numbers, the labels likely need scale_x_continuous() rather than scale_x_discrete().
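For the cut-off titles and the axis labels, one approach is to wrap the title onto several lines and attach the Likert labels through a continuous x scale, since the responses are numeric. A minimal sketch on made-up responses; the wrap width of 50 characters and the fixed axis limits are arbitrary choices.

```r
library(ggplot2)

# Hypothetical responses on the -3..3 scale (placeholder values)
toy <- data.frame(risk = c(-3, -1, -1, 0, 0, 0, 1, 2, 3))

likert_labels <- c("Strongly Disagree", "Disagree", "Slightly Disagree",
                   "Neither Agree Nor Disagree", "Slightly Agree", "Agree",
                   "Strongly Agree")

# Base R strwrap() splits the long prompt into lines shorter than 50 characters
wrapped_title <- paste(
  strwrap("Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax",
          width = 50),
  collapse = "\n"
)

p <- ggplot(toy, aes(risk)) +
  geom_bar(fill = "gray", col = "black") +
  # risk is numeric, so the text labels go on a continuous scale;
  # fixed limits keep all seven categories on the axis even if some are unused
  scale_x_continuous(breaks = -3:3, labels = likert_labels, limits = c(-3.5, 3.5)) +
  labs(x = "Response", y = "Frequency", title = wrapped_title) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        plot.title = element_text(size = 11))
```

`stringr::str_wrap()` does the same wrapping if stringr is already loaded.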
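My error bars are way off, and I need to fix that. Two things seem to be going wrong: the code computes the median ± one standard deviation (not the standard error), and geom_errorbar() draws that range vertically against the frequency axis at every response value, even though the median and standard deviation are on the response (x-axis) scale.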
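One way to put the interval on the right axis is to summarise each group first and draw the median ± one SD as a horizontal segment along the response scale. A minimal sketch on made-up responses; the segment height (`y = 0.25`) is arbitrary. For a standard error instead, `s` would be replaced by `s / sqrt(n)`, with `n` the number of non-missing responses in the group.

```r
library(ggplot2)

# Hypothetical responses (placeholder values, not survey results)
toy <- data.frame(
  risk = c(-2, -1, -1, 0, -1, 0, 0, 1, 2, 0),
  role = rep(c("Management", "Non-management"), each = 5)
)

# One summary row per role, computed on the response (x) scale
summ <- do.call(rbind, lapply(split(toy, toy$role), function(d) {
  m <- median(d$risk, na.rm = TRUE)
  s <- sd(d$risk, na.rm = TRUE)
  data.frame(role = d$role[1], median = m, lower = m - s, upper = m + s)
}))

p <- ggplot(toy, aes(risk)) +
  geom_bar(fill = "gray", col = "black") +
  # geom_vline() does not inherit the plot's aes(), so only xintercept is mapped
  geom_vline(data = summ, aes(xintercept = median), color = "blue") +
  # median +/- 1 SD as a horizontal segment along the response axis,
  # drawn at a fixed height just above zero
  geom_segment(data = summ, aes(x = lower, xend = upper, y = 0.25, yend = 0.25),
               color = "orange", linewidth = 1, inherit.aes = FALSE) +
  facet_wrap(vars(role))
```

Because `summ` carries the `role` column, each facet gets its own median line and interval.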
### Is there anything you want to answer with your dataset, but can’t?

I only have one year of data, so I’m unable to make any longitudinal observations.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Smith (2022, Jan. 14). Data Analytics and Computational Social Science: HW05. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomangelanicolesmith855171/
BibTeX citation
@misc{smith2022hw05, author = {Smith, Angela}, title = {Data Analytics and Computational Social Science: HW05}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomangelanicolesmith855171/}, year = {2022} }