Descriptive Statistics and Visualizations
For HW04, I am reading in data from a recent survey administered by my employer, Parallax Advanced Research, to better understand the current state of innovation culture inside the organization.
Below are the variables included in the survey data, along with their data types and descriptions. The dataset is already tidy enough for calculation, but some numeric variables may need to be recoded on the fly to reflect the categories of the 7-point Likert scale (strongly disagree to strongly agree).
knitr::opts_chunk$set(echo = TRUE)
library(readxl)
innovation_description <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/Final Project/Innovation Culture Survey 2021 - for DACSS final.xlsx", range="Description!A1:C36")
library(rmarkdown)
paged_table(innovation_description)
innovation <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/Final Project/Innovation Culture Survey 2021 - for DACSS final.xlsx", range="Numerical!A1:AO53")
paged_table(innovation)
Several demographic questions were asked in the survey. The counts below describe who participated.
###Employees that directly supervise staff
table(innovation$directSupervise)
Maybe    No   Yes
    1    34    17
###Employees in a management role
table(innovation$managementRole)
Middle-management    Non-management  Upper-management
               13                31                 8
###Employee highest degree
table(innovation$highestDegree)
 Associate's Degree   Bachelor's Degree     Doctoral Degree
                  1                  15                   9
High School Diploma     Master's Degree Professional Degree
                  6                  20                   1
###Employee educational background
table(innovation$educationBackground)
         Both Non-Technical     Technical
           10            25            17
Questions: How do I add headers or titles to these tables? How can I sort the table data in a particular order (e.g., putting “Both” at the end of the last table, vs. at the beginning)?
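One possible answer to both questions, sketched with base R: setting the factor levels explicitly controls the order in which `table()` prints categories, and the `dnn` argument gives the table a label. The `education` vector below is a toy stand-in rebuilt from the counts shown above, not the real `innovation$educationBackground` column, and the caption text is made up for illustration.

```r
# Toy stand-in for innovation$educationBackground, rebuilt from the printed counts.
education <- rep(c("Both", "Non-Technical", "Technical"), times = c(10, 25, 17))

# Explicit factor levels control the order table() uses; "Both" now comes last.
education <- factor(education, levels = c("Technical", "Non-Technical", "Both"))

# dnn = labels the table's dimension, which prints above the categories.
education_counts <- table(education, dnn = "Educational background")
print(education_counts)

# In a knitted document, knitr::kable() can attach a caption instead.
# (Only runs if the knitr package is installed.)
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(as.data.frame(education_counts),
                     col.names = c("Educational background", "Respondents"),
                     caption = "Respondents by educational background"))
}
```

The same pattern should carry over to the other demographic columns by swapping in their own level order.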
Survey responses were given on a 7-point Likert scale and have been converted to numbers for computation, with 0 as the neutral midpoint (“neither agree nor disagree”). The medians for each survey item are calculated below.
library(dplyr)
descriptive1 <- summarise_at(innovation, vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive1)
Questions: How can I add a mutation to the end of this code that will convert these numbers to their Likert scale responses? And how can I do it without having to write a mutate statement for each variable?
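A minimal sketch of one way to do this with `across()`, which applies the same recode to many columns in a single `mutate()`. Several things here are assumptions: the responses are taken to be coded from -3 to 3, the wording of the intermediate labels is a guess (only "strongly disagree", "neither agree nor disagree", and "strongly agree" appear in this post), and `toy` with its `itemA` to `itemC` columns is made-up data rather than the survey.

```r
library(dplyr)

# ASSUMPTION: responses are coded -3..3; the intermediate label wording is a guess.
likert_levels <- -3:3
likert_labels <- c("Strongly disagree", "Disagree", "Somewhat disagree",
                   "Neither agree nor disagree",
                   "Somewhat agree", "Agree", "Strongly agree")

# Hypothetical toy data standing in for the survey items.
toy <- tibble(
  itemA = c(-2, -1, 0, 1, 3),
  itemB = c(0, 0, 1, 2, 2),
  itemC = c(-3, -2, -2, -1, NA)
)

# Median of every column, ignoring missing responses.
toy_medians <- toy %>%
  summarise(across(everything(), ~ median(.x, na.rm = TRUE)))

# One mutate() recodes every column. A half-point median such as -1.5
# matches no level and would become NA, so those need separate handling.
toy_labelled <- toy_medians %>%
  mutate(across(everything(),
                ~ factor(.x, levels = likert_levels, labels = likert_labels)))

print(toy_labelled)
```

With the real data, `everything()` would presumably be replaced by the same column range used above, `utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol`.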
The medians were then recomputed separately for each demographic group.
###Responses by direct supervision status
descriptive2 <- innovation %>%
group_by(directSupervise) %>%
summarise_at(vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive2)
###Responses by management role
descriptive3 <- innovation %>%
group_by(managementRole) %>%
summarise_at(vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive3)
###Responses by degree level
descriptive4 <- innovation %>%
group_by(highestDegree) %>%
summarise_at(vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive4)
###Responses by educational background
descriptive5 <- innovation %>%
group_by(educationBackground) %>%
summarise_at(vars(utilizeCapabilitiesSelfact:prioritizeTimeBudgetRisktol), median, na.rm=TRUE)
paged_table(descriptive5)
Question: Again, how do I add titles to these? Am I overthinking – should I just separate out chunks and add headers?
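Separate chunks with a Markdown header above each one is probably the simplest route. As an alternative, the sketch below wraps the repeated pipeline in a small helper and uses the names of a vector as table titles. `median_by()` is a hypothetical helper written for this example, not a dplyr function, and `toy` is made-up data; only the column names `directSupervise` and `managementRole` come from the survey.

```r
library(dplyr)

# Hypothetical toy data; the real call would pass `innovation` instead.
toy <- tibble(
  managementRole  = c("Non-management", "Non-management",
                      "Upper-management", "Upper-management"),
  directSupervise = c("No", "Yes", "Yes", "Yes"),
  itemA = c(0, 2, -2, -1),
  itemB = c(1, 1, -3, 0)
)

# median_by() is a hypothetical helper: medians of `items` within each
# level of the column named by `group_var`.
median_by <- function(data, group_var, items) {
  data %>%
    group_by(across(all_of(group_var))) %>%
    summarise(across(all_of(items), ~ median(.x, na.rm = TRUE)),
              .groups = "drop")
}

# The names of this vector double as the table titles.
group_vars <- c("Responses by direct supervision status" = "directSupervise",
                "Responses by management role"           = "managementRole")

tables <- lapply(group_vars, median_by, data = toy, items = c("itemA", "itemB"))

# Print each table under its title.
for (title in names(tables)) {
  cat("\n", title, "\n", sep = "")
  print(tables[[title]])
}
```

This replaces four near-identical pipelines with one function call per grouping variable, at the cost of a little indirection.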
## Visualizations
The median response to the prompt “Ideas are filtered so that only the most promising opportunities make it to the surface” was 0, which corresponds to “neither agree nor disagree” on the survey scale. For an organization interested in innovation, some ambiguity on this question makes sense: does our organization value maturing the most promising opportunities, or allowing staff to develop higher-risk projects? Middling responses might indicate that Parallax isn’t clearly communicating what it values to staff.
The median response to the prompt “Convention and conformity to past projects or work processes is the norm at Parallax” is also 0, again indicating “neither agree nor disagree.” Here too, Parallax may do well to tell staff whether it prefers conformity to past projects or deviation from convention.
library(ggplot2)
ggplot(innovation, aes(ConventionNormCreative)) +
  geom_bar(fill = 'gray', col = 'black') +
  labs(x = 'Response', y = 'Frequency', title = 'Convention and conformity to past projects or work processes is the norm at Parallax')
This crosstab shows the row percentages of responses to the prompt “Taking a risk (with the chance of failure) can negatively impact my career trajectory at Parallax” by management level, and the distributions differ noticeably by role. Middle management tends to disagree that taking risks harms their career trajectory (median = -2). Upper management disagrees slightly less strongly (median = -1.5). Non-management staff, however, are much less sure that risk-taking is safe: their median response is neutral (median = 0), and they are the only group with any responses on the agree side of the scale. This is another opportunity for Parallax to clarify its organizational stance on risk-taking for staff, particularly those in non-management roles.
prop.table(xtabs(~managementRole + riskCareerImpactRisktol, innovation), 1)*100
                   riskCareerImpactRisktol
managementRole             -3        -2        -1         0         1         2
  Middle-management  7.692308 46.153846 23.076923 23.076923  0.000000  0.000000
  Non-management     3.225806 25.806452 12.903226 45.161290  3.225806  9.677419
  Upper-management   0.000000 50.000000 37.500000 12.500000  0.000000  0.000000
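As a quick arithmetic check, the medians quoted for each role can be recovered from the crosstab itself. The sketch below assumes group sizes of 13, 31, and 8 (taken from the management-role table earlier in this post), which turn the row percentages into whole-number counts. The object names are illustrative only.

```r
# Response codes that appear as columns in the crosstab.
scores <- -3:2

# Counts implied by the row percentages and group sizes of 13, 31, and 8
# (for example, 46.153846% of 13 middle managers is 6 people).
counts <- rbind(
  "Middle-management" = c(1, 6, 3, 3, 0, 0),
  "Non-management"    = c(1, 8, 4, 14, 1, 3),
  "Upper-management"  = c(0, 4, 3, 1, 0, 0)
)
colnames(counts) <- scores

# Expand each row of counts back into individual responses, then take the median.
group_medians <- apply(counts, 1, function(n) median(rep(scores, times = n)))
print(group_medians)

# The counts reproduce the printed row percentages.
print(round(prop.table(counts, 1) * 100, 6))
```

With the real data, `tapply(innovation$riskCareerImpactRisktol, innovation$managementRole, median, na.rm = TRUE)` should give the same three medians directly.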
Questions: How can I turn this into a heatmap? How can I recode the variable labels so they’re not so ugly?
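One way to approach the heatmap, sketched under a few assumptions: the crosstab is converted to long format with `as.data.frame(as.table(...))` and drawn with `geom_tile()`. The counts are rebuilt from the row percentages above using group sizes of 13, 31, and 8, so this runs without the survey file. The axis labels, colours, and the reading of -3 as "strongly disagree" are my own choices, not something fixed by the data.

```r
library(ggplot2)

# Counts implied by the printed row percentages (group sizes 13, 31, 8).
counts <- rbind(
  "Middle-management" = c(1, 6, 3, 3, 0, 0),
  "Non-management"    = c(1, 8, 4, 14, 1, 3),
  "Upper-management"  = c(0, 4, 3, 1, 0, 0)
)
colnames(counts) <- -3:2

# Row percentages, as in the crosstab.
row_pct <- prop.table(counts, 1) * 100

# Long format: one row per (role, response) cell, with readable column names.
heat_data <- as.data.frame(as.table(row_pct))
names(heat_data) <- c("managementRole", "response", "percent")

# One tile per cell, shaded by percentage and annotated with the value.
risk_heatmap <- ggplot(heat_data,
                       aes(x = response, y = managementRole, fill = percent)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = sprintf("%.0f%%", percent))) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = "Response (assumed: -3 = strongly disagree, 0 = neutral)",
       y = "Management role",
       fill = "% of role",
       title = "Risk-taking can negatively impact my career trajectory")

print(risk_heatmap)
```

With the real data, `row_pct` could instead come straight from `prop.table(xtabs(~managementRole + riskCareerImpactRisktol, innovation), 1) * 100`, and renaming the columns of `heat_data` is one way to tidy the variable labels before plotting.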
## Limitations of visualizations
In general, I am confident about my conceptual knowledge of translating data into visualizations. That said, I am still getting used to how R “thinks.” As indicated by my questions sprinkled in above, I often know what I want a visualization to look like but don’t yet have the full suite of tools to translate this into visualizations (though this is great practice for the final product). I occasionally find it difficult to toggle between earlier results and later visualizations, though I suppose that’s true for all tools. I am gaining some confidence in my ability to Google through some challenges, though others remain (again, see questions above).
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Smith (2022, Jan. 11). Data Analytics and Computational Social Science: HW04. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomangelanicolesmith854704/
BibTeX citation
@misc{smith2022hw04,
  author = {Smith, Angela},
  title = {Data Analytics and Computational Social Science: HW04},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomangelanicolesmith854704/},
  year = {2022}
}