This is my HW 2 for DACSS 601.
The data I am using come from the textbook “Doing Economics”, part of the CORE Econ project. The dataset is from the European Values Study (EVS), a large-scale survey program on human values that has been repeated in several waves over different years. According to the EVS website, the main topics are family, work, environment, politics, religion/morality, and national identity.
library(readxl)  # read_excel()
library(dplyr)   # %>%, mutate(), recode(), filter(), arrange()

EVSData <- read_excel("EVSData.xlsx", sheet = "Wave 1")
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 2"))
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 3"))
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 4"))
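# I want to assign new names to the variables
colnames(EVSData) <- c("EVS_Wave", "Region", "Respondent_Number", "Health",
                       "Satisfaction", "Work_Q1", "Work_Q2", "Work_Q3",
                       "Work_Q4", "Work_Q5", "Sex", "Age", "Marital_Status",
                       "Number_Children", "Education", "Employment",
                       "Monthly_Income")
head(EVSData)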
# A tibble: 6 × 17
EVS_Wave Region Respondent_Numb… Health Satisfaction Work_Q1 Work_Q2
<chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 1981-19… Belgi… 1001 Fair 9 .a .a
2 1981-19… Belgi… 1002 Very … 9 .a .a
3 1981-19… Belgi… 1003 Poor 3 .a .a
4 1981-19… Belgi… 1004 Very … 9 .a .a
5 1981-19… Belgi… 1005 Very … 9 .a .a
6 1981-19… Belgi… 1006 Very … 9 .a .a
# … with 10 more variables: Work_Q3 <chr>, Work_Q4 <chr>,
# Work_Q5 <chr>, Sex <chr>, Age <chr>, Marital_Status <chr>,
# Number_Children <chr>, Education <chr>, Employment <chr>,
# Monthly_Income <chr>
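The four `read_excel()` calls above repeat the same pattern, so they could also be written as a loop over the sheet names. The following is only a sketch: `read_waves` is a hypothetical helper name, and it assumes the workbook has exactly the sheets "Wave 1" to "Wave 4" with identical columns, as the calls above imply.

```r
library(readxl)

# Sheet names as used in the read_excel() calls above
wave_sheets <- paste("Wave", 1:4)

# Hypothetical helper: read every sheet, then stack the results row-wise
read_waves <- function(path, sheets) {
  wave_list <- lapply(sheets, function(s) read_excel(path, sheet = s))
  do.call(rbind, wave_list)
}

# Only attempt the read when the workbook is actually present
if (file.exists("EVSData.xlsx")) {
  EVSData <- read_waves("EVSData.xlsx", wave_sheets)
}
```

This gives the same stacked data as the repeated `rbind()` calls, but adding another wave would only require changing `wave_sheets`.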
The variables are:

* EVS Wave (date or character): Describes which “wave” or set of years the respondent answered
* Region (nominal): Which country or region the respondent is from
* Respondent number (continuous): The original respondent number identifying which individual it is
* Health (ordinal): The respondent’s state of health (subjective because it is self-reported) on a scale up to “Very Good”
* Satisfaction with life (ordinal): How satisfied the respondent is with life on a scale of 1 (dissatisfied) to 10 (very satisfied)
* Work Q1-Q5 (ordinal): These variables are answers to questions about the respondent’s attitude towards work on a Likert scale, with 1 being “Strongly Agree” and 5 being “Strongly Disagree”, as well as the option of “Don’t know” or to not answer at all.
* Sex (nominal): The sex of the respondent
* Age (continuous): The age of the respondent
* Marital status (nominal): Whether the respondent was single/never married, divorced, widowed, married, or living with someone as if they were married.
* Number of children (continuous): The number of living children the respondent has
* Education (nominal): The level of education
* Monthly household income (continuous): The monthly household income of the respondent, measured in 1,000 ppp Euros
Here are some basic data wrangling functions using the data set.
I want to clean up the data by changing the .a values to NA and by making the responses within each variable the same type (for example, the life satisfaction variable is a mix of numbers and words, but I want it to be all numbers).
# Replacing .a with NA
EVSData[EVSData == ".a"] <- NA
# Recoding life satisfaction
EVSData <- EVSData %>%
  mutate(Satisfaction = recode(Satisfaction,
                               "Dissatisfied" = 1,
                               "2" = 2,
                               "3" = 3,
                               "4" = 4,
                               "5" = 5,
                               "6" = 6,
                               "7" = 7,
                               "8" = 8,
                               "9" = 9,
                               "Satisfied" = 10))
# Recoding number of children
EVSData <- EVSData %>%
  mutate(Number_Children = recode(Number_Children,
                                  "No children" = 0,
                                  "1" = 1,
                                  "2" = 2,
                                  "3" = 3,
                                  "4" = 4,
                                  "5" = 5,
                                  "6" = 6,
                                  "7" = 7,
                                  "8" = 8,
                                  "9" = 9,
                                  "10" = 10,
                                  "11" = 11,
                                  "12" = 12,
                                  "13" = 13,
                                  "14" = 14,
                                  "15" = 15,
                                  "16" = 16))
# If I want to look at only the results from Ireland:
EVSData_Ireland <- EVSData %>%
  filter(Region == "Ireland")
head(EVSData_Ireland)
# A tibble: 6 × 17
EVS_Wave Region Respondent_Numb… Health Satisfaction Work_Q1 Work_Q2
<chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
1 1981-19… Irela… 1 Good 10 <NA> <NA>
2 1981-19… Irela… 2 Very … 9 <NA> <NA>
3 1981-19… Irela… 3 Very … 10 <NA> <NA>
4 1981-19… Irela… 4 Very … 8 <NA> <NA>
5 1981-19… Irela… 5 Very … 7 <NA> <NA>
6 1981-19… Irela… 6 Very … 7 <NA> <NA>
# … with 10 more variables: Work_Q3 <chr>, Work_Q4 <chr>,
# Work_Q5 <chr>, Sex <chr>, Age <chr>, Marital_Status <chr>,
# Number_Children <dbl>, Education <chr>, Employment <chr>,
# Monthly_Income <chr>
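The `<NA>` entries in the output above come from the `.a` replacement step. A quick way to see how much of each column is missing after that step is to count the NA values per column. The sketch below uses a small made-up data frame (`toy` and its values are illustrative, not taken from the EVS file) so that it runs on its own.

```r
# Toy stand-in for EVSData; the values are invented for illustration
toy <- data.frame(
  Region  = c("Ireland", "Ireland", "Belgium"),
  Health  = c("Good", ".a", "Fair"),
  Work_Q1 = c(".a", ".a", "Strongly agree"),
  stringsAsFactors = FALSE
)

# Same replacement as used on EVSData
toy[toy == ".a"] <- NA

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts
```

Running `colSums(is.na(EVSData))` on the real data would give the same kind of per-column summary.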
# If I wanted to arrange the data based on satisfaction level and then age
EVSData_Ireland %>%
  arrange(Satisfaction, Age)
# A tibble: 4,242 × 17
EVS_Wave Region Respondent_Number Health Satisfaction Work_Q1
<chr> <chr> <dbl> <chr> <dbl> <chr>
1 1981-1984 Ireland 81 Fair 1 <NA>
2 1981-1984 Ireland 71 Good 1 <NA>
3 1981-1984 Ireland 2035 Very good 1 <NA>
4 2008-2010 Ireland 351 Very good 1 Strongl…
5 1990-1993 Ireland 2813 Very good 1 <NA>
6 1990-1993 Ireland 408 Poor 1 <NA>
7 2008-2010 Ireland 731 Very good 1 Strongl…
8 1981-1984 Ireland 258 Good 1 <NA>
9 2008-2010 Ireland 676 Very good 1 Strongl…
10 1981-1984 Ireland 994 Very good 1 <NA>
# … with 4,232 more rows, and 11 more variables: Work_Q2 <chr>,
# Work_Q3 <chr>, Work_Q4 <chr>, Work_Q5 <chr>, Sex <chr>,
# Age <chr>, Marital_Status <chr>, Number_Children <dbl>,
# Education <chr>, Employment <chr>, Monthly_Income <chr>
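One thing to keep in mind with this ordering: the output lists `Age` as `<chr>`, so `arrange()` sorts it as text rather than as a number (for example, "100" sorts before "18"). A possible workaround, sketched here on made-up data, is to sort on a numeric copy of the column. `Age_num` is a hypothetical column name, and the sketch assumes the ages are stored as plain digits; any non-numeric label would become NA in the numeric copy.

```r
library(dplyr)

# Toy data with Age stored as text, as in EVSData
toy <- data.frame(
  Satisfaction = c(1, 1, 1, 2),
  Age = c("9", "18", "100", "30"),
  stringsAsFactors = FALSE
)

toy_sorted <- toy %>%
  # Numeric copy of Age; labels that are not numbers would turn into NA
  mutate(Age_num = suppressWarnings(as.numeric(Age))) %>%
  arrange(Satisfaction, Age_num)

toy_sorted
```

Within each satisfaction level the rows are then ordered from youngest to oldest.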
Note: I had some trouble recoding the data; every value that I had not explicitly recoded kept coming back as NA.
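The NA behaviour described in the note is how `recode()` is documented to work: when the replacements are numeric but the column is character, values that are not listed cannot be kept and are replaced with NA. One possible way around listing every value, shown as a sketch on a toy vector rather than on the real data, is to recode only the word labels to character digits and then convert the whole column with `as.numeric()`.

```r
library(dplyr)

# Toy stand-in for the Satisfaction column
x <- c("Dissatisfied", "5", "Satisfied", NA)

# Numeric replacements: the unlisted "5" cannot be kept and becomes NA
partial <- suppressWarnings(recode(x, "Dissatisfied" = 1, "Satisfied" = 10))

# Character replacements leave unlisted values unchanged,
# so the conversion to numbers can happen afterwards
fixed <- as.numeric(recode(x, "Dissatisfied" = "1", "Satisfied" = "10"))

partial
fixed
```

The same pattern would apply to `Number_Children`, with "No children" recoded to "0" before the conversion.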
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kimble (2022, April 11). Data Analytics and Computational Social Science: Kimble HW 2 - Revised/Final. Retrieved from
BibTeX citation
@misc{kimble2022kimble, author = {Kimble, Karen}, title = {Data Analytics and Computational Social Science: Kimble HW 2 - Revised/Final}, url = {}, year = {2022} }