Kimble HW 2 - Revised/Final

This is my HW 2 for DACSS 601.

Karen Kimble
2022-04-07

Dataset

The data I am using is from the textbook “Doing Economics”, part of the CORE Econ project. The dataset comes from the European Values Study (EVS), a large-scale survey program on human values that has been repeated in several waves over the years. According to the EVS website, its main topics are family, work, environment, politics, religion/morality, and national identity.

library(readxl)
library(dplyr)  # needed later for %>%, mutate(), recode(), filter(), and arrange()
EVSData <- read_excel("EVSData.xlsx", sheet = "Wave 1")
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 2"))
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 3"))
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 4"))
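
The four rbind() calls above repeat the same pattern, so one option is to loop over the sheet names instead. The sketch below uses a hypothetical helper, `read_all_sheets`, which is not part of readxl. It takes the reader function as an argument so that it can be tried with a stand-in (`fake_reader`, also hypothetical, returning made-up rows) when the workbook is not on disk.

```r
# Hypothetical helper: stack several sheets of one workbook into one data frame.
# `reader` is any function(path, sheet) that returns a data frame.
read_all_sheets <- function(path, sheets, reader) {
  pieces <- lapply(sheets, function(s) reader(path, sheet = s))
  do.call(rbind, pieces)
}

# Stand-in reader with made-up rows, so the sketch runs without the file.
fake_reader <- function(path, sheet) {
  data.frame(wave = sheet, id = 1:2, stringsAsFactors = FALSE)
}

demo <- read_all_sheets("EVSData.xlsx", paste("Wave", 1:4), fake_reader)
nrow(demo)  # two made-up rows per sheet, four sheets

# With the real workbook (assumes readxl is installed and the file exists):
# EVSData <- read_all_sheets("EVSData.xlsx", paste("Wave", 1:4), readxl::read_excel)
```

Like the original calls, this only works if every sheet has the same columns in the same order.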

# I want to assign new names to the variables

colnames(EVSData) <- c("EVS_Wave",
                       "Region",
                       "Respondent_Number",
                       "Health",
                       "Satisfaction",
                       "Work_Q1",
                       "Work_Q2",
                       "Work_Q3",
                       "Work_Q4",
                       "Work_Q5",
                       "Sex",
                       "Age",
                       "Marital_Status",
                       "Number_Children",
                       "Education",
                       "Employment",
                       "Monthly_Income")

print(head(EVSData))
# A tibble: 6 × 17
  EVS_Wave Region Respondent_Numb… Health Satisfaction Work_Q1 Work_Q2
  <chr>    <chr>             <dbl> <chr>  <chr>        <chr>   <chr>  
1 1981-19… Belgi…             1001 Fair   9            .a      .a     
2 1981-19… Belgi…             1002 Very … 9            .a      .a     
3 1981-19… Belgi…             1003 Poor   3            .a      .a     
4 1981-19… Belgi…             1004 Very … 9            .a      .a     
5 1981-19… Belgi…             1005 Very … 9            .a      .a     
6 1981-19… Belgi…             1006 Very … 9            .a      .a     
# … with 10 more variables: Work_Q3 <chr>, Work_Q4 <chr>,
#   Work_Q5 <chr>, Sex <chr>, Age <chr>, Marital_Status <chr>,
#   Number_Children <chr>, Education <chr>, Employment <chr>,
#   Monthly_Income <chr>
View(EVSData)

The Variables

The variables are:

* EVS Wave (date or character): Describes which “wave”, or set of years, the respondent answered in

Data Wrangling

Here are some basic data wrangling functions applied to the dataset.

I want to clean up the data by changing the .a values to NA and by making each variable a single type (for example, the life satisfaction variable is a mix of numbers and words, but I want it to be all numbers).
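
As a minimal sketch of the first step, the toy data frame below (made-up values, not EVS data) shows how logical indexing swaps every ".a" cell for NA. The name `toy` is hypothetical.

```r
# Toy data frame with made-up values; ".a" stands in for a missing answer.
toy <- data.frame(Health = c("Fair", ".a", "Poor"),
                  Work_Q1 = c(".a", ".a", "Agree"),
                  stringsAsFactors = FALSE)

# toy == ".a" is a logical matrix; assigning NA overwrites every TRUE cell.
toy[toy == ".a"] <- NA

# Count the missing values per column to confirm the replacement.
colSums(is.na(toy))
```

read_excel() also accepts an `na` argument, so passing na = ".a" at import time should give the same result without a separate step.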

# Replacing the .a missing-value code with NA
EVSData[EVSData == ".a"] <- NA

# Recoding life satisfaction
EVSData <- EVSData %>%
  mutate(Satisfaction = recode(Satisfaction,
                               "Dissatisfied" = 1,
                               "2" = 2,
                               "3" = 3,
                               "4" = 4,
                               "5" = 5,
                               "6" = 6,
                               "7" = 7,
                               "8" = 8,
                               "9" = 9,
                               "Satisfied" = 10))

# Recoding number of children
EVSData <- EVSData %>%
  mutate(Number_Children = recode(Number_Children,
                                  "No children" = 0,
                                  "1" = 1,
                                  "2" = 2,
                                  "3" = 3,
                                  "4" = 4,
                                  "5" = 5,
                                  "6" = 6,
                                  "7" = 7,
                                  "8" = 8,
                                  "9" = 9,
                                  "10" = 10,
                                  "11" = 11,
                                  "12" = 12,
                                  "13" = 13,
                                  "14" = 14,
                                  "15" = 15,
                                  "16" = 16))

# If I want to look at only the results from Ireland:

EVSData_Ireland <- EVSData %>%
  filter(Region == "Ireland")
head(EVSData_Ireland)
# A tibble: 6 × 17
  EVS_Wave Region Respondent_Numb… Health Satisfaction Work_Q1 Work_Q2
  <chr>    <chr>             <dbl> <chr>         <dbl> <chr>   <chr>  
1 1981-19… Irela…                1 Good             10 <NA>    <NA>   
2 1981-19… Irela…                2 Very …            9 <NA>    <NA>   
3 1981-19… Irela…                3 Very …           10 <NA>    <NA>   
4 1981-19… Irela…                4 Very …            8 <NA>    <NA>   
5 1981-19… Irela…                5 Very …            7 <NA>    <NA>   
6 1981-19… Irela…                6 Very …            7 <NA>    <NA>   
# … with 10 more variables: Work_Q3 <chr>, Work_Q4 <chr>,
#   Work_Q5 <chr>, Sex <chr>, Age <chr>, Marital_Status <chr>,
#   Number_Children <dbl>, Education <chr>, Employment <chr>,
#   Monthly_Income <chr>
View(EVSData_Ireland)

# If I want to arrange the data by satisfaction level and then age:

EVSData_Ireland %>%
  arrange(Satisfaction, Age)
# A tibble: 4,242 × 17
   EVS_Wave  Region  Respondent_Number Health    Satisfaction Work_Q1 
   <chr>     <chr>               <dbl> <chr>            <dbl> <chr>   
 1 1981-1984 Ireland                81 Fair                 1 <NA>    
 2 1981-1984 Ireland                71 Good                 1 <NA>    
 3 1981-1984 Ireland              2035 Very good            1 <NA>    
 4 2008-2010 Ireland               351 Very good            1 Strongl…
 5 1990-1993 Ireland              2813 Very good            1 <NA>    
 6 1990-1993 Ireland               408 Poor                 1 <NA>    
 7 2008-2010 Ireland               731 Very good            1 Strongl…
 8 1981-1984 Ireland               258 Good                 1 <NA>    
 9 2008-2010 Ireland               676 Very good            1 Strongl…
10 1981-1984 Ireland               994 Very good            1 <NA>    
# … with 4,232 more rows, and 11 more variables: Work_Q2 <chr>,
#   Work_Q3 <chr>, Work_Q4 <chr>, Work_Q5 <chr>, Sex <chr>,
#   Age <chr>, Marital_Status <chr>, Number_Children <dbl>,
#   Education <chr>, Employment <chr>, Monthly_Income <chr>
View(EVSData_Ireland)
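
One thing to watch in the sort above: the printout lists Age as <chr>, and arrange() orders character columns as text rather than as numbers. The base R sketch below uses made-up ages to show the difference. The vector `ages` is hypothetical, and any non-numeric labels in the real column would need recoding before conversion.

```r
ages <- c("9", "25", "100", "42")  # made-up values stored as text

# Text order compares character by character, so "100" sorts before "25".
as_text <- sort(ages)

# Numeric order after conversion: 9, 25, 42, 100.
as_number <- sort(as.numeric(ages))

as_text
as_number
```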

Note: I had some trouble recoding the data: any value that I did not explicitly list in recode() came back as NA.
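
One possible explanation for the NA problem noted above comes from how dplyr documents recode(). When the replacement values have a different type from the original column (numbers replacing text), any value not listed is replaced by NA. When the replacements have the same type as the column, unlisted values are left unchanged. The sketch below, on made-up responses, recodes only the two word labels to text digits and then converts the whole column with as.numeric(). The vector name `responses` is hypothetical, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Made-up responses mixing word labels, digit strings, and a missing value.
responses <- c("Dissatisfied", "2", "7", "Satisfied", NA)

# Replacements are character, like the input, so "2" and "7" pass through.
as_text <- recode(responses, "Dissatisfied" = "1", "Satisfied" = "10")

# Every non-missing value is now a digit string, so conversion is clean.
as_number <- as.numeric(as_text)
as_number
```

This avoids listing every digit by hand, at the cost of a second conversion step.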

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Kimble (2022, April 11). Data Analytics and Computational Social Science: Kimble HW 2 - Revised/Final. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkkimble886403/

BibTeX citation

@misc{kimble2022kimble,
  author = {Kimble, Karen},
  title = {Data Analytics and Computational Social Science: Kimble HW 2 - Revised/Final},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkkimble886403/},
  year = {2022}
}