Data Analytics and Computational Social Science: Kimble HW 2 - Revised/Final

Karen Kimble

Dataset

The data I am using come from the textbook “Doing Economics”, part of the CORE Econ project. The dataset is from the European Values Study (EVS), a large-scale survey program on human values that has been repeated in several waves. According to the EVS website, the main topics are family, work, environment, politics, religion/morality, and national identity.

library(readxl)
library(dplyr)  # provides %>%, mutate(), recode(), filter(), and arrange() used below
EVSData <- read_excel("EVSData.xlsx", sheet = "Wave 1")
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 2"))
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 3"))
EVSData <- rbind(EVSData, read_excel("EVSData.xlsx", sheet = "Wave 4"))
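
The four `read_excel()` calls above repeat the same pattern, so they could be collapsed into a loop. This is a minimal sketch, assuming the sheets keep the "Wave N" names used above. `read_waves` and its `reader` argument are hypothetical names introduced here for illustration; they are not part of readxl.

```r
# Hypothetical helper: read several sheets and stack them into one data frame.
# `reader` defaults to readxl::read_excel; it is an argument only so the
# function can be tried out without the Excel file.
read_waves <- function(path, sheets, reader = readxl::read_excel) {
  pieces <- lapply(sheets, function(s) reader(path, sheet = s))
  do.call(rbind, pieces)
}

# Intended use with the file from this post (not run here):
# EVSData <- read_waves("EVSData.xlsx", paste("Wave", 1:4))
```

Like the repeated `rbind()` calls, this only works if every sheet has the same columns.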

# I want to assign new names to the variables

colnames(EVSData) <- c("EVS_Wave",
                       "Region",
                       "Respondent_Number",
                       "Health",
                       "Satisfaction",
                       "Work_Q1",
                       "Work_Q2",
                       "Work_Q3",
                       "Work_Q4",
                       "Work_Q5",
                       "Sex",
                       "Age",
                       "Marital_Status",
                       "Number_Children",
                       "Education",
                       "Employment",
                       "Monthly_Income")

print(head(EVSData))

# A tibble: 6 × 17
  EVS_Wave Region Respondent_Numb… Health Satisfaction Work_Q1 Work_Q2
  <chr>    <chr>             <dbl> <chr>  <chr>        <chr>   <chr>  
1 1981-19… Belgi…             1001 Fair   9            .a      .a     
2 1981-19… Belgi…             1002 Very … 9            .a      .a     
3 1981-19… Belgi…             1003 Poor   3            .a      .a     
4 1981-19… Belgi…             1004 Very … 9            .a      .a     
5 1981-19… Belgi…             1005 Very … 9            .a      .a     
6 1981-19… Belgi…             1006 Very … 9            .a      .a     
# … with 10 more variables: Work_Q3 <chr>, Work_Q4 <chr>,
#   Work_Q5 <chr>, Sex <chr>, Age <chr>, Marital_Status <chr>,
#   Number_Children <chr>, Education <chr>, Employment <chr>,
#   Monthly_Income <chr>

View(EVSData)

The Variables

The variables are:

EVS Wave (date or character): Describes which “wave” or set of years the respondent answered
Region (nominal): Which country or region the respondent is from
Respondent number (identifier): The original respondent number identifying which individual it is; it is stored as a number but works as a label rather than a quantity
Health (ordinal): The respondent’s state of health (subjective because it is self-reported) on a scale up to “Very Good”
Satisfaction with life (ordinal): How satisfied the respondent is with life on a scale of 1 (dissatisfied) to 10 (very satisfied)
Work Q1-Q5 (ordinal): These variables are answers to questions about the respondent’s attitude towards work on a Likert scale, with 1 being “Strongly Agree” and 5 being “Strongly Disagree”, as well as the option of “Don’t know” or to not answer at all.
Sex (nominal): The sex of the respondent
Age (continuous): The age of the respondent
Marital status (nominal): Whether the respondent was single/never married, divorced, widowed, married, or living with someone as if they were married.
Number of children (continuous): The number of living children the respondent has
Education (nominal): The level of education
Employment (nominal): The employment status of the respondent
Monthly household income (continuous): The monthly household income of the respondent, measured in thousands of euros adjusted for purchasing power parity (PPP)
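
Ordinal variables such as Health can be stored as ordered factors so that comparisons and sorting respect the scale. The following is a minimal sketch on toy values. The level set is an assumption based only on the labels visible in the printed output (Poor, Fair, Good, Very good); the real scale may include further categories.

```r
# Assumed level order, lowest to highest; adjust to the actual EVS coding.
health_levels <- c("Poor", "Fair", "Good", "Very good")

health <- c("Fair", "Very good", "Poor", "Good", NA)  # toy values
health_ord <- factor(health, levels = health_levels, ordered = TRUE)

health_ord >= "Good"   # comparisons follow the level order
table(health_ord)      # counts appear in scale order
```

Any value that is not in `health_levels` silently becomes NA, so the level list should be checked against `unique()` of the real column first.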

Data Wrangling

Here are some basic data wrangling functions using the data set.

I want to clean up the data by changing the .a values to NA and by making the responses within each variable the same type (for example, the life satisfaction variable is a mix of numbers and words, and I want to change it to all numbers).

# Replacing the ".a" missing-value code with NA
EVSData[EVSData == ".a"] <- NA
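
After this replacement it can be useful to count how many values in each column are now missing, since an unexpectedly high count in a later step points to a recode that did not match. This is a minimal sketch on a toy data frame standing in for `EVSData`; the toy values are invented for illustration.

```r
# Toy stand-in for EVSData (values invented for illustration)
toy <- data.frame(
  Satisfaction = c("9", ".a", "Dissatisfied"),
  Work_Q1      = c(".a", ".a", "Agree"),
  stringsAsFactors = FALSE
)

toy[toy == ".a"] <- NA                      # same replacement as the line above
missing_per_column <- colSums(is.na(toy))   # number of NA values in each column
missing_per_column
```

Running the same `colSums(is.na(...))` call before and after each recode shows whether a recode introduced new NA values.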

# Recoding life satisfaction
EVSData <- EVSData %>%
  mutate(Satisfaction = recode(Satisfaction,
                               "Dissatisfied" = 1,
                               "2" = 2,
                               "3" = 3,
                               "4" = 4,
                               "5" = 5,
                               "6" = 6,
                               "7" = 7,
                               "8" = 8,
                               "9" = 9,
                               "Satisfied" = 10))

# Recoding number of children
EVSData <- EVSData %>%
  mutate(Number_Children = recode(Number_Children,
                                  "No children" = 0,
                                  "1" = 1,
                                  "2" = 2,
                                  "3" = 3,
                                  "4" = 4,
                                  "5" = 5,
                                  "6" = 6,
                                  "7" = 7,
                                  "8" = 8,
                                  "9" = 9,
                                  "10" = 10,
                                  "11" = 11,
                                  "12" = 12,
                                  "13" = 13,
                                  "14" = 14,
                                  "15" = 15,
                                  "16" = 16))

# If I want to look at only the results from Ireland:

EVSData_Ireland <- EVSData %>%
  filter(Region == "Ireland")
head(EVSData_Ireland)

# A tibble: 6 × 17
  EVS_Wave Region Respondent_Numb… Health Satisfaction Work_Q1 Work_Q2
  <chr>    <chr>             <dbl> <chr>         <dbl> <chr>   <chr>  
1 1981-19… Irela…                1 Good             10 <NA>    <NA>   
2 1981-19… Irela…                2 Very …            9 <NA>    <NA>   
3 1981-19… Irela…                3 Very …           10 <NA>    <NA>   
4 1981-19… Irela…                4 Very …            8 <NA>    <NA>   
5 1981-19… Irela…                5 Very …            7 <NA>    <NA>   
6 1981-19… Irela…                6 Very …            7 <NA>    <NA>   
# … with 10 more variables: Work_Q3 <chr>, Work_Q4 <chr>,
#   Work_Q5 <chr>, Sex <chr>, Age <chr>, Marital_Status <chr>,
#   Number_Children <dbl>, Education <chr>, Employment <chr>,
#   Monthly_Income <chr>

View(EVSData_Ireland)

# If I wanted to arrange the data based on satisfaction level and then age

EVSData_Ireland %>%
  arrange(Satisfaction, Age)

# A tibble: 4,242 × 17
   EVS_Wave  Region  Respondent_Number Health    Satisfaction Work_Q1 
   <chr>     <chr>               <dbl> <chr>            <dbl> <chr>   
 1 1981-1984 Ireland                81 Fair                 1 <NA>    
 2 1981-1984 Ireland                71 Good                 1 <NA>    
 3 1981-1984 Ireland              2035 Very good            1 <NA>    
 4 2008-2010 Ireland               351 Very good            1 Strongl…
 5 1990-1993 Ireland              2813 Very good            1 <NA>    
 6 1990-1993 Ireland               408 Poor                 1 <NA>    
 7 2008-2010 Ireland               731 Very good            1 Strongl…
 8 1981-1984 Ireland               258 Good                 1 <NA>    
 9 2008-2010 Ireland               676 Very good            1 Strongl…
10 1981-1984 Ireland               994 Very good            1 <NA>    
# … with 4,232 more rows, and 11 more variables: Work_Q2 <chr>,
#   Work_Q3 <chr>, Work_Q4 <chr>, Work_Q5 <chr>, Sex <chr>,
#   Age <chr>, Marital_Status <chr>, Number_Children <dbl>,
#   Education <chr>, Employment <chr>, Monthly_Income <chr>

View(EVSData_Ireland)
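
One caveat with the sort above: the printed column types show Age stored as character (`<chr>`), so `arrange()` orders it as text rather than as a number. The following is a minimal sketch of converting before sorting, using base R on toy data. It assumes that any non-numeric entries in Age should become NA.

```r
toy <- data.frame(
  Satisfaction = c(1, 1, 1, 2),
  Age          = c("85", "100", "25", "18"),   # stored as text, as in EVSData
  stringsAsFactors = FALSE
)

# Convert Age to numeric; entries that are not numbers become NA
toy$Age <- suppressWarnings(as.numeric(toy$Age))

# Sort by satisfaction, then by age as a number
sorted <- toy[order(toy$Satisfaction, toy$Age), ]
sorted
```

In the dplyr style used in this post, the equivalent would be a `mutate(Age = as.numeric(Age))` step before `arrange(Satisfaction, Age)`.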

Note: I had some trouble with the recoding. Unless every value was listed explicitly in recode(), the values I had not listed came back as NA.
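
This NA behaviour matches how `dplyr::recode()` is documented to work. When the replacement values have a different type from the input (here, numbers replacing text) and no `.default` is given, every value that is not listed is replaced with NA. One way to avoid listing every value is to convert the digits directly and name only the text labels. The following is a minimal base R sketch; `labels_to_number` is a hypothetical helper name, not a dplyr function.

```r
# Hypothetical helper: turn a mixed text column such as
# c("Dissatisfied", "2", ..., "9", "Satisfied") into numbers.
# `labels` is a named numeric vector mapping text labels to values.
labels_to_number <- function(x, labels) {
  out <- suppressWarnings(as.numeric(x))   # "2" -> 2; text labels -> NA for now
  for (lab in names(labels)) {
    out[x %in% lab] <- labels[[lab]]       # fill in the named labels
  }
  out
}

satisfaction <- c("Dissatisfied", "2", "9", "Satisfied", NA)  # toy values
labels_to_number(satisfaction, c("Dissatisfied" = 1, "Satisfied" = 10))
```

The same helper could be used for the number of children with `c("No children" = 0)`, and it can be called inside `mutate()` in place of `recode()`. Text that is neither a number nor a named label still becomes NA, so a spelling mismatch in a label would show up as extra missing values.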
