Challenge 1

Categories: challenge1, Lujia Li, dataset

Author: Lujia Li

Published: March 22, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Instructions

This document provides the YAML header information you will need to replicate each week when submitting your homework or other blog posts. Please observe the following conventions:

Save your own copy of this template as a blog post in the posts folder, naming it FirstLast_hwX.qmd
Edit the YAML header to change the author name; use the same name each week
Include a reader-friendly description
Update the category list to indicate the type of submission, the data used, the main packages or techniques, your name, or anything else that makes your document easy to find
Edit it as you would a normal .qmd/.rmd file

Code

# read_csv() comes from readr, which library(tidyverse) already attaches;
# "reader" is an unrelated package, so readr is the one to load here
library(readr)

Code

# Paths are resolved relative to the working directory, which is posts/ while this
# post renders, so the path must not start with "posts/"
australian_marriage_tidy <- read_csv("_data/australian_marriage_tidy.csv")

Code

# View() opens an interactive data viewer meant for RStudio sessions; glimpse()
# prints a compact overview of the columns in the rendered post instead
glimpse(australian_marriage_tidy)

Code

x <- c(2,3,4,5)
mean(x)

[1] 3.5

Rendering your post

When you click the Render button, a document is generated that includes both the content and the output of the embedded code.

Warning

Be sure that you have moved your *.qmd file into the posts folder BEFORE you render it, so that all files are stored in the correct location.

Important

Only render a single file - don’t try to render the whole website!

Pilot Student Blogs

We are piloting a workflow that includes individual student websites with directed, limited pull requests back to the course blogs. Please let us know if you would like to participate.

Reading in data files

At least initially, the easiest option is a data source that is easy to access: either one from the _data folder we provide or a publicly available online source.
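A frequent stumbling block with the _data folder is the file path. The sketch below assumes knitr's default behavior of running a post's code from the folder that holds the .qmd file, and uses a temporary posts/_data folder with a hypothetical example.csv to stand in for the real blog. It shows that the path is written relative to posts/, without a leading "posts/".

```r
library(readr)

# Hypothetical stand-in for the blog: a temporary posts/ folder with a _data/ subfolder
posts_dir <- file.path(tempdir(), "posts")
dir.create(file.path(posts_dir, "_data"), recursive = TRUE, showWarnings = FALSE)
write_csv(
  data.frame(Year = c(2011, 2011), State = c("Idaho", "Iowa")),
  file.path(posts_dir, "_data", "example.csv")
)

# Simulate rendering a post: the working directory is the folder that holds the .qmd
old_wd <- setwd(posts_dir)
example_data <- read_csv("_data/example.csv", show_col_types = FALSE)
setwd(old_wd)  # restore the previous working directory
```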

Using Other Data

If you would like to use a data source that you have access to, that is small enough, and that you don’t mind making public, you can copy it into the _data folder and include it in your commit and pull request.
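Copying a file into _data can also be done from R. A minimal sketch with base R's file.copy(), where both the source file and the posts/_data location are hypothetical temporary paths rather than the real blog folders:

```r
# Hypothetical source file somewhere outside the blog
src <- file.path(tempdir(), "my_small_dataset.csv")
writeLines(c("Year,State", "2011,Idaho"), src)

# Hypothetical posts/_data folder of the blog repository
data_dir <- file.path(tempdir(), "blog", "posts", "_data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# file.copy() returns TRUE when the copy succeeds
copied <- file.copy(src, file.path(data_dir, basename(src)), overwrite = TRUE)
```

After copying, the file still has to be added to your commit so that it is part of the pull request.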

Using Private Data

If you would like to use a proprietary data source, that should be possible using the same process outlined above, although there may be a few issues at first. We hope to have this feature working smoothly soon!

Code

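# Install packages once from the console rather than inside a rendered post.
# "trying to use CRAN without setting a mirror" means no CRAN mirror was set;
# passing one through `repos` avoids it:
# install.packages("xlsx", repos = "https://cloud.r-project.org")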
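The message "trying to use CRAN without setting a mirror" means install.packages() did not know which CRAN mirror to download from, which typically happens when code runs non-interactively, as it does during rendering. One way to handle it, sketched below, is to set the repos option once and install only packages that are missing. install_if_missing() is a hypothetical helper, not part of any package, and the cloud.r-project.org mirror is just one choice.

```r
# Tell install.packages() which CRAN mirror to use for this session
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Hypothetical helper: install a package only when it is not already installed
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(pkg)
}

# Usage (run from the console, not in the post): install_if_missing("xlsx")
```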

Code

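# This absolute path only works on the author's computer; a copy of BRFSS.csv in
# posts/_data could be read with read_csv("_data/BRFSS.csv") instead
mydata2 <- read_csv("/Users/lujiali/Documents/UMass/Courses/2023 Spring/DACSS 601/BRFSS.csv")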

Rows: 55 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): State, Location 1
dbl (7): Year, Smoke everyday, Smoke some days, Former smoker, Never smoked,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
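The column-specification message goes away when the column types are stated up front, which also keeps readr from guessing a type wrongly. A small sketch using readr's cols(), on a hypothetical three-line excerpt written to a temporary file so that it runs anywhere:

```r
library(readr)

# Hypothetical excerpt of the BRFSS file, written to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,State,Smoke everyday", "2011,Idaho,12.4", "2011,Iowa,15.5"), csv_path)

# Declaring every column type means readr does not guess, and prints no message
brfss_small <- read_csv(
  csv_path,
  col_types = cols(
    Year = col_integer(),
    State = col_character(),
    `Smoke everyday` = col_double()
  )
)
```

By contrast, show_col_types = FALSE only hides the message; the types are still guessed.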

Code

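# The result is only printed, not saved; assign it (e.g. brfss_skip <- read_csv(...))
# to keep it. skip = 3 skips the header row and the first two data rows, so the next
# row (Nevada) is used as the column names, as the output below shows
read_csv("/Users/lujiali/Documents/UMass/Courses/2023 Spring/DACSS 601/BRFSS.csv", skip = 3)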

Rows: 52 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Nevada, Nevada
(39.49323999972637, -117.07183999971608)
dbl (7): 2011, 18, 4.9, 24.6, 52.5, 481, 10

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 52 × 9
   `2011` Nevada                    `18` `4.9` `24.6` `52.5` Nevad…¹ `481`  `10`
    <dbl> <chr>                    <dbl> <dbl>  <dbl>  <dbl> <chr>   <dbl> <dbl>
 1   2011 South Dakota              15.5   7.6   25.5   51.5 "South…  2732    21
 2   2011 Idaho                     12.4   4.8   24.4   58.5 "Idaho…  1687     5
 3   2011 Nebraska                  14.5   5.4   24.3   55.7 "Nebra…  2243    19
 4   2011 Massachusetts             13.3   4.9   28.3   53.5 "Massa…  1919    25
 5   2011 Washington                12.1   5.4   26     56.5 "Washi…  2956     6
 6   2011 Iowa                      15.5   4.8   25     54.6 "Iowa\…   281    16
 7   2011 Minnesota                 13.2   5.9   26.2   54.7 "Minne…   392     1
 8   2011 New Hampshire             15.4   4.1   29.2   51.4 "New H…  2405    26
 9   2011 Colorado                  12.3   6     27.2   54.6 "Color…  1398     9
10   2011 Nationwide (States and …  15.4   5.7   25.1   52.9  <NA>      NA    NA
# … with 42 more rows, and abbreviated variable name
#   ¹`Nevada\n(39.49323999972637, -117.07183999971608)`
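Because skip also counts the header line, the next row is promoted to column names. The sketch below reproduces this on a hypothetical four-line file and shows one alternative: skip the same lines but supply the column names through col_names.

```r
library(readr)

# Hypothetical four-line CSV: a header plus three data rows
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Year,State", "2011,Idaho", "2011,Iowa", "2011,Nebraska"), csv_path)

# skip = 2 drops the header and the Idaho row, so "2011,Iowa" becomes the column names
shifted <- read_csv(csv_path, skip = 2, show_col_types = FALSE)
names(shifted)  # "2011" "Iowa"

# Same skip, but with explicit names: the Iowa and Nebraska rows stay as data
kept <- read_csv(csv_path, skip = 2, col_names = c("Year", "State"),
                 show_col_types = FALSE)
```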

Code

filter(mydata2, State== "Massachusetts")

# A tibble: 1 × 9
   Year State         Smoke eve…¹ Smoke…² Forme…³ Never…⁴ Locat…⁵ Count…⁶ States
  <dbl> <chr>               <dbl>   <dbl>   <dbl>   <dbl> <chr>     <dbl>  <dbl>
1  2011 Massachusetts        13.3     4.9    28.3    53.5 "Massa…    1919     25
# … with abbreviated variable names ¹`Smoke everyday`, ²`Smoke some days`,
#   ³`Former smoker`, ⁴`Never smoked`, ⁵`Location 1`, ⁶Counties
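filter() with == keeps a single state; to keep several, %in% compares each value against a vector. A small sketch on a hypothetical excerpt of mydata2 whose values are copied from the output above:

```r
library(dplyr)

# Hypothetical excerpt of mydata2 (values copied from the output above)
brfss_demo <- tibble(
  State = c("Idaho", "Massachusetts", "Iowa"),
  `Smoke everyday` = c(12.4, 13.3, 15.5)
)

# %in% keeps every row whose State appears in the vector, in the original row order
selected <- filter(brfss_demo, State %in% c("Massachusetts", "Iowa"))
```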

Code

# As with xlsx, install from the console (with a CRAN mirror set) rather than in the post:
# install.packages("summarytools", repos = "https://cloud.r-project.org")

Code

summary(mydata2)

      Year         State           Smoke everyday  Smoke some days
 Min.   :2011   Length:55          Min.   : 8.50   Min.   :3.300  
 1st Qu.:2011   Class :character   1st Qu.:13.25   1st Qu.:5.250  
 Median :2011   Mode  :character   Median :15.40   Median :5.700  
 Mean   :2011                      Mean   :15.51   Mean   :5.804  
 3rd Qu.:2011                      3rd Qu.:17.55   3rd Qu.:6.200  
 Max.   :2011                      Max.   :23.80   Max.   :9.400  
                                                                  
 Former smoker    Never smoked    Location 1           Counties   
 Min.   :14.60   Min.   :45.60   Length:55          Min.   :  94  
 1st Qu.:23.95   1st Qu.:51.00   Class :character   1st Qu.: 806  
 Median :25.00   Median :52.90   Mode  :character   Median :1673  
 Mean   :24.89   Mean   :53.78                      Mean   :1600  
 3rd Qu.:26.20   3rd Qu.:55.65                      3rd Qu.:2398  
 Max.   :31.60   Max.   :72.20                      Max.   :3218  
                                                    NA's   :4     
     States    
 Min.   : 1.0  
 1st Qu.:13.5  
 Median :26.0  
 Mean   :26.0  
 3rd Qu.:38.5  
 Max.   :51.0  
 NA's   :4
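summary() reports NA's for Counties and States, and such missing values make functions like mean() return NA unless told to drop them. A sketch on a hypothetical three-row excerpt, where the last row stands in for a row without a Counties value:

```r
library(dplyr)

# Hypothetical excerpt: the Nationwide-style row has no Counties value
brfss_demo <- tibble(
  State = c("Massachusetts", "Idaho", "Nationwide"),
  Counties = c(1919, 1687, NA)
)

mean(brfss_demo$Counties)  # NA, because one value is missing

# na.rm = TRUE drops missing values before averaging; is.na() counts them
counties_summary <- summarise(
  brfss_demo,
  mean_counties = mean(Counties, na.rm = TRUE),
  n_missing = sum(is.na(Counties))
)
```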

Code

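# dfSummary() is a function from the summarytools package, not a package itself, so it
# cannot be passed to install.packages(); load summarytools (installed once from the
# console) and call the function instead
library(summarytools)
dfSummary(mydata2)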

Code

# cols (required) selects the columns to stack; names_to and values_to name the two
# new columns (the argument is values_to, not value_to)
pivot_longer(mydata2,
             cols = c(`Smoke everyday`, `Smoke some days`),
             names_to = "Smoking status",
             values_to = "Percent")
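To see what names_to and values_to produce, the sketch below pivots a hypothetical two-state excerpt (values from the output above) to long format and back again with pivot_wider(). The new column names "Smoking status" and "Percent" are illustrative choices.

```r
library(tibble)
library(tidyr)

# Hypothetical two-state excerpt of mydata2
brfss_demo <- tibble(
  State = c("Massachusetts", "Idaho"),
  `Smoke everyday` = c(13.3, 12.4),
  `Smoke some days` = c(4.9, 4.8)
)

# Long format: one row per state and smoking category
brfss_long <- pivot_longer(
  brfss_demo,
  cols = c(`Smoke everyday`, `Smoke some days`),
  names_to = "Smoking status",
  values_to = "Percent"
)

# pivot_wider() reverses the reshape
brfss_wide <- pivot_wider(brfss_long, names_from = `Smoking status`, values_from = Percent)
```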

Code

data0330 <- read_csv("/Users/lujiali/Documents/UMass/Courses/2023 Spring/DACSS 601/BRFSS.csv")
# The original call was missing its closing parenthesis, which caused the
# "unexpected end of input" error. cols selects columns, not states (rows), and the
# value-column argument is values_to, not values. names_to cannot be "State",
# because a State column already exists
pivot_longer(data0330,
             cols = c(`Smoke everyday`, `Smoke some days`),
             names_to = "Smoking status",
             values_to = "Percent")
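In tidyselect, `:` selects a range of adjacent columns, so it can name every column from Smoke everyday through Never smoked; picking particular states is a row operation handled by filter(). A sketch combining both on a hypothetical three-state excerpt (values copied from the earlier output):

```r
library(dplyr)
library(tidyr)

# Hypothetical excerpt of data0330
brfss_demo <- tibble(
  Year = c(2011, 2011, 2011),
  State = c("Massachusetts", "Idaho", "Iowa"),
  `Smoke everyday` = c(13.3, 12.4, 15.5),
  `Smoke some days` = c(4.9, 4.8, 4.8),
  `Former smoker` = c(28.3, 24.4, 25),
  `Never smoked` = c(53.5, 58.5, 54.6)
)

brfss_long <- brfss_demo %>%
  # Rows: keep only the states of interest
  filter(State %in% c("Massachusetts", "Iowa")) %>%
  # Columns: `:` selects the four adjacent smoking columns
  pivot_longer(
    cols = `Smoke everyday`:`Never smoked`,
    names_to = "Smoking status",
    values_to = "Percent"
  )
```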