challenge1
Lujia Li
dataset
Author

Lujia Li

Published

March 22, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Instructions

This document provides the YAML header information you will need to replicate each week to submit your homework or other blog posts. Please observe the following conventions:

  • Save your own copy of this template as a blog post in the posts folder, naming it FirstLast_hwX.qmd
  • Edit the YAML header to change your author name, and use the same name each week
  • Include a reader-friendly description
  • Update the category list to indicate the type of submission, the data used, the main packages or techniques, your name, or anything else that makes your document easy to find
  • Edit as a normal qmd/rmd file
Code
# read_csv() comes from readr (already attached by tidyverse), not from the unrelated reader package.
library(readr)
Code
# The working directory is already the posts folder, so the path starts at _data.
australian_marriage_tidy <- read_csv("_data/australian_marriage_tidy.csv")
Code
# View() opens the data viewer and is meant for interactive sessions.
View(australian_marriage_tidy)
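A relative path passed to read_csv() is resolved against the current working directory, which for a rendered post is the folder holding the .qmd file. The sketch below is one way to check that before reading; it assumes the file sits in the _data folder next to the post, which may not match every setup.

```r
# Build the path relative to the posts folder (assumed location of the post).
data_path <- file.path("_data", "australian_marriage_tidy.csv")

# Show where R is looking, then read only if the file is really there.
message("Working directory: ", getwd())
if (file.exists(data_path)) {
  australian_marriage_tidy <- readr::read_csv(data_path, show_col_types = FALSE)
} else {
  message("Not found: ", file.path(getwd(), data_path))
}
```

If the "Not found" message appears, the printed full path usually makes it clear whether a folder name has been doubled or left out.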
Code
x <- c(2,3,4,5)
mean(x)
[1] 3.5

Rendering your post

When you click the Render button, a document is generated that includes both the content and the output of any embedded code.

Warning

Be sure that you have moved your *.qmd file into the posts folder BEFORE you render it, so that all files are stored in the correct location.

Important

Only render a single file - don’t try to render the whole website!

Pilot Student Blogs

We are piloting a workflow that includes individual student websites with directed and limited pull requests back to course blogs. Please let us know if you would like to participate.

Reading in data files

The easiest data source to use, at least initially, is one that is easily accessible: either a file from the provided _data folder or a publicly available online source.

Using Other Data

If you would like to use a source that you have access to, and it is small enough and you don’t mind making it public, you can copy it into the _data folder and include it in your commit and pull request.

Using Private Data

If you would like to use a proprietary source of data, that should be possible using the same process outlined above. There may initially be a few issues. We hope to have this feature working smoothly soon!

Code
# Rendering runs non-interactively, so install.packages() needs an explicit CRAN mirror.
install.packages("xlsx", repos = "https://cloud.r-project.org")
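Installing packages inside a post means the install is attempted on every render. A minimal sketch of an alternative is to check which packages are missing and install them once from the console. The helper name missing_packages() is hypothetical, not part of any package.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("xlsx", "summarytools"))

# Only report what is missing; the install itself is left for the console.
if (length(to_install) > 0) {
  message("Install once from the console: ", paste(to_install, collapse = ", "))
}
```

From the console, install.packages(to_install, repos = "https://cloud.r-project.org") would then install whatever was reported.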
Code
mydata2<-read_csv("/Users/lujiali/Documents/UMass/Courses/2023 Spring/DACSS 601/BRFSS.csv")
Rows: 55 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): State, Location 1
dbl (7): Year, Smoke everyday, Smoke some days, Former smoker, Never smoked,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
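The message printed by read_csv() suggests two ways to quiet it: declare the column types, or set show_col_types = FALSE. The sketch below shows both on a small inline CSV that stands in for the real file; the two rows reuse values from the BRFSS data, and only three of the nine columns are included.

```r
library(readr)

# Inline CSV text stands in for a real file; I() marks it as literal data.
csv_text <- I("Year,State,Smoke everyday\n2011,Massachusetts,13.3\n2011,Idaho,12.4\n")

# Option 1: declare the column types explicitly.
smoking <- read_csv(
  csv_text,
  col_types = cols(
    Year = col_integer(),
    State = col_character(),
    `Smoke everyday` = col_double()
  )
)

# Option 2: let readr guess the types but suppress the message.
smoking_quiet <- read_csv(csv_text, show_col_types = FALSE)
```

Declaring the types has the side benefit of failing loudly if a column later arrives in an unexpected format.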
Code
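# Note: the result is not assigned to a name here (name <- read_csv(...)), so it is only printed.
# skip = 3 also skips the real header, so a data row (Nevada) ends up as the column names.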
read_csv("/Users/lujiali/Documents/UMass/Courses/2023 Spring/DACSS 601/BRFSS.csv", skip = 3)
Rows: 52 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Nevada, Nevada
(39.49323999972637, -117.07183999971608)
dbl (7): 2011, 18, 4.9, 24.6, 52.5, 481, 10

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 52 × 9
   `2011` Nevada                    `18` `4.9` `24.6` `52.5` Nevad…¹ `481`  `10`
    <dbl> <chr>                    <dbl> <dbl>  <dbl>  <dbl> <chr>   <dbl> <dbl>
 1   2011 South Dakota              15.5   7.6   25.5   51.5 "South…  2732    21
 2   2011 Idaho                     12.4   4.8   24.4   58.5 "Idaho…  1687     5
 3   2011 Nebraska                  14.5   5.4   24.3   55.7 "Nebra…  2243    19
 4   2011 Massachusetts             13.3   4.9   28.3   53.5 "Massa…  1919    25
 5   2011 Washington                12.1   5.4   26     56.5 "Washi…  2956     6
 6   2011 Iowa                      15.5   4.8   25     54.6 "Iowa\…   281    16
 7   2011 Minnesota                 13.2   5.9   26.2   54.7 "Minne…   392     1
 8   2011 New Hampshire             15.4   4.1   29.2   51.4 "New H…  2405    26
 9   2011 Colorado                  12.3   6     27.2   54.6 "Color…  1398     9
10   2011 Nationwide (States and …  15.4   5.7   25.1   52.9  <NA>      NA    NA
# … with 42 more rows, and abbreviated variable name
#   ¹​`Nevada\n(39.49323999972637, -117.07183999971608)`
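The effect of skip on the header can be reproduced on a tiny inline CSV. This is a sketch with made-up column names (year, state) for the second call; it is not the author's data file.

```r
library(readr)

# Inline CSV text stands in for a real file; I() marks it as literal data.
csv_text <- I("Year,State\n2011,Nevada\n2011,Idaho\n")

# skip = 1 drops the header line, so the first data row becomes the column names.
shifted <- read_csv(csv_text, skip = 1, show_col_types = FALSE)
names(shifted)

# Supplying col_names keeps every data row while still skipping the old header.
renamed <- read_csv(
  csv_text,
  skip = 1,
  col_names = c("year", "state"),
  show_col_types = FALSE
)
nrow(renamed)
```

When a file has no junk lines above its header, leaving skip at its default of 0 is usually the right choice.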
Code
filter(mydata2, State == "Massachusetts")
# A tibble: 1 × 9
   Year State         Smoke eve…¹ Smoke…² Forme…³ Never…⁴ Locat…⁵ Count…⁶ States
  <dbl> <chr>               <dbl>   <dbl>   <dbl>   <dbl> <chr>     <dbl>  <dbl>
1  2011 Massachusetts        13.3     4.9    28.3    53.5 "Massa…    1919     25
# … with abbreviated variable names ¹​`Smoke everyday`, ²​`Smoke some days`,
#   ³​`Former smoker`, ⁴​`Never smoked`, ⁵​`Location 1`, ⁶​Counties
Code
# Rendering runs non-interactively, so install.packages() needs an explicit CRAN mirror.
install.packages("summarytools", repos = "https://cloud.r-project.org")
Code
summary(mydata2)
      Year         State           Smoke everyday  Smoke some days
 Min.   :2011   Length:55          Min.   : 8.50   Min.   :3.300  
 1st Qu.:2011   Class :character   1st Qu.:13.25   1st Qu.:5.250  
 Median :2011   Mode  :character   Median :15.40   Median :5.700  
 Mean   :2011                      Mean   :15.51   Mean   :5.804  
 3rd Qu.:2011                      3rd Qu.:17.55   3rd Qu.:6.200  
 Max.   :2011                      Max.   :23.80   Max.   :9.400  
                                                                  
 Former smoker    Never smoked    Location 1           Counties   
 Min.   :14.60   Min.   :45.60   Length:55          Min.   :  94  
 1st Qu.:23.95   1st Qu.:51.00   Class :character   1st Qu.: 806  
 Median :25.00   Median :52.90   Mode  :character   Median :1673  
 Mean   :24.89   Mean   :53.78                      Mean   :1600  
 3rd Qu.:26.20   3rd Qu.:55.65                      3rd Qu.:2398  
 Max.   :31.60   Max.   :72.20                      Max.   :3218  
                                                    NA's   :4     
     States    
 Min.   : 1.0  
 1st Qu.:13.5  
 Median :26.0  
 Mean   :26.0  
 3rd Qu.:38.5  
 Max.   :51.0  
 NA's   :4     
Code
# dfSummary() is a function in the summarytools package, not a package itself,
# so it is called on a data frame rather than installed.
library(summarytools)
dfSummary(mydata2)
Code
# pivot_longer() needs cols (which columns to stack), and the argument for the
# value column is values_to, not value_to. The new column names below are
# illustrative choices, not names from the data.
pivot_longer(
  mydata2,
  cols = c(`Smoke everyday`, `Smoke some days`),
  names_to = "smoking_status",
  values_to = "percent"
)
Code
data0330 <- read_csv("/Users/lujiali/Documents/UMass/Courses/2023 Spring/DACSS 601/BRFSS.csv")
# The call needs its closing parenthesis, the argument is values_to (not values),
# and cols must name columns. Massachusetts and Kansas are values in the State
# column, so they cannot be selected as a column range.
pivot_longer(
  data0330,
  cols = `Smoke everyday`:`Never smoked`,  # the four adjacent smoking-status columns
  names_to = "smoking_status",
  values_to = "percent"  # assumed label; the four values in a row sum to about 100
)
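For reference, pivot_longer() takes three main pieces: cols (the columns to stack), names_to (the new column that will hold the old column names), and values_to (the new column that will hold the cell values). The sketch below runs on a two-row toy table built from values shown earlier in this post; the names smoking_status and percent are illustrative assumptions.

```r
library(tidyr)
library(tibble)

# Toy wide table: one row per state, one column per smoking category.
wide <- tibble(
  State = c("Massachusetts", "Idaho"),
  `Smoke everyday` = c(13.3, 12.4),
  `Never smoked` = c(53.5, 58.5)
)

# Stack the two category columns into name/value pairs.
long <- pivot_longer(
  wide,
  cols = c(`Smoke everyday`, `Never smoked`),
  names_to = "smoking_status",
  values_to = "percent"
)

long
```

Each state now appears once per category, so the two-row table becomes four rows with the columns State, smoking_status, and percent.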