library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Tejaswini_Ketineni
August 21, 2022
I am going to work with one data set: 1. railroad_2012_clean_county.csv
First, the data set (railroad_2012_clean_county) is read in.
The head() function is used to preview the first rows of the data.
# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
The data set has 2930 rows and 3 columns.
[1] "state"           "county"          "total_employees"
The three columns are state, county and total_employees.
[1] 0
[1] 0
There are no missing (NA) values in the data set.
    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00  
   distinct_states
1:              53
   distinct_county
1:            1709
There are 53 distinct state codes (the 50 states plus DC and the military codes AE and AP) and 1709 distinct county names.
 AE  AK  AL  AP  AR  AZ  CA  CO  CT  DC  DE  FL  GA  HI  IA  ID  IL  IN  KS  KY 
  1   6  67   1  72  15  55  57   8   1   3  67 152   3  99  36 103  92  95 119 
 LA  MA  MD  ME  MI  MN  MO  MS  MT  NC  ND  NE  NH  NJ  NM  NV  NY  OH  OK  OR 
 63  12  24  16  78  86 115  78  53  94  49  89  10  21  29  12  61  88  73  33 
 PA  RI  SC  SD  TN  TX  UT  VA  VT  WA  WI  WV  WY 
 65   5  46  52  91 221  25  92  14  39  69  53  22 
The data set was analysed and the following observations were made. It has 2930 rows and 3 columns (state, county and total_employees). A check for missing values found none, so the data set is clean in that respect. The summary statistics show that total_employees ranges from 1 to 8207 per row, with a median of 21 and a mean of 87.18. There are 53 distinct state codes (the 50 states plus DC and the military codes AE and AP) and 1709 distinct county names. Tabulating the state column counts rows (county entries) per state, not employees: Texas (TX, 221) and Georgia (GA, 152) have the most entries, while Armed Forces (AE), Armed Forces Pacific (AP) and DC have one each. Only 8 of the 53 codes have fewer than 10 entries.
---
title: "Challenge 1 Instructions"
author: "Tejaswini_Ketineni"
description: "Reading in data and creating a post"
date: "08/21/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
I am going to work with one data set: 
1. railroad_2012_clean_county.csv
## Reading the Data
First, the data set (railroad_2012_clean_county) is read in.
```{r}
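# read_csv() comes from readr, which is loaded with the tidyverse
railroad_2012_clean_county <- read_csv("_data/railroad_2012_clean_county.csv")
# View() opens an interactive viewer, so only call it in an interactive session
if (interactive()) View(railroad_2012_clean_county)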
```
The head() function is used to preview the first rows of the data.
```{r}
head(railroad_2012_clean_county)
```
```{r}
rows_and_columns_ds1 <- dim(railroad_2012_clean_county)
rows_and_columns_ds1
```
The data set has 2930 rows and 3 columns.
```{r}
names_col <- colnames(railroad_2012_clean_county)
names_col
```
The three columns are state, county and total_employees.
```{r}
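# Count NA cells across the whole data frame
sum(is.na(railroad_2012_clean_county))
# is.null() tests whether the object itself is NULL, not individual cells
sum(is.null(railroad_2012_clean_county))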
```
There are no missing (NA) values in the data set.
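A per-column view can make the missing-value check easier to read. The sketch below is only an illustration: `railroad_sample` is a hypothetical name for a hand-typed copy of the six rows printed by head() above, and on the full data the same calls would take `railroad_2012_clean_county` instead.

```{r}
# Hypothetical sample: the six rows printed by head() above
railroad_sample <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# Number of NA cells in each column
na_per_column <- colSums(is.na(railroad_sample))
na_per_column

# TRUE if any cell anywhere in the data frame is NA
any_missing <- anyNA(railroad_sample)
any_missing
```

Counting per column shows where missing values sit if any turn up, which a single overall sum cannot do.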
```{r}
summary(railroad_2012_clean_county)
```
```{r}
library(data.table)
data_railroad <- data.table(railroad_2012_clean_county)
data_railroad[, .(distinct_states = length(unique(state)))]
data_railroad[, .(distinct_county = length(unique(county)))]
```
There are 53 distinct state codes (the 50 states plus DC and the military codes AE and AP) and 1709 distinct county names.
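The county count (1709) is well below the row count (2930), which suggests that the same county name occurs in more than one state, since `unique(county)` looks at names only. A minimal sketch of the difference, assuming dplyr is available and using made-up placeholder rows (`toy`, the states S1/S2 and the counties A/B are hypothetical and not taken from the file):

```{r}
library(dplyr)

# Hypothetical placeholder rows: county "A" appears in both states
toy <- data.frame(
  state = c("S1", "S1", "S2"),
  county = c("A", "B", "A"),
  total_employees = c(1, 2, 3)
)

distinct_counts <- toy %>%
  summarise(
    states = n_distinct(state),                     # distinct state codes
    county_names = n_distinct(county),              # distinct names only
    state_county_pairs = n_distinct(state, county)  # distinct (state, county) pairs
  )
distinct_counts
```

On the real data, comparing `state_county_pairs` with the number of rows would show whether each row is a unique state and county combination.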
```{r}
# Number of rows (county entries) per state
table(railroad_2012_clean_county$state)
```
## Description of the data
The data set was analysed and the following observations were made. It has 2930 rows and 3 columns (state, county and total_employees). A check for missing values found none, so the data set is clean in that respect. The summary statistics show that total_employees ranges from 1 to 8207 per row, with a median of 21 and a mean of 87.18. There are 53 distinct state codes (the 50 states plus DC and the military codes AE and AP) and 1709 distinct county names. Tabulating the state column counts rows (county entries) per state, not employees: Texas (TX, 221) and Georgia (GA, 152) have the most entries, while Armed Forces (AE), Armed Forces Pacific (AP) and DC have one each. Only 8 of the 53 codes have fewer than 10 entries.
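The `table()` call on the state column counts rows per state, so it does not say how many employees each state has. A minimal sketch of how per-state employee totals could be computed, assuming dplyr is available; `employees_by_state` and `railroad_sample` are hypothetical names, and the sample is just the six rows printed by head(), not the full file:

```{r}
library(dplyr)

# Hypothetical helper: county entries and summed employees for each state
employees_by_state <- function(df) {
  df %>%
    group_by(state) %>%
    summarise(
      counties = n(),                    # rows (county entries) in the state
      employees = sum(total_employees),  # total employees in the state
      .groups = "drop"
    ) %>%
    arrange(desc(employees))             # largest totals first
}

# Hypothetical sample: the six rows printed by head()
railroad_sample <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

by_state <- employees_by_state(railroad_sample)
by_state
```

Calling `employees_by_state(railroad_2012_clean_county)` on the full data would give the ranking of states by employee count, which may differ from the ranking by number of county entries.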