Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Tejaswini_Ketineni
August 21, 2022
I am going to work with one data set: 1. railroad_2012_clean_county.csv
Initially the data set-1(railroad_2012_clean_country) is read
Head function is used to understand the population of data
# A tibble: 6 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
There are about 2930 rows and about 3 columns present in the dataset.
[1] "state" "county" "total_employees"
There are three columns present in the data set namely : state,county and total_employees present
[1] 0
[1] 0
There are no nulls or missing values present in the data set
state county total_employees
Length:2930 Length:2930 Min. : 1.00
Class :character Class :character 1st Qu.: 7.00
Mode :character Mode :character Median : 21.00
Mean : 87.18
3rd Qu.: 65.00
Max. :8207.00
distinct_states
1: 53
distinct_county
1: 1709
There are 53 distinct states and 1709 distinct counties present.
AE AK AL AP AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY
1 6 67 1 72 15 55 57 8 1 3 67 152 3 99 36 103 92 95 119
LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR
63 12 24 16 78 86 115 78 53 94 49 89 10 21 29 12 61 88 73 33
PA RI SC SD TN TX UT VA VT WA WI WV WY
65 5 46 52 91 221 25 92 14 39 69 53 22
The data set taken is analysed and the following observations are made.There are about 2930 rows and about 3 columns namely (state, county and the total_employees) present in the data set.The data set is checked for null and missing values.We observe that there are no such values present and the data set is clean.The summary statistics are checked.the count of unique states present in the data set is 53 and 1709 unique counties are present.Tabulating the states and the total_employees, we see that the highest number of employees are present in Texas(TX) and Georgia(GA) while the lowest employee count is observed in Armed forces(AE),Armed forces Pacific(AP).We also observe that the no.of states with employee count <10 is very less.
---
title: "Challenge 1 Instructions"
author: "Tejaswini_Ketineni"
desription: "Reading in data and creating a post"
date: "08/21/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_1
- railroads
- faostat
- wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
I am going to work with one data set:
1. railroad_2012_clean_county.csv
## Reading the Data
Initially the data set-1(railroad_2012_clean_country) is read
```{r}
library(readxl)
railroad_2012_clean_county <- read_csv("_data/railroad_2012_clean_county.csv")
View(railroad_2012_clean_county)
```
Head function is used to understand the population of data
```{r}
head(railroad_2012_clean_county)
```
```{r}
rows_and_columns_ds1 <- dim(railroad_2012_clean_county)
rows_and_columns_ds1
```
There are about 2930 rows and about 3 columns present in the dataset.
```{r}
names_col <- colnames(railroad_2012_clean_county)
names_col
```
There are three columns present in the data set namely : state,county and total_employees present
```{r}
sum(is.na(railroad_2012_clean_county))
sum(is.null(railroad_2012_clean_county))
```
There are no nulls or missing values present in the data set
```{r}
summary(railroad_2012_clean_county)
```
```{r}
library(data.table)
data_railroad <- data.table(railroad_2012_clean_county)
data_railroad[, .(distinct_states = length(unique(state)))]
data_railroad[, .(distinct_county = length(unique(county)))]
```
There are 53 distinct states and 1709 distinct counties present.
```{r}
(table(railroad_2012_clean_county$state))
```
## Description of the data
The data set taken is analysed and the following observations are made.There are about 2930 rows and about 3 columns namely (state, county and the total_employees) present in the data set.The data set is checked for null and missing values.We observe that there are no such values present and the data set is clean.The summary statistics are checked.the count of unique states present in the data set is 53 and 1709 unique counties are present.Tabulating the states and the total_employees, we see that the highest number of employees are present in Texas(TX) and Georgia(GA) while the lowest employee count is observed in Armed forces(AE),Armed forces Pacific(AP).We also observe that the no.of states with employee count <10 is very less.