library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Theresa Szczepanski
September 16, 2022
    state              county          total_employees
 Length:2930        Length:2930        Min.   :   1.00
 Class :character   Class :character   1st Qu.:   7.00
 Mode  :character   Mode  :character   Median :  21.00
                                       Mean   :  87.18
                                       3rd Qu.:  65.00
                                       Max.   :8207.00
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
# … with 2,920 more rows
[1] 53 1
The `Railroad` data set consists of 2930 observations of three variables: `state`, `county`, and `total_employees`, of type `character`, `character`, and `double` respectively. The minimum number of employees is 1, which occurs in several counties, and the maximum is 8207, in Cook County, Illinois.
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
7 CA SAN BERNARDINO 2888
8 CA LOS ANGELES 2545
9 TX HARRIS 2535
10 NE LINCOLN 2289
# … with 2,920 more rows
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AK SITKA 1
2 AL BARBOUR 1
3 AL HENRY 1
4 AP APO 1
5 AR NEWTON 1
6 CA MONO 1
7 CO BENT 1
8 CO CHEYENNE 1
9 CO COSTILLA 1
10 CO DOLORES 1
# … with 2,920 more rows
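There are 53 distinct entries in the `state` column. The 50 United States' codes are represented, as well as:
- **DC**, for Washington D.C.
- *AE, APO*: not a state or territory. AE is the postal code for Armed Forces Europe, so this entry is possibly a military post office (APO) address.
- *AP, APO*: not a state or territory. AP is the postal code for Armed Forces Pacific, so this entry is possibly a military post office (APO) address.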
state
Length:53
Class :character
Mode :character
# A tibble: 53 × 1
state
<chr>
1 AE
2 AK
3 AL
4 AP
5 AR
6 AZ
7 CA
8 CO
9 CT
10 DC
# … with 43 more rows
Each case in this data set represents a unique state and county pairing. The `total_employees` value possibly represents the number of railroad employees for a given state and county pairing.
 Domain Code           Domain            Area Code        Area
 Length:30977       Length:30977       Min.   :   1   Length:30977
 Class :character   Class :character   1st Qu.:  79   Class :character
 Mode  :character   Mode  :character   Median : 156   Mode  :character
                                       Mean   :1202
                                       3rd Qu.: 231
                                       Max.   :5504
  Element Code    Element            Item Code        Item
 Min.   :5112   Length:30977       Min.   :1057   Length:30977
 1st Qu.:5112   Class :character   1st Qu.:1057   Class :character
 Median :5112   Mode  :character   Median :1068   Mode  :character
 Mean   :5112                      Mean   :1066
 3rd Qu.:5112                      3rd Qu.:1072
 Max.   :5112                      Max.   :1083
   Year Code         Year          Unit               Value
 Min.   :1961   Min.   :1961   Length:30977       Min.   :       0
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:     171
 Median :1992   Median :1992   Mode  :character   Median :    1800
 Mean   :1991   Mean   :1991                      Mean   :   99411
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   15404
 Max.   :2018   Max.   :2018                      Max.   :23707134
                                                  NA's   :1036
     Flag           Flag Description
 Length:30977       Length:30977
 Class :character   Class :character
 Mode  :character   Mode  :character
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961 1000…
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962 1000…
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963 1000…
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964 1000…
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965 1000…
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
The `birds` data consists of 8 variables of character type and 6 variables of double type. The data seem to describe the population of `stock`, or domesticated fowl, in regions of the world for given years.
`Domain`: For this data set, all of the cases have the same `Domain` and `Domain Code`, representing “live animals”.
# A tibble: 1 × 2
`Domain Code` Domain
<chr> <chr>
1 QA Live Animals
`Area` consists of 248 distinct entries. Entries with codes below 5000 represent countries of the world, and these numeric codes roughly follow the alphabetical order of the country names. An area code of 5000 represents the entire world, and a code greater than 5000 corresponds to a region of the world rather than a specific country. Among these, regions with codes closer in value are closer geographically. Note that there is an entry for Europe as well as entries for Eastern Europe and Western Europe, so some territory is counted in more than one of these entries.
# A tibble: 248 × 2
`Area Code` Area
<dbl> <chr>
1 2 Afghanistan
2 3 Albania
3 4 Algeria
4 5 American Samoa
5 7 Angola
6 8 Antigua and Barbuda
7 9 Argentina
8 1 Armenia
9 22 Aruba
10 10 Australia
# … with 238 more rows
# A tibble: 248 × 2
`Area Code` Area
<dbl> <chr>
1 1 Armenia
2 2 Afghanistan
3 3 Albania
4 4 Algeria
5 5 American Samoa
6 7 Angola
7 8 Antigua and Barbuda
8 9 Argentina
9 10 Australia
10 11 Austria
# … with 238 more rows
# A tibble: 248 × 2
`Area Code` Area
<dbl> <chr>
1 5504 Polynesia
2 5503 Micronesia
3 5502 Melanesia
4 5501 Australia and New Zealand
5 5500 Oceania
6 5404 Western Europe
7 5403 Southern Europe
8 5402 Northern Europe
9 5401 Eastern Europe
10 5400 Europe
# … with 238 more rows
# A tibble: 28 × 2
`Area Code` Area
<dbl> <chr>
1 5000 World
2 5100 Africa
3 5101 Eastern Africa
4 5102 Middle Africa
5 5103 Northern Africa
6 5104 Southern Africa
7 5105 Western Africa
8 5200 Americas
9 5203 Northern America
10 5204 Central America
# … with 18 more rows
`Element`: For this data set, all of the cases have the same `Element` and `Element Code`, representing “stocks”.
# A tibble: 1 × 2
`Element Code` Element
<dbl> <chr>
1 5112 Stocks
`Item`: For this data set, every observation is of one of five item types: chickens, ducks, geese and guinea fowls, turkeys, or pigeons and other birds.
# A tibble: 5 × 2
`Item Code` Item
<dbl> <chr>
1 1057 Chickens
2 1068 Ducks
3 1072 Geese and guinea fowls
4 1079 Turkeys
5 1083 Pigeons, other birds
         Item
Item Code Chickens Ducks Geese and guinea fowls Pigeons, other birds Turkeys
     1057    13074     0                      0                    0       0
     1068        0  6909                      0                    0       0
     1072        0     0                   4136                    0       0
     1079        0     0                      0                    0    5693
     1083        0     0                      0                 1165       0
`Year`, `Unit`, `Value`: Each observation records the year it was made (between 1961 and 2018) and the number of livestock counted, given as a `Value` with a `Unit` of 1000 head. For example, a `Value` of 4700 represents 4,700,000 birds.
# A tibble: 1 × 1
Unit
<chr>
1 1000 Head
      Year          Unit               Value
 Min.   :1961   Length:30977       Min.   :       0
 1st Qu.:1976   Class :character   1st Qu.:     171
 Median :1992   Mode  :character   Median :    1800
 Mean   :1991                      Mean   :   99411
 3rd Qu.:2005                      3rd Qu.:   15404
 Max.   :2018                      Max.   :23707134
                                   NA's   :1036
# A tibble: 30,977 × 3
Year Unit Value
<dbl> <chr> <dbl>
1 1961 1000 Head 4700
2 1962 1000 Head 4900
3 1963 1000 Head 5000
4 1964 1000 Head 5300
5 1965 1000 Head 5500
6 1966 1000 Head 5800
7 1967 1000 Head 6600
8 1968 1000 Head 6290
9 1969 1000 Head 6300
10 1970 1000 Head 6000
# … with 30,967 more rows
`Flag` consists of 6 values describing the methodology by which the data were collected.
# A tibble: 6 × 2
Flag `Flag Description`
<chr> <chr>
1 F FAO estimate
2 <NA> Official data
3 Im FAO data based on imputation methodology
4 M Data not available
5 * Unofficial figure
6 A Aggregate, may include official, semi-official, estimated or calculated…
Flag
    *     A     F    Im     M
 1494  6488 10007  1213  1002
`Element`: All observations are of Stocks.
---
title: "Challenge 1"
author: "Theresa Szczepanski"
description: "Reading in Railroad data and creating a post"
date: "09/16/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
- railroads
- birds
- Theresa_Szczepanski
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in the Railroad Data
- railroad_2012_clean_county.csv ⭐
```{r}
#| label: railroad-read
# Read the county-level railroad employment data
Railroad <- read_csv("_data/railroad_2012_clean_county.csv")
summary(Railroad)
Railroad
# Count the distinct values in the state column
States <- select(Railroad, state)
Num_States <- unique(States)
dim(Num_States)
```
## Describe the data
The `Railroad` data set consists of 2930 observations of three variables: `state`, `county`, and `total_employees`, of type
`character`, `character`, and `double` respectively. The minimum number of employees is 1, which occurs in several counties, and the maximum is 8207, in Cook County, Illinois.
```{r}
#| label: railroad-summary
# Sort counties by employee count: largest first, then smallest first
arrange(Railroad, desc(total_employees))
arrange(Railroad, total_employees)
```
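Sorting the whole table works, but the extremes can also be pulled out directly. The following is a minimal sketch, assuming a small hypothetical table `railroad_toy` whose rows are copied from the sorted output rather than read from the CSV file:

```r
library(dplyr)

# Hypothetical toy table; four rows copied from the Railroad output.
railroad_toy <- tibble(
  state = c("IL", "TX", "AK", "AL"),
  county = c("COOK", "TARRANT", "SITKA", "BARBOUR"),
  total_employees = c(8207, 4235, 1, 1)
)

# The single county with the most employees.
top_county <- slice_max(railroad_toy, total_employees, n = 1)
top_county

# All counties tied at the minimum employee count.
smallest <- filter(railroad_toy, total_employees == min(total_employees))
smallest
```

Using `filter()` with `min()` keeps every tied row, which matters here because several counties share the minimum of 1.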
There are 53 distinct entries in the `state` column. The 50 United States' codes are represented, as well as:
- **DC**, for Washington D.C.
- *AE, APO*: not a state or territory. AE is the postal code for Armed Forces Europe, so this entry is possibly a military post office (APO) address.
- *AP, APO*: not a state or territory. AP is the postal code for Armed Forces Pacific, so this entry is possibly a military post office (APO) address.
```{r}
#| label: num-states
# Find the distinct values in the state column
States <- select(Railroad, state)
Num_States <- unique(States)
summary(Num_States)
Num_States
```
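One way to pick out the codes that are not among the 50 states is to compare against R's built-in `state.abb` vector. This is a sketch on a hypothetical vector `state_codes` that holds only a handful of the 53 codes; with the real `Num_States$state` column the same call would be expected to leave DC, AE, and AP.

```r
# Hypothetical subset of the distinct codes in the state column.
state_codes <- c("AK", "AL", "AE", "AP", "DC", "IL")

# state.abb is the built-in vector of the 50 two-letter state abbreviations.
non_state_codes <- setdiff(state_codes, state.abb)
non_state_codes
```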
Each case in this data set represents a unique state and county pairing. The `total_employees` value possibly represents the number of railroad employees
for a given state and county pairing.
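The claim that each row is a unique state and county pairing can be checked rather than assumed. A minimal sketch, again on a hypothetical toy table rather than the full file:

```r
library(dplyr)

# Hypothetical toy table; rows copied from the Railroad output.
railroad_toy <- tibble(
  state = c("AK", "AK", "AL", "AE"),
  county = c("ANCHORAGE", "JUNEAU", "AUTAUGA", "APO"),
  total_employees = c(7, 3, 102, 2)
)

# Count rows per (state, county) pair; any n > 1 would be a repeated pairing.
duplicate_pairs <- railroad_toy %>%
  count(state, county) %>%
  filter(n > 1)

# Zero rows means every pairing appears exactly once.
nrow(duplicate_pairs)
```

Counting on both columns matters because a county name alone can repeat across states (APO appears under both AE and AP in the output).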
## Read in the Birds Data
```{r}
#| label: birds-read
# Read the FAO birds data and take a first look
Birds <- read_csv("_data/birds.csv")
summary(Birds)
head(Birds)
```
## Describe the Data
The `birds` data consists of 8 variables of character type and 6 variables of double type. The data seem to describe the population of `stock` or domesticated fowl in regions of the world for given years.
`Domain`: For this data set all of the cases have the same `Domain` and `Domain Code` representing "live animals".
```{r}
#| label: domains-info
# Distinct Domain Code / Domain pairs
Domains <- select(Birds, `Domain Code`, Domain)
Num_Domains <- unique(Domains)
Num_Domains
```
`Area` consists of 248 distinct entries. Entries with codes below 5000 represent countries of the world, and these numeric codes roughly follow the alphabetical order of the country names. An area code of 5000 represents the entire world, and a code greater than 5000 corresponds to a region of the world rather than a specific country. Among these, regions with codes closer in value are closer geographically. Note that there is an entry for Europe as well as entries for Eastern Europe and Western Europe, so some territory is counted in more than one of these entries.
```{r}
#| label: area-info
# Distinct Area Code / Area pairs, sorted in both directions
Areas <- select(Birds, `Area Code`, Area)
Num_Areas <- unique(Areas)
Num_Areas
arrange(Num_Areas, `Area Code`)
arrange(Num_Areas, desc(`Area Code`))
# Codes of 5000 and above are the world and regional aggregates
World_Region <- filter(Num_Areas, `Area Code` >= 5000)
arrange(World_Region, `Area Code`)
```
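Because regional and world rows overlap with the country rows, it can help to label each area before summarising. The sketch below assumes the 5000 threshold described above and uses a hypothetical toy table `areas_toy`; the column name `area_type` is introduced here for illustration only.

```r
library(dplyr)

# Hypothetical toy table; codes and names copied from the Area output.
areas_toy <- tibble(
  `Area Code` = c(2, 3, 5000, 5400, 5401),
  Area = c("Afghanistan", "Albania", "World", "Europe", "Eastern Europe")
)

# Label each area using the code ranges described in the text.
areas_labeled <- areas_toy %>%
  mutate(area_type = case_when(
    `Area Code` < 5000 ~ "country",
    `Area Code` == 5000 ~ "world",
    TRUE ~ "region"
  ))

count(areas_labeled, area_type)

# Keep only country rows so totals are not double counted.
countries_only <- filter(areas_labeled, area_type == "country")
countries_only
```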
`Element`: For this data set all of the cases have the same `Element` and `Element Code` representing "stocks".
```{r}
#| label: elements-info
# Distinct Element Code / Element pairs
Elements <- select(Birds, `Element Code`, Element)
Num_Elements <- unique(Elements)
Num_Elements
```
`Item`: For this data set, every observation is of one of five item types: chickens, ducks, geese and guinea fowls, turkeys, or pigeons and other birds.
```{r}
#| label: items-info
# Distinct Item Code / Item pairs and the number of rows for each
Items <- select(Birds, `Item Code`, Item)
Num_Items <- unique(Items)
Num_Items
table(Items)
```
`Year`, `Unit`, `Value`: Each observation records the year it was made (between 1961 and 2018) and the number of livestock counted,
given as a `Value` with a `Unit` of 1000 head. For example, a `Value` of 4700 represents 4,700,000 birds.
```{r}
#| label: years-info
Years_Values <- select(Birds, Year, Unit, Value)
# Confirm that a single unit is used throughout
Units <- select(Birds, Unit)
Num_Units <- unique(Units)
Num_Units
summary(Years_Values)
Years_Values
```
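The unit conversion described above is a multiplication by 1000. A minimal sketch on a hypothetical toy table: the first two values are copied from the output, the `NA` row is added for illustration, and `head_count` is a column name introduced here.

```r
library(dplyr)

# Hypothetical toy table; the NA row stands in for a missing Value.
values_toy <- tibble(
  Year = c(1961, 1962, 1963),
  Unit = "1000 Head",
  Value = c(4700, 4900, NA)
)

# Convert from thousands of head to individual birds; NA stays NA.
values_heads <- mutate(values_toy, head_count = Value * 1000)
values_heads

# Share of observations with a missing Value.
share_missing <- mean(is.na(values_toy$Value))
share_missing
```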
`Flag` consists of 6 values describing the methodology by which the data was collected.
```{r}
#| label: flags-info
# Distinct Flag / Flag Description pairs and the number of rows per flag
Flag_Descriptions <- select(Birds, Flag, `Flag Description`)
Num_Flag_Descriptions <- unique(Flag_Descriptions)
Num_Flag_Descriptions
Flags <- select(Birds, Flag)
table(Flags)
```
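Note that `table()` drops `NA` by default, so rows whose `Flag` is `NA` (described as "Official data") are not counted by the call above. In the rendered output the five flag counts sum to 20,204 of the 30,977 rows, which suggests the remaining 10,773 rows carry an `NA` flag. A sketch of the difference, using a hypothetical vector `flags_toy`:

```r
# Hypothetical toy vector of flags, including two missing values.
flags_toy <- c("F", NA, "Im", "F", NA, "A")

# Default behaviour: NA values are silently left out of the counts.
counts_default <- table(flags_toy)
counts_default

# useNA = "ifany" adds an NA column whenever missing values are present.
counts_with_na <- table(flags_toy, useNA = "ifany")
counts_with_na
```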
`Element`: All observations are of Stocks.
```{r}
#| label: elements-check
# Confirm that every row shares a single Element Code / Element pair
Elements <- select(Birds, `Element Code`, Element)
Num_Elements <- unique(Elements)
dim(Elements)
Num_Elements
```
## Further Challenges to come back to
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐