library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Yoshita Varma Annam
December 20, 2022
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
# A tibble: 30,977 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966
7 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1967 1967
8 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1968 1968
9 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1969 1969
10 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1970 1970
# … with 30,967 more rows, 4 more variables: Unit <chr>, Value <dbl>,
# Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
# ¹`Domain Code`, ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
At first glance, the data has 30,977 rows and 14 columns, where each row records a bird stock figure for one area, one type of bird, and one year. `Domain Code` and `Domain` identify the dataset domain, `Area Code` and `Area` give the location, and `Item Code` and `Item` give the type of bird. `Year` records when each figure applies, and `Value` holds the figure itself in the units given by `Unit`. Further commands are needed to understand the data in more detail.
Domain Code Domain Area Code Area
Length:30977 Length:30977 Min. : 1 Length:30977
Class :character Class :character 1st Qu.: 79 Class :character
Mode :character Mode :character Median : 156 Mode :character
Mean :1202
3rd Qu.: 231
Max. :5504
Element Code Element Item Code Item
Min. :5112 Length:30977 Min. :1057 Length:30977
1st Qu.:5112 Class :character 1st Qu.:1057 Class :character
Median :5112 Mode :character Median :1068 Mode :character
Mean :5112 Mean :1066
3rd Qu.:5112 3rd Qu.:1072
Max. :5112 Max. :1083
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:30977 Min. : 0
1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 171
Median :1992 Median :1992 Mode :character Median : 1800
Mean :1991 Mean :1991 Mean : 99411
3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 15404
Max. :2018 Max. :2018 Max. :23707134
NA's :1036
Flag Flag Description
Length:30977 Length:30977
Class :character Class :character
Mode :character Mode :character
[1] "Domain Code" "Domain" "Area Code" "Area"
[5] "Element Code" "Element" "Item Code" "Item"
[9] "Year Code" "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
[1] "Afghanistan"
[2] "Albania"
[3] "Algeria"
[4] "American Samoa"
[5] "Angola"
[6] "Antigua and Barbuda"
[7] "Argentina"
[8] "Armenia"
[9] "Aruba"
[10] "Australia"
[11] "Austria"
[12] "Azerbaijan"
[13] "Bahamas"
[14] "Bahrain"
[15] "Bangladesh"
[16] "Barbados"
[17] "Belarus"
[18] "Belgium"
[19] "Belgium-Luxembourg"
[20] "Belize"
[21] "Benin"
[22] "Bermuda"
[23] "Bhutan"
[24] "Bolivia (Plurinational State of)"
[25] "Bosnia and Herzegovina"
[26] "Botswana"
[27] "Brazil"
[28] "Brunei Darussalam"
[29] "Bulgaria"
[30] "Burkina Faso"
[31] "Burundi"
[32] "Cabo Verde"
[33] "Cambodia"
[34] "Cameroon"
[35] "Canada"
[36] "Cayman Islands"
[37] "Central African Republic"
[38] "Chad"
[39] "Chile"
[40] "China, Hong Kong SAR"
[41] "China, Macao SAR"
[42] "China, mainland"
[43] "China, Taiwan Province of"
[44] "Colombia"
[45] "Comoros"
[46] "Congo"
[47] "Cook Islands"
[48] "Costa Rica"
[49] "Côte d'Ivoire"
[50] "Croatia"
[51] "Cuba"
[52] "Cyprus"
[53] "Czechia"
[54] "Czechoslovakia"
[55] "Democratic People's Republic of Korea"
[56] "Democratic Republic of the Congo"
[57] "Denmark"
[58] "Dominica"
[59] "Dominican Republic"
[60] "Ecuador"
[61] "Egypt"
[62] "El Salvador"
[63] "Equatorial Guinea"
[64] "Eritrea"
[65] "Estonia"
[66] "Eswatini"
[67] "Ethiopia"
[68] "Ethiopia PDR"
[69] "Falkland Islands (Malvinas)"
[70] "Fiji"
[71] "Finland"
[72] "France"
[73] "French Guyana"
[74] "French Polynesia"
[75] "Gabon"
[76] "Gambia"
[77] "Georgia"
[78] "Germany"
[79] "Ghana"
[80] "Greece"
[81] "Grenada"
[82] "Guadeloupe"
[83] "Guam"
[84] "Guatemala"
[85] "Guinea"
[86] "Guinea-Bissau"
[87] "Guyana"
[88] "Haiti"
[89] "Honduras"
[90] "Hungary"
[91] "Iceland"
[92] "India"
[93] "Indonesia"
[94] "Iran (Islamic Republic of)"
[95] "Iraq"
[96] "Ireland"
[97] "Israel"
[98] "Italy"
[99] "Jamaica"
[100] "Japan"
[101] "Jordan"
[102] "Kazakhstan"
[103] "Kenya"
[104] "Kiribati"
[105] "Kuwait"
[106] "Kyrgyzstan"
[107] "Lao People's Democratic Republic"
[108] "Latvia"
[109] "Lebanon"
[110] "Lesotho"
[111] "Liberia"
[112] "Libya"
[113] "Liechtenstein"
[114] "Lithuania"
[115] "Luxembourg"
[116] "Madagascar"
[117] "Malawi"
[118] "Malaysia"
[119] "Mali"
[120] "Malta"
[121] "Martinique"
[122] "Mauritania"
[123] "Mauritius"
[124] "Mexico"
[125] "Micronesia (Federated States of)"
[126] "Mongolia"
[127] "Montenegro"
[128] "Montserrat"
[129] "Morocco"
[130] "Mozambique"
[131] "Myanmar"
[132] "Namibia"
[133] "Nauru"
[134] "Nepal"
[135] "Netherlands"
[136] "Netherlands Antilles (former)"
[137] "New Caledonia"
[138] "New Zealand"
[139] "Nicaragua"
[140] "Niger"
[141] "Nigeria"
[142] "Niue"
[143] "North Macedonia"
[144] "Norway"
[145] "Oman"
[146] "Pacific Islands Trust Territory"
[147] "Pakistan"
[148] "Palestine"
[149] "Panama"
[150] "Papua New Guinea"
[151] "Paraguay"
[152] "Peru"
[153] "Philippines"
[154] "Poland"
[155] "Portugal"
[156] "Puerto Rico"
[157] "Qatar"
[158] "Republic of Korea"
[159] "Republic of Moldova"
[160] "Réunion"
[161] "Romania"
[162] "Russian Federation"
[163] "Rwanda"
[164] "Saint Helena, Ascension and Tristan da Cunha"
[165] "Saint Kitts and Nevis"
[166] "Saint Lucia"
[167] "Saint Pierre and Miquelon"
[168] "Saint Vincent and the Grenadines"
[169] "Samoa"
[170] "Sao Tome and Principe"
[171] "Saudi Arabia"
[172] "Senegal"
[173] "Serbia"
[174] "Serbia and Montenegro"
[175] "Seychelles"
[176] "Sierra Leone"
[177] "Singapore"
[178] "Slovakia"
[179] "Slovenia"
[180] "Solomon Islands"
[181] "Somalia"
[182] "South Africa"
[183] "South Sudan"
[184] "Spain"
[185] "Sri Lanka"
[186] "Sudan"
[187] "Sudan (former)"
[188] "Suriname"
[189] "Sweden"
[190] "Switzerland"
[191] "Syrian Arab Republic"
[192] "Tajikistan"
[193] "Thailand"
[194] "Timor-Leste"
[195] "Togo"
[196] "Tokelau"
[197] "Tonga"
[198] "Trinidad and Tobago"
[199] "Tunisia"
[200] "Turkey"
[201] "Turkmenistan"
[202] "Tuvalu"
[203] "Uganda"
[204] "Ukraine"
[205] "United Arab Emirates"
[206] "United Kingdom of Great Britain and Northern Ireland"
[207] "United Republic of Tanzania"
[208] "United States of America"
[209] "United States Virgin Islands"
[210] "Uruguay"
[211] "USSR"
[212] "Uzbekistan"
[213] "Vanuatu"
[214] "Venezuela (Bolivarian Republic of)"
[215] "Viet Nam"
[216] "Wallis and Futuna Islands"
[217] "Yemen"
[218] "Yugoslav SFR"
[219] "Zambia"
[220] "Zimbabwe"
[221] "World"
[222] "Africa"
[223] "Eastern Africa"
[224] "Middle Africa"
[225] "Northern Africa"
[226] "Southern Africa"
[227] "Western Africa"
[228] "Americas"
[229] "Northern America"
[230] "Central America"
[231] "Caribbean"
[232] "South America"
[233] "Asia"
[234] "Central Asia"
[235] "Eastern Asia"
[236] "Southern Asia"
[237] "South-eastern Asia"
[238] "Western Asia"
[239] "Europe"
[240] "Eastern Europe"
[241] "Northern Europe"
[242] "Southern Europe"
[243] "Western Europe"
[244] "Oceania"
[245] "Australia and New Zealand"
[246] "Melanesia"
[247] "Micronesia"
[248] "Polynesia"
[1] 248
[1] "Chickens" "Ducks" "Geese and guinea fowls"
[4] "Turkeys" "Pigeons, other birds"
[1] 5
The analysis above shows that the data was collected across the world from 1961 to 2018. It is restricted to five categories of birds: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Because these are all poultry, the data most likely comes from poultry farming statistics. The original source may cover other animals as well, but this file contains only birds. Every row has the element `Stocks`, so the data can be used to track the number of poultry birds in each area over time. There are 248 distinct areas, but not all of them are countries: the list also includes aggregates such as World, Africa, and Western Europe, and former states such as the USSR and Czechoslovakia. The purpose of some rows is still unclear.
# A tibble: 290 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1961 1961
2 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1962 1962
3 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1963 1963
4 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1964 1964
5 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1965 1965
6 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1966 1966
7 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1967 1967
8 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1968 1968
9 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1969 1969
10 QA Live … 5400 Euro… 5112 Stocks 1057 Chic… 1970 1970
# … with 280 more rows, 4 more variables: Unit <chr>, Value <dbl>, Flag <chr>,
# `Flag Description` <chr>, and abbreviated variable names ¹`Domain Code`,
# ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
Domain Code Domain Area Code Area
Length:290 Length:290 Min. :5400 Length:290
Class :character Class :character 1st Qu.:5400 Class :character
Mode :character Mode :character Median :5400 Mode :character
Mean :5400
3rd Qu.:5400
Max. :5400
Element Code Element Item Code Item
Min. :5112 Length:290 Min. :1057 Length:290
1st Qu.:5112 Class :character 1st Qu.:1068 Class :character
Median :5112 Mode :character Median :1072 Mode :character
Mean :5112 Mean :1072
3rd Qu.:5112 3rd Qu.:1079
Max. :5112 Max. :1083
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:290 Min. : 2417
1st Qu.:1975 1st Qu.:1975 Class :character 1st Qu.: 12047
Median :1990 Median :1990 Mode :character Median : 31402
Mean :1990 Mean :1990 Mean : 421435
3rd Qu.:2004 3rd Qu.:2004 3rd Qu.: 109158
Max. :2018 Max. :2018 Max. :2486932
Flag Flag Description
Length:290 Length:290
Class :character Class :character
Mode :character Mode :character
---
title: "Challenge 1"
author: "Yoshita Varma Annam"
description: "Reading in data and creating a post"
date: "12/20/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xlsx ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
library(readr)
birds_csv <- read_csv("_data/birds.csv")
```
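By default `read_csv()` guesses each column's type. As a minimal sketch of declaring the types explicitly instead, the example below reads a small inline CSV rather than the real file. The text in `toy_text` is invented for illustration and only borrows a few column names from `birds.csv`.

```r
library(readr)

# Hypothetical three-row extract (values invented for illustration)
toy_text <- paste(
  "Area Code,Area,Item,Year,Value",
  "2,Afghanistan,Chickens,1961,4700",
  "2,Afghanistan,Chickens,1962,4900",
  "5400,Europe,Ducks,1961,",
  sep = "\n"
)

# I() tells readr to treat the string as literal data, not as a file path
toy_birds <- read_csv(
  I(toy_text),
  col_types = cols(
    `Area Code` = col_double(),
    Area        = col_character(),
    Item        = col_character(),
    Year        = col_double(),
    Value       = col_double()
  )
)

spec(toy_birds)      # the column specification that was applied
problems(toy_birds)  # any parsing problems readr recorded
```

The same `col_types` argument could be passed when reading the real file, which makes a changed or malformed column surface as a parsing problem instead of a silent type change.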
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
## View of the data
```{r}
#| label: understanding-the-data
birds_csv
```
At first glance, the data has 30,977 rows and 14 columns, where each row records a bird stock figure for one area, one type of bird, and one year. `Domain Code` and `Domain` identify the dataset domain, `Area Code` and `Area` give the location, and `Item Code` and `Item` give the type of bird. `Year` records when each figure applies, and `Value` holds the figure itself in the units given by `Unit`. Further commands are needed to understand the data in more detail.
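Printing a wide tibble abbreviates column names and hides columns that do not fit. One possible way around that is `glimpse()`, which prints one line per column. The sketch below uses a small hypothetical tibble, `toy_birds`, with invented values in place of `birds_csv`.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for birds_csv (values invented for illustration)
toy_birds <- tribble(
  ~`Area Code`, ~Area,         ~Item,      ~Year, ~Value,
  2,            "Afghanistan", "Chickens", 1961,  4700,
  2,            "Afghanistan", "Chickens", 1962,  4900,
  3,            "Albania",     "Ducks",    1961,  NA
)

dim(toy_birds)      # number of rows, then number of columns
glimpse(toy_birds)  # one line per column, so no column name is abbreviated
```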
## Analyzing the data
```{r}
#| label: summary-of-the-data
summary(birds_csv)
```
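`summary()` reports missing values only for numeric columns. As a rough sketch of counting them in every column and pulling out the affected rows, the example below works on a hypothetical tibble, `toy_birds`, whose values are invented for illustration.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for birds_csv (values invented for illustration)
toy_birds <- tribble(
  ~Area,         ~Item,      ~Year, ~Value,
  "Afghanistan", "Chickens", 1961,  4700,
  "Afghanistan", "Ducks",    1961,  NA,
  "Albania",     "Chickens", 1961,  1200
)

# Count the missing values in each column
na_counts <- toy_birds %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts

# Keep only the rows where Value is missing, to see which cases they are
missing_rows <- filter(toy_birds, is.na(Value))
missing_rows
```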
```{r}
#| label: column-names-of-the-data
colnames(birds_csv)
```
```{r}
#| label: finding-unique-values-of-the-data
unique(birds_csv$Area)
length(unique(birds_csv$Area))
unique(birds_csv$Item)
length(unique(birds_csv$Item))
```
The analysis above shows that the data was collected across the world from 1961 to 2018. It is restricted to five categories of birds: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Because these are all poultry, the data most likely comes from poultry farming statistics. The original source may cover other animals as well, but this file contains only birds. Every row has the element `Stocks`, so the data can be used to track the number of poultry birds in each area over time. There are 248 distinct areas, but not all of them are countries: the list also includes aggregates such as World, Africa, and Western Europe, and former states such as the USSR and Czechoslovakia. The purpose of some rows is still unclear.
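Since the `Area` column mixes individual countries with aggregates such as World and Europe, it may help to label the two groups before counting. The sketch below assumes that area codes of 5000 and above mark aggregates. That cutoff is an assumption, suggested by the code 5400 for Europe, and should be checked against the real data. The tibble `toy_birds` and all of its values are invented for illustration.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for birds_csv (codes and values invented for illustration)
toy_birds <- tribble(
  ~`Area Code`, ~Area,         ~Item,      ~Year, ~Value,
  2,            "Afghanistan", "Chickens", 1961,  4700,
  3,            "Albania",     "Ducks",    1961,  300,
  5000,         "World",       "Chickens", 1961,  3900000,
  5400,         "Europe",      "Chickens", 1961,  1000000
)

# Assumption: codes at or above this cutoff are aggregates, not countries
aggregate_cutoff <- 5000

toy_birds <- toy_birds %>%
  mutate(area_type = if_else(`Area Code` >= aggregate_cutoff, "aggregate", "country"))

# Number of distinct areas of each type
area_counts <- toy_birds %>%
  distinct(Area, area_type) %>%
  count(area_type)
area_counts
```

Filtering on `area_type == "country"` before summing `Value` would avoid counting the same birds twice, once in a country row and again in its regional total.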
## Viewing data for a particular area for further analysis
```{r}
#| label: data-for-europe
# Keep only the rows for the aggregate area "Europe"
birds_csv_europe <- filter(birds_csv, Area == "Europe")
birds_csv_europe
```
```{r}
summary(birds_csv_europe)
```
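`summary()` pools all bird types together, so a per-item breakdown may be easier to interpret. As a minimal sketch, the example below filters one area and summarises each `Item` separately. The tibble `toy_birds` and its values are invented for illustration.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for birds_csv (values invented for illustration)
toy_birds <- tribble(
  ~Area,    ~Item,      ~Year, ~Value,
  "Europe", "Chickens", 1961,  100,
  "Europe", "Chickens", 2018,  300,
  "Europe", "Ducks",    1961,  10,
  "Europe", "Ducks",    2018,  NA,
  "Asia",   "Chickens", 1961,  500
)

# One summary row per bird type within the chosen area
item_summary <- toy_birds %>%
  filter(Area == "Europe") %>%
  group_by(Item) %>%
  summarise(
    first_year = min(Year),
    last_year  = max(Year),
    mean_value = mean(Value, na.rm = TRUE),  # ignore missing values in the mean
    n_missing  = sum(is.na(Value))
  )
item_summary
```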