Challenge1_KatiePopiela

Read in a dataset and describe it with words and visuals

#For my purposes I'll be reading in the birds.csv dataset. The data appears to record the number of various birds (ducks, geese, chickens, etc.) in different countries during the mid 20th-early 21st century. 

library(readr)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ dplyr   1.0.9
✔ tibble  3.1.8     ✔ stringr 1.4.0
✔ tidyr   1.2.0     ✔ forcats 0.5.1
✔ purrr   0.3.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(ggplot2)

birds_1<-read.csv("_data/birds.csv")
head(birds_1)

  Domain.Code       Domain Area.Code        Area Element.Code Element Item.Code
1          QA Live Animals         2 Afghanistan         5112  Stocks      1057
2          QA Live Animals         2 Afghanistan         5112  Stocks      1057
3          QA Live Animals         2 Afghanistan         5112  Stocks      1057
4          QA Live Animals         2 Afghanistan         5112  Stocks      1057
5          QA Live Animals         2 Afghanistan         5112  Stocks      1057
6          QA Live Animals         2 Afghanistan         5112  Stocks      1057
      Item Year.Code Year      Unit Value Flag Flag.Description
1 Chickens      1961 1961 1000 Head  4700    F     FAO estimate
2 Chickens      1962 1962 1000 Head  4900    F     FAO estimate
3 Chickens      1963 1963 1000 Head  5000    F     FAO estimate
4 Chickens      1964 1964 1000 Head  5300    F     FAO estimate
5 Chickens      1965 1965 1000 Head  5500    F     FAO estimate
6 Chickens      1966 1966 1000 Head  5800    F     FAO estimate

#Below is the head of the dataset. At first glance it looks ok, but the dimensions, as I will show, require the data to be trimmed down.

dim(birds_1)

[1] 30977    14

#There are 30,977 rows and 14 columns in this dataset (too many), so I will use the select function(s) to isolate specific variables; I want to look at the record of duck numbers in Czechoslovakia.

colnames(birds_1)

 [1] "Domain.Code"      "Domain"           "Area.Code"        "Area"            
 [5] "Element.Code"     "Element"          "Item.Code"        "Item"            
 [9] "Year.Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag.Description"

birds_refined<- select(birds_1,Area,Item,Value,Year)

birds_ref2 <- birds_refined %>%
  filter(Area=="Czechoslovakia") %>%
  filter(Item=="Ducks")

head(birds_ref2)

            Area  Item Value Year
1 Czechoslovakia Ducks   434 1961
2 Czechoslovakia Ducks   344 1962
3 Czechoslovakia Ducks   368 1963
4 Czechoslovakia Ducks   431 1964
5 Czechoslovakia Ducks   344 1965
6 Czechoslovakia Ducks   324 1966

#The recorded number of ducks in Czechoslovakia fluctuates between being in the 300,000s and the 700,000s. Lets visualize it!

ggplot(birds_ref2, aes(x=Year,y=Value)) + geom_jitter() + labs(y="Number of Ducks by 1000")