Harsha Kanakeswar Gudipudi
Joining Data

Harsha Kanaka Eswar Gudipudi


May 16, 2022


library(tidyverse)  # provides read_csv(), full_join(), and ggplot() used below
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)


eggs_chicken_df <- read_csv("_data/FAOSTAT_egg_chicken.csv")
livestock_df <- read_csv("_data/FAOSTAT_livestock.csv")
dairy_df <- read_csv("_data/FAOSTAT_cattle_dairy.csv")

Printing the first five rows of every dataset
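
The chunk that printed these rows did not survive in the rendered post. A minimal sketch of what it might look like is shown here. To keep it runnable on its own it uses a small made-up tibble, `toy_df` (hypothetical values), instead of the FAOSTAT files; with the real data the same call would be applied to `eggs_chicken_df`, `livestock_df`, and `dairy_df`.

```r
library(tibble)

# Toy stand-in for one FAOSTAT table. The column names follow those used
# later in this post (Area, Item, Element, Year, Value); the values are invented.
toy_df <- tibble(
  Area    = rep("Afghanistan", 6),
  Item    = rep("Milk, whole fresh cow", 6),
  Element = rep("Production", 6),
  Year    = 2000:2005,
  Value   = c(10, 12, 15, 19, 24, 30)
)

# head() returns the first n rows; n = 5 matches "the first five rows"
first_five <- head(toy_df, n = 5)
print(first_five)
```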

Understanding the dimensions of data set 1

Understanding the dimensions of data set 2

Understanding the dimensions of data set 3
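
The three dimension checks can also be done in one pass. The sketch below assumes the data frames are collected in a named list; the list `datasets` and its contents are toy stand-ins with invented values, not the FAOSTAT data.

```r
library(tibble)

# Toy stand-ins (invented values) for the three FAOSTAT tables
datasets <- list(
  eggs_chicken = tibble(Area = c("A", "B"), Year = 2000:2001, Value = c(1, 2)),
  livestock    = tibble(Area = c("A", "B", "C"), Year = 2000:2002, Value = c(3, 4, 5)),
  dairy        = tibble(Area = "A", Year = 2000L, Value = 6)
)

# dim() returns c(rows, columns) for one data frame;
# sapply() collects the three results into a 2 x 3 matrix
dims <- sapply(datasets, dim)
rownames(dims) <- c("rows", "cols")
print(dims)
```

With the real data, the list would hold `eggs_chicken_df`, `livestock_df`, and `dairy_df`.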

Understanding the columns of data set 1

Understanding the columns of data set 2

Understanding the columns of data set 3
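
Since the goal is to join the tables, it helps to know which columns they share, not just what each one contains. The sketch below compares column names for two toy tibbles; `left`, `right`, and the extra `Flag` column are illustrative assumptions rather than the actual FAOSTAT layout.

```r
library(tibble)

# Two toy tables (invented values); `right` has one extra, hypothetical column
left  <- tibble(Area = "A", Item = "Cattle", Year = 2000L, Value = 1)
right <- tibble(Area = "A", Item = "Eggs",   Year = 2000L, Value = 2, Flag = "F")

# Column names of a single table
print(colnames(left))

# Columns present in both tables: these are the candidate join keys
shared <- intersect(colnames(left), colnames(right))

# Columns that exist only in the second table
only_right <- setdiff(colnames(right), colnames(left))

print(shared)
print(only_right)
```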

Briefly describe the data

I am using three FAOSTAT data sets that cover eggs and chickens, livestock, and dairy cattle. Each one reports values by country. I would like to combine the three so that I can explore the data by country, by item code, and so on.

Tidy Data (as needed)


Checking to see how many entries in the 1st dataset have null values.

Checking to see how many entries in the 2nd dataset have null values.

Checking to see how many entries in the 3rd dataset have null values.
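
The code behind these checks is not shown in the rendered post. One common way to count missing values in base R is sketched below on a toy data frame; `toy_df` and its `Flag` column are made up for illustration.

```r
# Toy data frame with some missing values (invented)
toy_df <- data.frame(
  Area  = c("A", "B", "C", "D"),
  Value = c(1, NA, 3, NA),
  Flag  = c(NA, "F", "F", NA)
)

# Number of NA entries in each column
na_per_column <- colSums(is.na(toy_df))

# Number of rows that contain at least one NA
rows_with_na <- sum(!complete.cases(toy_df))

print(na_per_column)
print(rows_with_na)
```

The per-column count shows where the gaps are, while the row count shows how many rows a blanket `na.omit()` would remove.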

The datasets are not very tidy, so I would like to clean them.

Cleaning the 3 data sets

eggs_chicken_df <- na.omit(eggs_chicken_df)
livestock_df <- na.omit(livestock_df)
dairy_df <- na.omit(dairy_df)

Understanding the dimensions of the data sets post cleaning.
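
When comparing dimensions before and after cleaning, keep in mind that `na.omit()` drops a row if any column is missing, which can remove more rows than intended when a mostly empty column is present. The sketch below contrasts it with `tidyr::drop_na()` restricted to one column. The toy data and the `Note` column are hypothetical, and whether the stricter or the more lenient rule suits the FAOSTAT files is a judgment call that this example does not settle.

```r
library(tidyr)

# Toy data (invented): `Note` is a hypothetical, mostly empty annotation column
toy_df <- data.frame(
  Area  = c("A", "B", "C", "D"),
  Value = c(1, NA, 3, 4),
  Note  = c(NA, "x", "y", NA)
)

strict  <- na.omit(toy_df)         # drops a row if ANY column is NA
lenient <- drop_na(toy_df, Value)  # drops a row only if Value is NA

# Row counts before and after each cleaning rule
row_counts <- c(
  before        = nrow(toy_df),
  na_omit       = nrow(strict),
  drop_na_value = nrow(lenient)
)
print(row_counts)
```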

Join Data

dairy_livestock_df <- full_join(dairy_df, livestock_df)
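
When `by` is not given, `full_join()` matches on every column the two tables share. If the tables describe different items, almost no rows match and the result is effectively the two tables stacked. The sketch below shows the difference between the implicit join and one with explicit keys; the toy tables and the choice of `Area` and `Year` as keys are assumptions for illustration only.

```r
library(dplyr)
library(tibble)

# Toy tables (invented values)
dairy_toy     <- tibble(Area = c("A", "B"), Item = "Milk",   Year = 2000L, Value = c(10, 20))
livestock_toy <- tibble(Area = c("A", "C"), Item = "Cattle", Year = 2000L, Value = c(5, 7))

# Implicit keys: joins on Area, Item, Year, and Value, so no rows match
implicit <- full_join(dairy_toy, livestock_toy)

# Explicit keys: rows match on Area and Year; the remaining columns that
# exist in both tables are disambiguated with suffixes
explicit <- full_join(
  dairy_toy, livestock_toy,
  by     = c("Area", "Year"),
  suffix = c("_dairy", "_livestock")
)

print(implicit)
print(explicit)
```

The implicit version keeps a long layout with one row per observation, which is what the later `filter()` on `Item` and `Element` relies on. The explicit version gives a wide layout with one row per country and year.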

Understanding the columns of Dairy-Livestock combined data frame.

dairy_livestock_eggschicken_df <- full_join(dairy_livestock_df, eggs_chicken_df)

Understanding columns of overall combined data

Understanding dimensions of overall combined data
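
A quick sanity check on the combined dimensions: if the tables have the same columns and no rows in common, the full join should have exactly as many rows as the inputs together. The sketch below checks that on toy data and also shows `bind_rows()` with `.id`, which stacks the tables and records where each row came from. The table names and values are invented.

```r
library(dplyr)
library(tibble)

# Toy tables (invented values) with identical columns and no overlapping rows
a <- tibble(Area = c("A", "B"), Item = "Milk", Value = c(1, 2))
b <- tibble(Area = c("A", "C"), Item = "Eggs", Value = c(3, 4))

joined <- full_join(a, b, by = c("Area", "Item", "Value"))

# With no matching rows, the join has nrow(a) + nrow(b) rows
rows_add_up <- nrow(joined) == nrow(a) + nrow(b)

# Stacking with a source label gives the same rows plus their origin
stacked   <- bind_rows(dairy = a, eggs = b, .id = "source")
by_source <- count(stacked, source)

print(rows_add_up)
print(by_source)
```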

The final combined data set, created by joining the three individual data sets, lets us answer many questions. As an example, I looked at how the total value of milk production in Afghanistan changes over the years. I observed an exponential increase in value, as illustrated in the graph below.

milk_production <- dairy_livestock_eggschicken_df %>%
  filter(Element == "Production", Item == "Milk, whole fresh cow", Area == "Afghanistan" )

# Plotting the Milk Production over the years
ggplot(milk_production, aes(x = Year, y = Value)) +
  geom_line() +
  labs(x = "Year", y = "Milk Production") +
  ggtitle("Milk Production in Afghanistan") +