DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Challenge 1 Solution

  • Course information
    • Overview
    • Instructional Team
    • Course Schedule
  • Weekly materials
    • Fall 2022 posts
    • final posts

On this page

  • Challenge 1 Tasks
    • Task 1: Reading in the Data
    • Task 2: Describing the Country Group Data

Challenge 1 Solution

  • Show All Code
  • Hide All Code

  • View Source
challenge_1
shelton
faostat
Author

Dane Shelton

Published

September 16, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge 1 Tasks

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables, etc)

Task 1: Reading in the Data

Used function read_csv() to read in FAO Country Groups dataset.

Data was clean for Tibbles format already, did not need to alter any arguments other than filename.

Code
# Block output from showing

#| include: false

# Using Readr package function read_csv() to Import FAO Stat Country

# Assigning variable name "country_code" to tibble created from read_csv

 country_code <- read_csv("_data/FAOSTAT_country_groups.csv")

# Remember to add desired data into working directory before calling read_csv 

Task 2: Describing the Country Group Data

Code
# Checking to type of country_code
# str(country_code)

# r-default dataframe

# Coercing to a Tibble and viewing

# country_code <- as_tibble(country_code)

# country_code
# 1947 x 7 

# Viewed Tibble in Another Window
# Viewed Head to determine variable types

head(country_code)
# A tibble: 6 × 7
  `Country Group Code` `Country Group` Country…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
                 <dbl> <chr>               <dbl> <chr>   <chr>   <chr>   <chr>  
1                 5100 Africa                  4 Algeria 012     DZ      DZA    
2                 5100 Africa                  7 Angola  024     AO      AGO    
3                 5100 Africa                 53 Benin   204     BJ      BEN    
4                 5100 Africa                 20 Botswa… 072     BW      BWA    
5                 5100 Africa                233 Burkin… 854     BF      BFA    
6                 5100 Africa                 29 Burundi 108     BI      BDI    
# … with abbreviated variable names ¹​`Country Code`, ²​`M49 Code`, ³​`ISO2 Code`,
#   ⁴​`ISO3 Code`
Code
# which(is.na(country_code[,'M49 Code']==TRUE))

# Used to determine whether any countries in the dataset were not in UN

Description of Country Groups Data

Country Groups contains 1943 observations (rows) and 7 variables (columns).

The variables and their types are as follows: “Country Group Code” (double-precision), “Country Group” (character), “Country Code” , “Country” , “M49 Code” , “ISO2 Code” , “ISO3 Code” .

This data meticulously categorizes all the countries recognized by the United Nations, first by region (Africa, Americas, Asia…), then further geographically (South America, Central Asia, East Africa…). Each observation has standardized codes in multiple formats representing the associated nation.

Within the variable “Country Group” , which already categorized countries geographically in detail, the countries are further classified by socioeconomic indicators like “Low Income Economy” , “High Income Economy” , “Net Food Importing Country”. This data was likely collected from an organization like the United Nations as it uses common language in the manner it categorizes countries.

Other Useful Functions and Notes

which((is.na(...))): directly return index of vals missing

complete.cases(): return all complete obs

na.omit() : omits all obs with missing values

na.rm() : Logical argument option within some functions to remove NA for a calculation

summary(): more in depth, shows what variables have missing vals etc

glimpse(): less in depth

min(), max()

IQR(), range(), diff()

dim(), colnames(), starts/ends_with(), contains()

length(), str(), typeof()

Source Code
---
title: "Challenge 1 Solution"
author: "Dane Shelton"
desription: "Reading in data and creating a post"
date: "09/16/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - shelton
  - faostat
editor: 
  markdown: 
    wrap: 72
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge 1 Tasks

1)  read in a dataset, and

2)  describe the dataset using both words and any supporting information
    (e.g., tables, etc)

### Task 1: Reading in the Data

Used function `read_csv()` to read in FAO Country Groups dataset.

Data was clean for Tibbles format already, did not need to alter any
arguments other than filename.

```{r}
#| label: Reading In Country Groups Data

# Block output from showing

#| include: false

# Using Readr package function read_csv() to Import FAO Stat Country

# Assigning variable name "country_code" to tibble created from read_csv

 country_code <- read_csv("_data/FAOSTAT_country_groups.csv")

# Remember to add desired data into working directory before calling read_csv 

```

### Task 2: Describing the Country Group Data

```{r}
#| label: Summary of Country Code Data
#| include: true
# Checking to type of country_code
# str(country_code)

# r-default dataframe

# Coercing to a Tibble and viewing

# country_code <- as_tibble(country_code)

# country_code
# 1947 x 7 

# Viewed Tibble in Another Window
# Viewed Head to determine variable types

head(country_code)

# which(is.na(country_code[,'M49 Code']==TRUE))

# Used to determine whether any countries in the dataset were not in UN
```

#### Description of Country Groups Data

Country Groups contains 1943 observations (rows) and 7 variables
(columns).

The variables and their types are as follows: "Country Group Code" <dbl>
(double-precision), "Country Group" <chr> (character), "Country Code"
<dbl>, "Country" <chr>, "M49 Code" <chr>, "ISO2 Code" <chr>, "ISO3 Code"
<chr>.

This data meticulously categorizes all the countries recognized by the
United Nations, first by region (Africa, Americas, Asia...), then
further geographically (South America, Central Asia, East Africa...).
Each observation has standardized codes in multiple formats representing
the associated nation.

Within the variable "Country Group" , which already categorized
countries geographically in detail, the countries are *further*
classified by socioeconomic indicators like "Low Income Economy" , "High
Income Economy" , "Net Food Importing Country". This data was likely
collected from an organization like the United Nations as it uses common
language in the manner it categorizes countries.

##### Other Useful Functions and Notes

`which((is.na(...)))`: directly return index of vals missing

`complete.cases()`: return all complete obs

`na.omit()` : omits all obs with missing values

`na.rm()` : Logical argument option within some functions to remove NA
for a calculation

`summary()`: more in depth, shows what variables have missing vals etc

`glimpse()`: less in depth

`min()`, `max()`

`IQR()`, `range()`, `diff()`

`dim(`), `colnames()`, `starts/ends_with()`, `contains()`

`length()`, `str()`, `typeof()`