Final Project Assignment #1: Megan Galarneau

Project & Data Description
Author

Megan Galarneau

Published

April 11, 2023

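# Load packages; tidyverse already attaches ggplot2 and dplyr,
# so the explicit calls below are redundant but harmless
library(tidyverse)
library(ggplot2)
library(dplyr)
library(lubridate)
library(patchwork)
# Show code in the rendered output, but hide warnings and messages
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)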

Part 1. Introduction

Let’s talk about the blue marble we call home. It is no mystery that the climate crisis and greenhouse gas emissions (carbon dioxide, methane, nitrous oxide and fluorinated gases) have dominated the news cycle (and, for some of us, our head space). Even as each year sets a new global warming record, the rise of non-fossil fuel energy sources (hydro, nuclear, solar, wind and biofuel power) has not kept up with the growing demand for energy (Thunberg, G., & Peters, G., 2022). In high-income countries such as the US and those in Europe, energy use has begun to flatten, and solar and wind power are sufficient to meet energy demand (Thunberg, G., & Peters, G., 2022). This has resulted in a slow fall in CO2 emissions. However, this is not the case in middle to low-income countries, where the standard of living is simply different (Thunberg, G., & Peters, G., 2022). Here, demand for energy is soaring within a young energy infrastructure, and solar and wind power are not cutting it (Thunberg, G., & Peters, G., 2022). As a result, fossil fuel use and CO2 emissions are rising in these areas. There is no one-size-fits-all policy or solution that lets every country solve the climate crisis. But we can seek to understand global energy consumption and production today, and how they have changed over time, to identify trends and the drivers of climate change at a granular level.

To investigate this topic, I will be using the data set “World Energy Consumption” (2020). It was collected, aggregated, and documented by Hannah Ritchie, Pablo Rosado, and Max Roser. Primary data sources include the BP Statistical Review of World Energy, the SHIFT Data Portal, and the EMBER Global Electricity Dashboard. Other data sources include the United Nations, the World Bank, Gapminder, and the Maddison Project Database. The complete codebook is available here. The data set is published and regularly updated by Our World In Data, an organization whose mission is to “make data and research on the world’s largest problems understandable and accessible”. They make data produced by third parties available and open access. I originally found this data set on Kaggle.com, where it was shared by collaborator Pralabh Poudel.

The “World Energy Consumption” data set describes global energy consumption and production in terms of primary energy, per capita figures, growth rates, energy mix, and electricity mix. Each row holds these variables for one country in one year. There are 242 unique country entries, with years ranging from 1900 to 2020, and about 11 primary energy sources:

  • Fossil Fuels: coal, oil, gas

  • Non-Fossil Fuels: biofuel, hydro, nuclear, solar, wind, low carbon, renewables, and other renewables

With this data set, I would like to answer the following research questions:

  • What is the global fossil fuel consumption, measured in terawatt-hours (sum of primary energy from coal, oil and gas)?

    • Investigate high income vs. middle to low-income countries
  • What is the global non-fossil fuel consumption, measured in terawatt-hours (sum of primary energy from biofuel, hydro, nuclear, solar, wind, low carbon, renewables, and other renewables)?

    • Investigate high income vs. middle to low-income countries

    • Which countries are the early adopters of non-fossil fuel energy?

  • What is the annual percentage change in primary energy consumption per country (and the corresponding change in terawatt-hours)?

  • What are the consumption-based CO2 emissions per capita of each country?

  • What is the correlation between population/GDP and energy consumption rates?

    • Investigate high income vs. middle to low-income countries
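As a minimal sketch of how the percentage-change and correlation questions above could be approached, the following uses a small made-up data frame. The values are illustrative and not taken from the data set; `toy_energy`, `toy_change`, `pct_change`, and `gdp_energy_cor` are hypothetical names introduced here, while `primary_energy_consumption` and `gdp` mirror column names in the real data.

```r
library(dplyr)

# Toy data: illustrative values only, NOT from the World Energy Consumption data set
toy_energy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  primary_energy_consumption = c(100, 110, 121, 50, 45, 54),  # TWh
  gdp = c(1e9, 1.1e9, 1.2e9, 4e8, 4.1e8, 4.5e8)
)

# Annual percentage change in consumption, computed within each country
toy_change <- toy_energy %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(pct_change = (primary_energy_consumption /
                         lag(primary_energy_consumption) - 1) * 100) %>%
  ungroup()

# Pearson correlation between GDP and consumption, ignoring incomplete rows
gdp_energy_cor <- cor(toy_energy$gdp, toy_energy$primary_energy_consumption,
                      use = "complete.obs")
```

The first year of each country has no previous value, so its `pct_change` is NA. For the real data, the same pipeline could be run separately for each income group.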

Part 2. Describe the data set(s)

The following code was used to describe this data set. In the summary section, I have chosen to omit the columns for the individual energy sources listed above because the full data set is too wide to summarize readably (17,432 rows x 122 columns). Each row represents a country and year, with the GDP, population, and corresponding energy consumption and production information. There are about 11 primary energy source types:

  • Fossil Fuels: coal, oil, gas

  • Non-Fossil Fuels: biofuel, hydro, nuclear, solar, wind, low carbon, renewables, and other renewables

The data set reports these energy sources by annual change (in percent and in terawatt-hours), electricity generation, share of electricity consumption, share of primary energy consumption, per capita primary energy consumption, and more.

Please note that the data was altered to standardize the names of countries and regions according to Our World in Data, recalculate primary energy in terawatt-hours, and calculate per capita figures (which are calculated from the population metric). Population figures are sourced from Gapminder and UN World Population Prospects (UNWPP).
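To make the per capita calculation concrete, here is a small worked example. It assumes the per capita columns are expressed in kilowatt-hours per person, which is my reading of the magnitudes in the summary rather than something stated here. The input values are invented, and `primary_energy_twh`, `population`, and `energy_per_capita_kwh` are hypothetical names.

```r
# Illustrative values only, not taken from the data set
primary_energy_twh <- 250   # hypothetical primary energy consumption, in TWh
population <- 10e6          # hypothetical population

# 1 TWh = 1e9 kWh, so per capita consumption in kWh per person
# (assumed unit) is:
energy_per_capita_kwh <- primary_energy_twh * 1e9 / population
energy_per_capita_kwh
```

Because the per capita figures depend on the population metric, rows with a missing population will also have missing per capita values.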

Read the dataset

#read in the raw data set
library(readr)
raw_world_energy <- read_csv("MeganGalarneau_FinalProjectData/World_Energy_Consumption.csv")

Descriptive information of the dataset

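# Preview the first rows
head(raw_world_energy)
# Dimensions: rows x columns
dim(raw_world_energy)
[1] 17432   122
# Count unique country entries; length(unique(df)) on the whole data frame
# would only return the number of columns (122)
length(unique(raw_world_energy$country))
[1] 242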

Summary statistics of the datasets (min, max, mean, median, etc.)

# Summary of data set statistics, omitting the per-source columns
summary_world_energy <- raw_world_energy %>%
  select(-contains(c('gas', 'coal', 'oil', 'hydro', 'biofuel', 'nuclear',
                     'low_carbon', 'solar', 'wind', 'other_renewables')))
print(summarytools::dfSummary(summary_world_energy,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

summary_world_energy

Dimensions: 17432 x 31
Duplicates: 0

Character variables (first 10 values shown; graph column omitted):

| Variable | Values | Freq per value (% of valid) | Remaining values | Missing |
|---|---|---|---|---|
| iso_code | ARG, AUS, AUT, BEL, BGD, BGR, BOL, BRA, CAN, CHL | 121 (0.8%) each | 206 others: 14420 (92.3%) | 1802 (10.3%) |
| country | Argentina, Australia, Austria, Bangladesh, Belgium, Bolivia, Brazil, Bulgaria, Canada, Chile | 121 (0.7%) each | 232 others: 16222 (93.1%) | 0 (0.0%) |

Numeric variables (graph column omitted):

| Variable | Mean (sd) | min ≤ med ≤ max | IQR (CV) | Distinct values | Missing |
|---|---|---|---|---|---|
| year | 1973.1 (34.3) | 1900 ≤ 1983 ≤ 2020 | 56 (0) | 121 | 0 (0.0%) |
| energy_cons_change_pct | Inf (NaN) | -92.6 ≤ 2.6 ≤ Inf | 7.3 (NaN) | 7753 | 7590 (43.5%) |
| energy_cons_change_twh | 36.9 (267.4) | -6083.4 ≤ 0.7 ≤ 6446.8 | 10.4 (7.3) | 7128 | 7540 (43.3%) |
| carbon_intensity_elec | 394.1 (261.6) | 11 ≤ 353 ≤ 1116 | 325 (0.7) | 416 | 16844 (96.6%) |
| electricity_generation | 303 (1565.4) | 0 ≤ 17.3 ≤ 25899.8 | 69.9 (5.2) | 5303 | 11313 (64.9%) |
| fossil_electricity | 238 (1128.9) | 0 ≤ 5.6 ≤ 16233.1 | 70.1 (4.7) | 3901 | 12333 (70.7%) |
| other_renewable_electricity | 5.2 (31) | 0 ≤ 0 ≤ 702.9 | 0.7 (6) | 1883 | 11348 (65.1%) |
| renewables_electricity | 64.3 (340.5) | 0 ≤ 2.8 ≤ 7492.5 | 17.3 (5.3) | 4023 | 11348 (65.1%) |
| energy_per_gdp | 1.8 (1.6) | 0 ≤ 1.4 ≤ 13.5 | 1.5 (0.9) | 3276 | 10532 (60.4%) |
| energy_per_capita | 29602.5 (75226.3) | 0 ≤ 13776.7 ≤ 1676610 | 33611 (2.5) | 8864 | 8397 (48.2%) |
| fossil_cons_change_pct | 3.4 (12) | -52.6 ≤ 2.6 ≤ 492.8 | 7.3 (3.5) | 3814 | 13231 (75.9%) |
| fossil_share_energy | 87.1 (14.9) | 17.2 ≤ 92.2 ≤ 100 | 16.2 (0.2) | 3551 | 13148 (75.4%) |
| fossil_cons_change_twh | 48.3 (309.8) | -2147.3 ≤ 5.4 ≤ 5478 | 28.9 (6.4) | 4065 | 13231 (75.9%) |
| fossil_fuel_consumption | 2737.6 (11071.9) | 0.9 ≤ 296.3 ≤ 136761.6 | 946.7 (4) | 4273 | 13148 (75.4%) |
| fossil_energy_per_capita | 32913.4 (34982.9) | 124.1 ≤ 25509.7 ≤ 317582.5 | 29321 (1.1) | 4284 | 13148 (75.4%) |
| fossil_cons_per_capita | 2419.3 (3111) | 0 ≤ 1323.3 ≤ 20640.9 | 3393 (1.3) | 4560 | 12673 (72.7%) |
| fossil_share_elec | 75.7 (68.3) | 0 ≤ 77.2 ≤ 806.1 | 54.6 (0.9) | 4077 | 12376 (71.0%) |
| other_renewable_consumption | 16.4 (83.2) | 0 ≤ 0.1 ≤ 1614 | 4.1 (5.1) | 1951 | 13142 (75.4%) |
| per_capita_electricity | 4016.8 (5078.8) | 0 ≤ 2521.3 ≤ 57661.4 | 4949.6 (1.3) | 5377 | 11933 (68.5%) |
| population | 62862803 (379389950) | 2000 ≤ 6523861 ≤ 7713467904 | 18307181 (6) | 12730 | 1756 (10.1%) |
| primary_energy_consumption | 1672.2 (8827.1) | 0 ≤ 67.3 ≤ 162194.3 | 381.3 (5.3) | 9168 | 7298 (41.9%) |
| renewables_elec_per_capita | 1173.3 (3932.9) | 0 ≤ 195.4 ≤ 57655.8 | 817.7 (3.4) | 4698 | 11933 (68.5%) |
| renewables_share_elec | 30.1 (32) | 0 ≤ 16.9 ≤ 100 | 52.3 (1.1) | 4849 | 11391 (65.3%) |
| renewables_cons_change_pct | 11.9 (148) | -92.6 ≤ 4 ≤ 8345.7 | 16.8 (12.4) | 3616 | 13604 (78.0%) |
| renewables_share_energy | 9.8 (13.2) | 0 ≤ 5.1 ≤ 82.8 | 12.2 (1.3) | 3422 | 13148 (75.4%) |
| renewables_cons_change_twh | 8.7 (54.6) | -260.1 ≤ 0.2 ≤ 977.7 | 3.7 (6.3) | 3042 | 13225 (75.9%) |
| renewables_consumption | 239.3 (1042.1) | 0 ≤ 17.8 ≤ 18504.2 | 85.1 (4.4) | 3582 | 13142 (75.4%) |
| renewables_energy_per_capita | 4155.7 (12266.6) | 0 ≤ 727.6 ≤ 146224.5 | 2524.3 (3) | 3910 | 13142 (75.4%) |
| gdp | 541783311486 (4.083842e+12) | 196308000 ≤ 42816487424 ≤ 1.07e+14 | 1.62462e+11 (7.5) | 7943 | 6976 (40.0%) |

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-04-11
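The summary reports a mean of Inf for energy_cons_change_pct, which means at least one infinite value sits in that column. One possible way to handle this, sketched here on a toy column with invented values, is to convert non-finite values to NA before summarising. `toy_pct`, `toy_clean`, `mean_before`, and `mean_after` are hypothetical names, and whether dropping these rows is appropriate for the real analysis is a judgment call.

```r
library(dplyr)

# Toy column standing in for energy_cons_change_pct (illustrative values only)
toy_pct <- data.frame(energy_cons_change_pct = c(2.5, -1.0, Inf, NA, 4.5))

# Replace non-finite values (Inf, -Inf, NaN) with NA so that they
# drop out of summaries computed with na.rm = TRUE
toy_clean <- toy_pct %>%
  mutate(energy_cons_change_pct = if_else(is.finite(energy_cons_change_pct),
                                          energy_cons_change_pct,
                                          NA_real_))

# The single Inf dominates the mean until it is removed
mean_before <- mean(toy_pct$energy_cons_change_pct, na.rm = TRUE)
mean_after  <- mean(toy_clean$energy_cons_change_pct, na.rm = TRUE)
```

An infinite percentage change typically arises when the previous year's value is zero, so it may be worth inspecting those rows rather than only discarding them.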

Part 3. The Tentative Plan for Visualization

For my analysis, I want to focus on fossil vs. non-fossil fuel energy consumption in high-income and middle to low-income countries over time. I would like to see whether these trends track the carbon intensity of electricity production (e.g. whether higher non-fossil fuel consumption corresponds to lower carbon intensity). I also want to give a big picture of total primary energy consumption per country and provide insights on how GDP correlates with energy consumption rates per country. Most of these visualizations will be ggplot line graphs with facet wraps, because these research questions look for trends and correlations. It would also be nice to have map plots to easily visualize the amount of energy consumption per country. I hope to determine a standardized scale for energy consumption in terawatt-hours.
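The faceted line graphs described above could look roughly like the following sketch. It uses a made-up long-format data frame with one panel per country; the data values and the names `toy_plot_data`, `source`, `consumption_twh`, and `energy_plot` are hypothetical and only illustrate the plot structure.

```r
library(ggplot2)

# Toy data: illustrative values only, not from the data set
toy_plot_data <- data.frame(
  country = rep(c("A", "B"), each = 6),
  year = rep(rep(2000:2002, times = 2), times = 2),
  source = rep(rep(c("fossil", "non_fossil"), each = 3), times = 2),
  consumption_twh = c(100, 105, 110, 10, 15, 22,
                      40, 48, 57, 2, 3, 5)
)

# One line per energy source, one panel per country
energy_plot <- ggplot(toy_plot_data,
                      aes(x = year, y = consumption_twh, colour = source)) +
  geom_line() +
  facet_wrap(~ country, scales = "free_y") +
  labs(x = "Year", y = "Consumption (TWh)", colour = "Energy source")
```

Using `scales = "free_y"` lets small and large consumers share a figure, at the cost of a common scale; dropping it would keep the terawatt-hour axis standardized across panels.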

While I have a plan for my visualizations, I need to tidy my data to achieve these goals. This will include summing multiple columns to produce total and average values for my non-fossil fuel graph. I will also need to select my high-income and middle to low-income countries for analysis and create new versions of the data set to work from. I will definitely be dealing with missing data (NAs) and outliers; for the NA values, I will use the na.rm = TRUE argument when summarizing and graphing my data.
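The tidying steps described above can be sketched as follows, under several assumptions: the per-source column names (`hydro_consumption`, `solar_consumption`, `wind_consumption`) are assumed rather than shown in the summary, the values are invented, and the `high_income` vector is a hypothetical placeholder for a documented income classification.

```r
library(dplyr)

# Toy data with assumed column names (illustrative values only)
toy_sources <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(2018, 2019, 2018, 2019),
  hydro_consumption = c(10, 12, 3, NA),
  solar_consumption = c(1, 2, NA, NA),
  wind_consumption = c(4, 5, 1, 2)
)

# Hypothetical income grouping; the real project would need a documented source
high_income <- c("A")

toy_tidy <- toy_sources %>%
  mutate(
    # rowSums() with na.rm = TRUE treats a missing source as 0 for that row
    non_fossil_twh = rowSums(across(ends_with("_consumption")), na.rm = TRUE),
    income_group = if_else(country %in% high_income, "high", "middle_low")
  )

# Average non-fossil consumption per income group
group_means <- toy_tidy %>%
  group_by(income_group) %>%
  summarise(mean_non_fossil_twh = mean(non_fossil_twh), .groups = "drop")
```

One caution when choosing which columns to sum: aggregate columns such as renewables and low carbon already include individual sources like hydro, solar, and wind, so adding them to the per-source columns would likely double count.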

Bibliography

Ritchie, H., Roser, M., & Rosado, P. (2022). “Energy”. Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org/energy [Online Resource]

Thunberg, G., & Peters, G. (2022). ‘We are not moving in the right direction’. In The Climate Book. Penguin Press.