Data Analytics and Computational Social Science: Ethan Campbell HW4

Ethan Campbell

Importing data

Importing and viewing the data to determine major cleaning changes that need to be made.

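library(tidyverse)  # attaches tidyr, dplyr and ggplot2 (among others), plus the %>% pipe

# skip = 18 skips the metadata header at the top of the NASA POWER export
Fort_Worth <- read.csv("Fort_Worth_climate.csv", skip = 18)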

head(Fort_Worth)

  PARAMETER YEAR   JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG
1        PS 1981 99.17 99.00 98.53 98.64 98.18 98.29 98.49 98.45
2        PS 1982 98.81 99.11 98.50 98.52 98.27 98.28 98.51 98.53
3        PS 1983 98.91 98.51 98.06 98.08 98.23 98.26 98.63 98.59
4        PS 1984 99.40 98.61 98.42 98.03 98.38 98.36 98.49 98.47
5        PS 1985 99.24 99.08 98.57 98.47 98.26 98.42 98.49 98.42
6        PS 1986 99.22 98.56 98.70 98.41 98.27 98.40 98.58 98.55
    SEP   OCT   NOV   DEC   ANN
1 98.65 98.69 98.76 98.81 98.64
2 98.58 98.76 98.83 98.78 98.62
3 98.65 98.80 98.37 99.20 98.53
4 98.72 98.58 98.90 98.89 98.61
5 98.60 98.66 98.61 99.26 98.67
6 98.56 98.87 98.91 99.18 98.69
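The fixed skip = 18 works for this file, but it would silently misread the data if the header ever changed length. A minimal sketch of a sturdier import follows. It assumes, without confirmation from this analysis, that the NASA POWER header ends with a line containing "-END HEADER-" and that missing values are coded as -999; read_power_csv() is a hypothetical helper name.

```r
# Hypothetical helper: find the end of the metadata header instead of
# hard-coding skip = 18, then turn the -999 fill value into NA.
# Assumptions: the header ends on a line containing "-END HEADER-",
# and -999 marks missing values.
read_power_csv <- function(path) {
  lines <- readLines(path)
  header_end <- grep("-END HEADER-", lines, fixed = TRUE)
  if (length(header_end) == 0) {
    stop("No '-END HEADER-' line found in ", path)
  }
  # Skip every line up to and including the header marker
  df <- read.csv(path, skip = header_end[1])
  # Replace the -999 fill value with NA in numeric columns only
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(x) {
    x[which(x == -999)] <- NA
    x
  })
  df
}

# Usage:
# Fort_Worth <- read_power_csv("Fort_Worth_climate.csv")
```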

Month_combined <- Fort_Worth %>%
  pivot_longer(
    cols = c(NOV, JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, DEC),
    names_to = "MONTH",
    values_to = "Month_AVG"
  )

Month_combined

# A tibble: 4,800 x 5
   PARAMETER  YEAR   ANN MONTH Month_AVG
   <chr>     <int> <dbl> <chr>     <dbl>
 1 PS         1981  98.6 NOV        98.8
 2 PS         1981  98.6 JAN        99.2
 3 PS         1981  98.6 FEB        99  
 4 PS         1981  98.6 MAR        98.5
 5 PS         1981  98.6 APR        98.6
 6 PS         1981  98.6 MAY        98.2
 7 PS         1981  98.6 JUN        98.3
 8 PS         1981  98.6 JUL        98.5
 9 PS         1981  98.6 AUG        98.4
10 PS         1981  98.6 SEP        98.6
# ... with 4,790 more rows

Month_combined %>%
  select(PARAMETER, YEAR, MONTH, Month_AVG, ANN)

# A tibble: 4,800 x 5
   PARAMETER  YEAR MONTH Month_AVG   ANN
   <chr>     <int> <chr>     <dbl> <dbl>
 1 PS         1981 NOV        98.8  98.6
 2 PS         1981 JAN        99.2  98.6
 3 PS         1981 FEB        99    98.6
 4 PS         1981 MAR        98.5  98.6
 5 PS         1981 APR        98.6  98.6
 6 PS         1981 MAY        98.2  98.6
 7 PS         1981 JUN        98.3  98.6
 8 PS         1981 JUL        98.5  98.6
 9 PS         1981 AUG        98.4  98.6
10 PS         1981 SEP        98.6  98.6
# ... with 4,790 more rows

Para_split <- Month_combined %>%
  pivot_wider(names_from = PARAMETER,
              values_from = Month_AVG)
Para_split

# A tibble: 4,800 x 13
    YEAR   ANN MONTH    PS    TS   T2M  QV2M  RH2M WD50M WS10M WS50M
   <int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  1981  98.6 NOV    98.8    NA    NA    NA    NA    NA    NA    NA
 2  1981  98.6 JAN    99.2    NA    NA    NA    NA    NA    NA    NA
 3  1981  98.6 FEB    99      NA    NA    NA    NA    NA    NA    NA
 4  1981  98.6 MAR    98.5    NA    NA    NA    NA    NA    NA    NA
 5  1981  98.6 APR    98.6    NA    NA    NA    NA    NA    NA    NA
 6  1981  98.6 MAY    98.2    NA    NA    NA    NA    NA    NA    NA
 7  1981  98.6 JUN    98.3    NA    NA    NA    NA    NA    NA    NA
 8  1981  98.6 JUL    98.5    NA    NA    NA    NA    NA    NA    NA
 9  1981  98.6 AUG    98.4    NA    NA    NA    NA    NA    NA    NA
10  1981  98.6 SEP    98.6    NA    NA    NA    NA    NA    NA    NA
# ... with 4,790 more rows, and 2 more variables: PRECTOTCORR <dbl>,
#   PRECTOTCORR_SUM <dbl>

Introduction

Here we notice that the months are spread across twelve separate columns and that the PARAMETER column holds the name of every measured variable (PS, T2M, RH2M, and so on). I want to combine the months into one MONTH column and then spread the PARAMETER values into their own columns. Once this is done I can select the columns I want and remove the NA values, which sets the data up for analyzing one variable at a time.
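The NA values are a side effect of the reshape itself. Para_split still has 4,800 rows, the same as Month_combined, because ANN holds a different annual value for each parameter: pivot_wider() treats ANN as part of each row's identity, so rows for different parameters never merge and each row keeps a single non-NA parameter. The sketch below uses a small toy data frame (not the Fort Worth file) to show this, and shows that dropping ANN before widening gives one row per year and month. Keeping the annual values in a separate table would then be needed; that split is a suggestion, not part of the original workflow.

```r
library(tidyr)

# Toy data in the same long shape as Month_combined. The T2M values and the
# T2M ANN value are made up for illustration.
toy_long <- data.frame(
  PARAMETER = c("T2M", "T2M", "RH2M", "RH2M"),
  YEAR      = 1981L,
  MONTH     = c("JAN", "FEB", "JAN", "FEB"),
  Month_AVG = c(7.1, 9.8, 69.4, 67.2),
  ANN       = c(18.9, 18.9, 68.8, 68.8)
)

# Keeping ANN: its value differs by parameter, so it becomes part of each
# row's identity and rows for different parameters never merge.
with_ann <- pivot_wider(toy_long, names_from = PARAMETER, values_from = Month_AVG)
with_ann     # 4 rows; every row has an NA in either T2M or RH2M

# Dropping ANN first: YEAR + MONTH alone identify a row, so the parameters
# line up side by side.
without_ann <- pivot_wider(toy_long[, names(toy_long) != "ANN"],
                           names_from = PARAMETER, values_from = Month_AVG)
without_ann  # 2 rows, no NAs
```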

Cleaning the Temperature column

The temperature column (T2M) contains NA values and is in degrees Celsius, so we need to rename the column, remove the NA values, and convert it to degrees Fahrenheit.
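The conversion is F = C × 9/5 + 32. Written as a small R function (c_to_f() is a hypothetical name; the analysis itself applies the formula directly inside mutate()), it works element-wise on a whole column and passes NA through unchanged. Because the conversion is linear, converting the annual mean ANN gives the same result as averaging the converted monthly values, which is why ANN can be converted with the same formula.

```r
# Celsius -> Fahrenheit: F = C * 9/5 + 32, applied element-wise
c_to_f <- function(celsius) {
  celsius * 9 / 5 + 32
}

c_to_f(c(0, 100, -40, NA))
#> [1]  32 212 -40  NA
```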

# Combine the twelve month columns into a single MONTH column
Month_combined <- Fort_Worth %>%
  pivot_longer(
    cols = c(NOV, JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, DEC),
    names_to = "MONTH",
    values_to = "Month_AVG"
  )

# Viewing to make sure it is working correctly

Month_combined %>%
  select(PARAMETER, YEAR, MONTH, Month_AVG, ANN)

# A tibble: 4,800 x 5
   PARAMETER  YEAR MONTH Month_AVG   ANN
   <chr>     <int> <chr>     <dbl> <dbl>
 1 PS         1981 NOV        98.8  98.6
 2 PS         1981 JAN        99.2  98.6
 3 PS         1981 FEB        99    98.6
 4 PS         1981 MAR        98.5  98.6
 5 PS         1981 APR        98.6  98.6
 6 PS         1981 MAY        98.2  98.6
 7 PS         1981 JUN        98.3  98.6
 8 PS         1981 JUL        98.5  98.6
 9 PS         1981 AUG        98.4  98.6
10 PS         1981 SEP        98.6  98.6
# ... with 4,790 more rows

# Spread PARAMETER so that each measured variable gets its own column

Para_split <- Month_combined %>%
  pivot_wider(names_from = PARAMETER,
              values_from = Month_AVG)

# Rename the temperature column

Rename_temp <- Para_split %>%
  rename(Temperature = T2M)

# Convert the monthly and annual temperatures from Celsius to Fahrenheit

Updated_temp <- Rename_temp %>%
  mutate(Temperature_F = Temperature * 9/5 + 32) %>%
  mutate(Annual_Temperature = ANN * 9/5 + 32)

# Select the updated columns and drop rows where Temperature_F is NA

Final_Temperature <- Updated_temp %>%
  select(YEAR, MONTH, Temperature_F, Annual_Temperature) %>%
  drop_na(Temperature_F)
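A note on removing NA values: base R's na.omit() has no column argument. In a pipe such as na.omit(Temperature_F), the column name is passed into na.omit()'s ... argument and ignored, so the call drops every row that has an NA in any column. That likely gives the right answer here, since the other selected columns appear to have no NAs, but tidyr's drop_na() states the intent directly by checking only the columns you name. A toy comparison with made-up values:

```r
library(tidyr)

# Toy data: 'temp' is missing in row 2, 'other' is missing in row 3
toy <- data.frame(temp = c(70.1, NA, 72.3), other = c(1, 2, NA))

# na.omit() ignores 'temp' and drops rows with an NA in ANY column
nrow(na.omit(toy, temp))
#> [1] 1

# drop_na() only looks at the named column, so row 3 is kept
nrow(drop_na(toy, temp))
#> [1] 2
```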

# Graphs for temperature

# Change in temperature over the years (monthly temperature)

ggplot(data = Final_Temperature, mapping = aes(x = YEAR, y = Temperature_F)) +
  geom_point(mapping = aes(color = MONTH)) +
  geom_smooth()

# A zoomed in look at the change in temperature over the years

ggplot(data = Final_Temperature) +
  geom_smooth(mapping = aes(x = YEAR, y = Annual_Temperature), se = FALSE)

# Slightly zoomed out version of the above graph that shows points for each year

ggplot(data = Final_Temperature, mapping = aes(x = YEAR, y = Annual_Temperature)) +
  geom_point(mapping = aes(color = Annual_Temperature)) +
  geom_smooth()
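The smoothed curves show the direction of the trend; to put a single number on it, one option is a straight-line fit of the annual temperature against YEAR. A sketch, assuming a data frame shaped like Final_Temperature (YEAR plus Annual_Temperature in °F, repeated once per month); temperature_trend() is a hypothetical helper. It keeps one row per year first, so each annual value counts once rather than twelve times, which keeps summary(fit) from reporting misleadingly small standard errors.

```r
# Hypothetical helper: linear trend of the annual temperature.
# Assumes columns YEAR and Annual_Temperature, as in Final_Temperature.
temperature_trend <- function(df) {
  # One row per year, so each annual value is counted once
  yearly <- unique(as.data.frame(df)[, c("YEAR", "Annual_Temperature")])
  fit <- lm(Annual_Temperature ~ YEAR, data = yearly)
  slope <- unname(coef(fit)["YEAR"])
  c(per_year = slope, per_decade = 10 * slope)
}

# Usage:
# temperature_trend(Final_Temperature)
```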

# Attempted a violin graph of each month's temperatures over the years, with one facet per month

ggplot(data = Final_Temperature) +
  geom_violin(mapping = aes(x = YEAR, y = Temperature_F, color = MONTH)) +
  facet_wrap(~ MONTH, nrow = 5)
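In the violin attempt above, YEAR is continuous and the only grouping comes from MONTH, so geom_violin() should draw a single violin per month panel (the spread of that month's temperatures across all years) and cannot show change over time. One possible fix is to bin the years into decades with ggplot2's cut_width() so each panel shows one violin per decade. The ten-year bins starting at 1981 and the helper name decade_violins() are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# Hypothetical helper: one violin per decade inside each month's panel.
# cut_width() bins YEAR into 10-year groups starting at 1981
# (closed = "left" gives [1981,1991), [1991,2001), ...);
# dig.lab = 4 is passed on to cut() so the labels show full years.
decade_violins <- function(df) {
  ggplot(df, aes(x = cut_width(YEAR, width = 10, boundary = 1981,
                               closed = "left", dig.lab = 4),
                 y = Temperature_F)) +
    geom_violin() +
    facet_wrap(~ MONTH, nrow = 3) +
    labs(x = "Decade", y = "Monthly temperature (F)")
}

# Usage:
# decade_violins(Final_Temperature)
```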

Insights

I will start with temperature, which needs the columns YEAR, MONTH, ANN, and T2M. YEAR and MONTH show when each value was recorded. T2M is the temperature 2 meters above the surface, reported per month. ANN is the annual average, which is useful for comparing one year with another.

Cleaning Humidity

# Looking at table with needed columns
Para_split %>%
  select(YEAR, MONTH, RH2M, ANN) %>%
  drop_na(RH2M)

# A tibble: 480 x 4
    YEAR MONTH  RH2M   ANN
   <int> <chr> <dbl> <dbl>
 1  1981 NOV    80.2  68.8
 2  1981 JAN    69.4  68.8
 3  1981 FEB    67.2  68.8
 4  1981 MAR    67.4  68.8
 5  1981 APR    66.2  68.8
 6  1981 MAY    71.1  68.8
 7  1981 JUN    75.5  68.8
 8  1981 JUL    57.8  68.8
 9  1981 AUG    50.1  68.8
10  1981 SEP    63.5  68.8
# ... with 470 more rows

# renaming the columns
Percent_Humidty <- Para_split %>%
  rename(Humidity = RH2M) %>%
  rename(Annual_Humidity_percent = ANN)

# Place the columns in order and drop rows where Humidity is NA
Final_Humidity <- Percent_Humidty %>%
  select(YEAR, MONTH, Humidity, Annual_Humidity_percent) %>%
  drop_na(Humidity)


# Graphing to see the changing in humidity over the years
ggplot(data = Final_Humidity, mapping = aes(x = YEAR, y = Humidity)) +
  geom_point() +
  geom_smooth()

# When was humidity the highest? (January 1998)

Final_Humidity %>%
  select(YEAR, MONTH, Humidity, Annual_Humidity_percent) %>%
  arrange(desc(Humidity))

# A tibble: 480 x 4
    YEAR MONTH Humidity Annual_Humidity_percent
   <int> <chr>    <dbl>                   <dbl>
 1  1998 JAN       86.6                    66.3
 2  1991 DEC       86.5                    71.1
 3  1994 DEC       85.7                    70.5
 4  1984 DEC       85.6                    62.4
 5  2015 MAY       84.9                    71.4
 6  1992 JAN       84.8                    72.8
 7  2018 OCT       84.7                    68.6
 8  1986 DEC       83.6                    69.6
 9  1993 JAN       83.6                    69.6
10  2001 JAN       83.4                    69.5
# ... with 470 more rows

# When was humidity the lowest? (August 2000)

Final_Humidity %>%
  select(YEAR, MONTH, Humidity, Annual_Humidity_percent) %>%
  arrange(Humidity)

# A tibble: 480 x 4
    YEAR MONTH Humidity Annual_Humidity_percent
   <int> <chr>    <dbl>                   <dbl>
 1  2000 AUG       34.9                    63  
 2  1999 AUG       35.8                    61.4
 3  2011 JUL       36.2                    56.4
 4  2011 AUG       37.3                    56.4
 5  1985 AUG       37.6                    68.1
 6  2000 SEP       40.4                    63  
 7  2011 SEP       41.1                    56.4
 8  1998 JUL       41.2                    66.3
 9  1984 JUL       41.3                    62.4
10  1988 AUG       44.1                    61.3
# ... with 470 more rows
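Sorting the whole table and reading off the first row works, but dplyr's slice_max() and slice_min() return the extreme rows directly. A sketch, assuming a data frame with the same columns as Final_Humidity; humidity_extremes() is a hypothetical helper, and with_ties = FALSE keeps exactly one row if two months tie.

```r
library(dplyr)

# Hypothetical helper: the most and least humid months in one small table
humidity_extremes <- function(df) {
  bind_rows(
    highest = slice_max(df, Humidity, n = 1, with_ties = FALSE),
    lowest  = slice_min(df, Humidity, n = 1, with_ties = FALSE),
    .id = "extreme"
  )
}

# Usage:
# humidity_extremes(Final_Humidity)
```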

# Graphing for humidity

# change in humidity over the years

ggplot(data = Final_Humidity, mapping = aes(x = YEAR, y = Humidity)) +
  geom_point() +
  geom_smooth()

# Change in humidity over the years, one smoothed line per month

ggplot(data = Final_Humidity, mapping = aes(x = YEAR, y = Humidity)) +
  geom_point() +
  geom_smooth(mapping = aes(color = MONTH), se = FALSE)

# Facet wrap of the previous graph to separate the months

ggplot(data = Final_Humidity, mapping = aes(x = YEAR, y = Humidity)) +
  geom_point() +
  geom_smooth(mapping = aes(color = MONTH), se = FALSE) +
  facet_wrap(~ MONTH, nrow = 5)

Insights

Here we followed the same steps as in the temperature section. We renamed RH2M to Humidity and ANN to Annual_Humidity_percent, removed the NA values to focus on the information we needed, and then graphed the result to start drawing conclusions.

Cleaning Precipitation

# Looking at the columns that will be used
Para_split %>%
  select(YEAR, MONTH, PRECTOTCORR_SUM, ANN) %>%
  drop_na(PRECTOTCORR_SUM)

# A tibble: 480 x 4
    YEAR MONTH PRECTOTCORR_SUM   ANN
   <int> <chr>           <dbl> <dbl>
 1  1981 NOV             38.9  1082.
 2  1981 JAN              9.95 1082.
 3  1981 FEB             46.3  1082.
 4  1981 MAR             84.7  1082.
 5  1981 APR             77.1  1082.
 6  1981 MAY            154.   1082.
 7  1981 JUN            103.   1082.
 8  1981 JUL             36.2  1082.
 9  1981 AUG             53.8  1082.
10  1981 SEP             75.6  1082.
# ... with 470 more rows

# Rename the precipitation column, then convert the monthly and annual totals from mm to inches
Precipitation_Inches <- Para_split %>%
  rename(Precipitation = PRECTOTCORR_SUM) %>%
  mutate(Precipitation_annual = ANN / 25.4) %>%
  mutate(Precipitation_Monthly = Precipitation / 25.4)
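Both divisions use the exact definition 1 inch = 25.4 mm. As a quick check against the November 1981 values printed in this section, 38.9 mm is about 1.53 inches and the 1981 annual total of 1082 mm is about 42.6 inches. mm_to_in() is a hypothetical helper name.

```r
# Hypothetical helper: millimetres -> inches (1 inch = 25.4 mm exactly)
mm_to_in <- function(mm) mm / 25.4

# November 1981 monthly total and the 1981 annual total, in mm
round(mm_to_in(c(38.9, 1082)), 2)
#> [1]  1.53 42.60
```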

# Place the columns in a clean, readable order and drop rows where Precipitation_Monthly is NA
Final_Precipitation <- Precipitation_Inches %>%
  select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
  drop_na(Precipitation_Monthly)

Final_Precipitation

# A tibble: 480 x 4
    YEAR MONTH Precipitation_Monthly Precipitation_annual
   <int> <chr>                 <dbl>                <dbl>
 1  1981 NOV                   1.53                  42.6
 2  1981 JAN                   0.392                 42.6
 3  1981 FEB                   1.82                  42.6
 4  1981 MAR                   3.33                  42.6
 5  1981 APR                   3.04                  42.6
 6  1981 MAY                   6.08                  42.6
 7  1981 JUN                   4.05                  42.6
 8  1981 JUL                   1.42                  42.6
 9  1981 AUG                   2.12                  42.6
10  1981 SEP                   2.98                  42.6
# ... with 470 more rows

# When was precipitation the highest? (October 1981)

Precipitation_Inches %>%
  select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
  arrange(desc(Precipitation_Monthly)) %>%
  drop_na(Precipitation_Monthly)

# A tibble: 480 x 4
    YEAR MONTH Precipitation_Monthly Precipitation_annual
   <int> <chr>                 <dbl>                <dbl>
 1  1981 OCT                   15.6                  42.6
 2  2015 MAY                   15.2                  58.0
 3  1982 MAY                   11.3                  38.5
 4  1989 MAY                   10.6                  43.0
 5  2018 OCT                   10.4                  40.1
 6  2004 JUN                   10.2                  45.4
 7  2007 JUN                   10.2                  44.8
 8  1989 JUN                    9.64                 43.0
 9  1990 APR                    9.36                 46.7
10  1991 DEC                    8.75                 47.5
# ... with 470 more rows

# When was precipitation the lowest? (January 1986)

Final_Precipitation %>%
  select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
  arrange(Precipitation_Monthly)

# A tibble: 480 x 4
    YEAR MONTH Precipitation_Monthly Precipitation_annual
   <int> <chr>                 <dbl>                <dbl>
 1  1986 JAN                 0.00984                 39.0
 2  2011 JUL                 0.0130                  22.5
 3  2000 AUG                 0.0248                  32.1
 4  2011 MAR                 0.0787                  22.5
 5  1993 JUL                 0.0823                  36.9
 6  2012 NOV                 0.124                   28.7
 7  2018 JAN                 0.170                   40.1
 8  2014 JAN                 0.185                   23.9
 9  2005 NOV                 0.193                   18.3
10  1981 DEC                 0.206                   42.6
# ... with 470 more rows

# Select a single year (1986) so its months can be plotted on their own
Precipitation_1986 <- Precipitation_Inches %>%
  select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
  filter(YEAR == 1986) %>%
  drop_na(Precipitation_Monthly)

# Change in annual precipitation over the years
ggplot(data = Final_Precipitation) +
  geom_smooth(mapping = aes(x = YEAR, y = Precipitation_annual), se = FALSE)

# Change in monthly precipitation over the years, one smoothed line per month
ggplot(data = Final_Precipitation, mapping = aes(x = YEAR, y = Precipitation_Monthly)) +
  geom_smooth(mapping = aes(color = MONTH), se = FALSE)

# Facets by month showing the precipitation changes over the years

ggplot(data = Final_Precipitation, mapping = aes(x = YEAR, y = Precipitation_Monthly)) + 
  geom_point() +
  geom_smooth(mapping = aes(color = MONTH), se = FALSE) +
  facet_wrap(~ MONTH, nrow = 5) +
  labs(title = "Each month's change in precipitation over 40 years", x = "Year", y = "Precipitation per month in Inches")

Insights

Once more we isolated the information needed to draw conclusions by renaming and then selecting the important columns. This time, however, we also had to convert precipitation from millimeters to inches, which meant dividing both ANN and the monthly column by 25.4. We then looked for the months with the highest and lowest precipitation and found that they were within five years of each other (October 1981 and January 1986).

Toward the bottom I tried to graph a single year, 1986, when precipitation reached a low point. However, the graph did not turn out, so I need to rework it and possibly use a different type of graph to visualize this information.
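One possible way to show a single year is a bar chart with one bar per month. Because MONTH is stored as text (JAN, FEB, ...), ggplot2 orders a character axis alphabetically (APR, AUG, DEC, ...) unless MONTH is turned into a factor with calendar-order levels. The sketch below does that; order_months() and plot_one_year() are hypothetical helpers, and it assumes a data frame with the columns of Final_Precipitation.

```r
library(ggplot2)

# Hypothetical helper: MONTH text -> factor in calendar order (JAN ... DEC)
order_months <- function(month) {
  factor(month, levels = toupper(month.abb))
}

# Hypothetical helper: monthly precipitation bars for one year
plot_one_year <- function(df, year) {
  one_year <- df[df$YEAR == year, ]
  one_year$MONTH <- order_months(one_year$MONTH)
  ggplot(one_year, aes(x = MONTH, y = Precipitation_Monthly)) +
    geom_col() +
    labs(title = paste("Monthly precipitation,", year),
         x = "Month", y = "Precipitation (inches)")
}

# Usage:
# plot_one_year(Final_Precipitation, 1986)
```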

Types of data

- PARAMETER = chr (its values become the columns below after pivoting)
  - Precipitation_Monthly = dbl
  - Temperature_F = dbl
  - Humidity = dbl
- YEAR = int
- MONTH = chr
- ANN = dbl

The questions I want to answer with this data are:

Does precipitation increase or decrease over the years, and when was it highest and lowest?
Is there a change in humidity, and does it correlate with the changes in precipitation, wind, and temperature?
Does the temperature increase or decrease over the years, and does it correlate with the other variables?
When considering all variables, is there a noticeable change in the climate?

Answers to questions

Precipitation has changed over the last 40 years: it dropped and then returned to roughly its original level. The highest monthly value was in October 1981 and the lowest in January 1986.
Humidity has also changed over the last 40 years and follows a pattern very similar to precipitation. It started at roughly 67%, dipped toward the low 60s, and has since climbed to about 70%. Humidity and precipitation appear to move together, while temperature has increased since the start of the 21st century.
Temperature has increased over the past 40 years, by roughly 1.5 °F, with a sharp rise between 1990 and 2000.
This question will be addressed in the correlation analysis.
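For the planned correlation analysis, one starting point is to pair each year's annual humidity with that year's annual precipitation and compute Pearson's correlation coefficient with base R's cor(); cor.test() on the same two columns would also give a p-value. A sketch, assuming the column names of Final_Humidity and Final_Precipitation; annual_correlation() is a hypothetical helper.

```r
# Hypothetical helper: correlation between yearly humidity and precipitation
annual_correlation <- function(humidity_df, precip_df) {
  # One row per year from each table (the annual values repeat every month)
  humidity <- unique(as.data.frame(humidity_df)[, c("YEAR", "Annual_Humidity_percent")])
  precip   <- unique(as.data.frame(precip_df)[, c("YEAR", "Precipitation_annual")])
  # Keep only years present in both tables
  yearly <- merge(humidity, precip, by = "YEAR")
  cor(yearly$Annual_Humidity_percent, yearly$Precipitation_annual)
}

# Usage:
# annual_correlation(Final_Humidity, Final_Precipitation)
```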

References/Acknowledgements

NASA Langley Research Center. The POWER Project. 05 08 2021. https://power.larc.nasa.gov.

(“These data were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program.”)

NOTES TO CHANGE: give each segment that shows important information its own title and R chunk to emphasize it; learn more types of graphs to visualize the information better.
