Fort Worth climate change based on precipitation, humidity, and temperature from 1981-2020
Importing and viewing the data to determine major cleaning changes that need to be made.
  PARAMETER YEAR   JAN   FEB   MAR   APR   MAY   JUN   JUL   AUG
1        PS 1981 99.17 99.00 98.53 98.64 98.18 98.29 98.49 98.45
2        PS 1982 98.81 99.11 98.50 98.52 98.27 98.28 98.51 98.53
3        PS 1983 98.91 98.51 98.06 98.08 98.23 98.26 98.63 98.59
4        PS 1984 99.40 98.61 98.42 98.03 98.38 98.36 98.49 98.47
5        PS 1985 99.24 99.08 98.57 98.47 98.26 98.42 98.49 98.42
6        PS 1986 99.22 98.56 98.70 98.41 98.27 98.40 98.58 98.55
    SEP   OCT   NOV   DEC   ANN
1 98.65 98.69 98.76 98.81 98.64
2 98.58 98.76 98.83 98.78 98.62
3 98.65 98.80 98.37 99.20 98.53
4 98.72 98.58 98.90 98.89 98.61
5 98.60 98.66 98.61 99.26 98.67
6 98.56 98.87 98.91 99.18 98.69
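The import step itself is not shown in this post. The following is a minimal sketch of one way it could look, assuming the data arrived as a NASA POWER monthly CSV export whose metadata block ends with a `-END HEADER-` line. The header layout, the temporary file, and the name `power_file` are assumptions for illustration; the two data rows are copied from the PS rows printed above.

```r
# Hypothetical stand-in for the downloaded file: a short metadata block
# followed by the column header and two data rows.
power_file <- tempfile(fileext = ".csv")
writeLines(c(
  "-BEGIN HEADER-",
  "Illustrative metadata line",
  "-END HEADER-",
  "PARAMETER,YEAR,JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC,ANN",
  "PS,1981,99.17,99.00,98.53,98.64,98.18,98.29,98.49,98.45,98.65,98.69,98.76,98.81,98.64",
  "PS,1982,98.81,99.11,98.50,98.52,98.27,98.28,98.51,98.53,98.58,98.76,98.83,98.78,98.62"
), power_file)

# Locate the end of the metadata block instead of hard-coding a skip count.
raw_lines <- readLines(power_file)
header_end <- grep("-END HEADER-", raw_lines, fixed = TRUE)

# Read everything after the metadata block; the first remaining line holds
# the column names.
Fort_Worth <- read.csv(power_file, skip = header_end)
head(Fort_Worth)
```

Searching for the marker line keeps the sketch working if the metadata block has a different length in a real download.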
Month_combined <- Fort_Worth %>%
pivot_longer(
cols = c(NOV, JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, DEC),
names_to = "MONTH",
values_to = "Month_AVG",
)
Month_combined
# A tibble: 4,800 x 5
PARAMETER YEAR ANN MONTH Month_AVG
<chr> <int> <dbl> <chr> <dbl>
1 PS 1981 98.6 NOV 98.8
2 PS 1981 98.6 JAN 99.2
3 PS 1981 98.6 FEB 99
4 PS 1981 98.6 MAR 98.5
5 PS 1981 98.6 APR 98.6
6 PS 1981 98.6 MAY 98.2
7 PS 1981 98.6 JUN 98.3
8 PS 1981 98.6 JUL 98.5
9 PS 1981 98.6 AUG 98.4
10 PS 1981 98.6 SEP 98.6
# ... with 4,790 more rows
Para_split <- Month_combined %>%
pivot_wider(names_from = PARAMETER,
values_from = Month_AVG,
)
Para_split
# A tibble: 4,800 x 13
YEAR ANN MONTH PS TS T2M QV2M RH2M WD50M WS10M WS50M
<int> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1981 98.6 NOV 98.8 NA NA NA NA NA NA NA
2 1981 98.6 JAN 99.2 NA NA NA NA NA NA NA
3 1981 98.6 FEB 99 NA NA NA NA NA NA NA
4 1981 98.6 MAR 98.5 NA NA NA NA NA NA NA
5 1981 98.6 APR 98.6 NA NA NA NA NA NA NA
6 1981 98.6 MAY 98.2 NA NA NA NA NA NA NA
7 1981 98.6 JUN 98.3 NA NA NA NA NA NA NA
8 1981 98.6 JUL 98.5 NA NA NA NA NA NA NA
9 1981 98.6 AUG 98.4 NA NA NA NA NA NA NA
10 1981 98.6 SEP 98.6 NA NA NA NA NA NA NA
# ... with 4,790 more rows, and 2 more variables: PRECTOTCORR <dbl>,
# PRECTOTCORR_SUM <dbl>
Here we notice that each month has its own column and that the PARAMETER column holds every variable name. I want to combine the month columns into a single column and then spread the PARAMETER values into columns of their own. Once that is done I can select the columns I need and remove the NA values, which lets me run an analysis on one variable at a time.
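To see why the reshaped table fills up with NA values, a small sketch on made-up numbers may help. It assumes the tidyr and dplyr packages; the `toy` objects and their values are hypothetical and only mirror the layout of the real data.

```r
library(tidyr)
library(dplyr)

# Toy data in the same wide layout: one row per PARAMETER and YEAR.
# The numbers are made up for illustration.
toy <- tibble(
  PARAMETER = c("T2M", "RH2M"),
  YEAR      = c(1981L, 1981L),
  JAN       = c(5, 70),
  FEB       = c(8, 65),
  ANN       = c(6.5, 67.5)
)

# Step 1: stack the month columns into MONTH / Month_AVG.
toy_long <- toy %>%
  pivot_longer(cols = c(JAN, FEB), names_to = "MONTH", values_to = "Month_AVG")

# Step 2: spread PARAMETER into columns. ANN stays an identifier column and
# differs between parameters, so each row holds a value for one parameter
# only and NA for the other.
toy_wide <- toy_long %>%
  pivot_wider(names_from = PARAMETER, values_from = Month_AVG)

toy_wide

# Step 3: keep one parameter and drop the rows that belong to the other.
toy_temp <- toy_wide %>%
  select(YEAR, MONTH, T2M, ANN) %>%
  drop_na(T2M)

toy_temp
```

Because ANN travels along as an identifier, the row count does not shrink after `pivot_wider()`. This matches the 4,800 rows and the NA pattern in the Para_split output, and it is why each later section selects one parameter and then removes the NA rows.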
The temperature column has NA values and is in degrees Celsius, so we need to rename the column, remove the NA values, and convert it to degrees Fahrenheit.
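The conversion is the standard formula F = C * 9/5 + 32. A minimal sketch follows, with the helper name `celsius_to_fahrenheit` being hypothetical and used only for this check:

```r
# Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32.
celsius_to_fahrenheit <- function(temp_c) {
  temp_c * 9 / 5 + 32
}

# Known reference points as a sanity check; NA passes through as NA.
celsius_to_fahrenheit(c(0, 100, -40, NA))
# 32 212 -40 NA
```

Since NA stays NA under arithmetic, the missing values can be removed either before or after the conversion with the same result.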
# combining the columns of months to create the month column
Month_combined <- Fort_Worth %>%
pivot_longer(
cols = c(NOV, JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, DEC),
names_to = "MONTH",
values_to = "Month_AVG",
)
# Viewing to make sure it is working correctly
Month_combined %>%
select("PARAMETER", "YEAR", "MONTH", "Month_AVG", "ANN")
# A tibble: 4,800 x 5
PARAMETER YEAR MONTH Month_AVG ANN
<chr> <int> <chr> <dbl> <dbl>
1 PS 1981 NOV 98.8 98.6
2 PS 1981 JAN 99.2 98.6
3 PS 1981 FEB 99 98.6
4 PS 1981 MAR 98.5 98.6
5 PS 1981 APR 98.6 98.6
6 PS 1981 MAY 98.2 98.6
7 PS 1981 JUN 98.3 98.6
8 PS 1981 JUL 98.5 98.6
9 PS 1981 AUG 98.4 98.6
10 PS 1981 SEP 98.6 98.6
# ... with 4,790 more rows
# Splitting the PARAMETER column so that each variable gets its own column
Para_split <- Month_combined %>%
  pivot_wider(names_from = PARAMETER,
              values_from = Month_AVG)
# Renaming the temperature column
Rename_temp <- Para_split %>%
  rename(Temperature = T2M)
# Converting the monthly and annual temperatures from Celsius to Fahrenheit
Updated_temp <- Rename_temp %>%
  mutate(Temperature_F = Temperature * 9/5 + 32) %>%
  mutate(Annual_Temperature = ANN * 9/5 + 32)
# Selecting the updated columns and then omitting the NA values
Final_Temperature <- Updated_temp %>%
  select(YEAR, MONTH, Temperature_F, Annual_Temperature) %>%
  na.omit(Temperature_F)
# Graphs for temperature
# Change in monthly temperature over the years
ggplot(data = Final_Temperature, mapping = aes(x = YEAR, y = Temperature_F)) +
  geom_point(mapping = aes(color = MONTH)) +
  geom_smooth()
# A zoomed-in look at the change in annual temperature over the years
ggplot(data = Final_Temperature) +
  geom_smooth(mapping = aes(x = YEAR, y = Annual_Temperature), se = FALSE)
# Slightly zoomed-out version of the graph above that shows a point for each year
ggplot(data = Final_Temperature, mapping = aes(x = YEAR, y = Annual_Temperature)) +
  geom_point(mapping = aes(color = Annual_Temperature)) +
  geom_smooth()
# Attempted violin plot of each month's temperatures across the years, separated by month with a facet
ggplot(data = Final_Temperature) +
  geom_violin(mapping = aes(x = YEAR, y = Temperature_F, color = MONTH)) +
  facet_wrap(~ MONTH, nrow = 5)
I will start with temperature, for which I need the columns YEAR, MONTH, ANN, and T2M. YEAR and MONTH show when each observation occurred. T2M is the temperature at 2 meters above the surface, reported per month. ANN is the average for the year, which is useful for comparing one year with another.
# Looking at table with needed columns
Para_split %>%
select(YEAR, MONTH, RH2M, ANN) %>%
na.omit(RH2M)
# A tibble: 480 x 4
YEAR MONTH RH2M ANN
<int> <chr> <dbl> <dbl>
1 1981 NOV 80.2 68.8
2 1981 JAN 69.4 68.8
3 1981 FEB 67.2 68.8
4 1981 MAR 67.4 68.8
5 1981 APR 66.2 68.8
6 1981 MAY 71.1 68.8
7 1981 JUN 75.5 68.8
8 1981 JUL 57.8 68.8
9 1981 AUG 50.1 68.8
10 1981 SEP 63.5 68.8
# ... with 470 more rows
# renaming the columns
Percent_Humidty <- Para_split %>%
rename(Humidity = RH2M) %>%
rename(Annual_Humidity_percent = ANN)
# Placing in the right order and omitting the NA values
Final_Humidity <- Percent_Humidty %>%
select(YEAR, MONTH, Humidity, Annual_Humidity_percent) %>%
na.omit(Humidity)
# Graphing to see the change in humidity over the years
ggplot(data = Final_Humidity, mapping = aes(x = YEAR, y = Humidity)) +
geom_point() +
geom_smooth()
# When was humidity the highest (Jan of 1998)
Final_Humidity %>%
select(YEAR, MONTH, Humidity, Annual_Humidity_percent) %>%
arrange(desc(Humidity))
# A tibble: 480 x 4
YEAR MONTH Humidity Annual_Humidity_percent
<int> <chr> <dbl> <dbl>
1 1998 JAN 86.6 66.3
2 1991 DEC 86.5 71.1
3 1994 DEC 85.7 70.5
4 1984 DEC 85.6 62.4
5 2015 MAY 84.9 71.4
6 1992 JAN 84.8 72.8
7 2018 OCT 84.7 68.6
8 1986 DEC 83.6 69.6
9 1993 JAN 83.6 69.6
10 2001 JAN 83.4 69.5
# ... with 470 more rows
# When was humidity the lowest (Aug of 2000)
Final_Humidity %>%
select(YEAR, MONTH, Humidity, Annual_Humidity_percent) %>%
arrange(Humidity) %>%
na.omit(Humidity)
# A tibble: 480 x 4
YEAR MONTH Humidity Annual_Humidity_percent
<int> <chr> <dbl> <dbl>
1 2000 AUG 34.9 63
2 1999 AUG 35.8 61.4
3 2011 JUL 36.2 56.4
4 2011 AUG 37.3 56.4
5 1985 AUG 37.6 68.1
6 2000 SEP 40.4 63
7 2011 SEP 41.1 56.4
8 1998 JUL 41.2 66.3
9 1984 JUL 41.3 62.4
10 1988 AUG 44.1 61.3
# ... with 470 more rows
# Graphing for humidity
# change in humidity over the years
ggplot(data = Percent_Humidty, mapping = aes(x = YEAR, y = Humidity)) +
geom_point() +
geom_smooth()
# change in humidity over years by each month
ggplot(data = Percent_Humidty, mapping = aes(x = YEAR, y = Humidity)) +
geom_point() +
geom_smooth(mapping = aes(color = MONTH), se = FALSE)
# Facet wrap of the previous graph to separate them
ggplot(data = Percent_Humidty, mapping = aes(x = YEAR, y = Humidity)) +
geom_point() +
geom_smooth(mapping = aes(color = MONTH), se = FALSE) +
facet_wrap(~ MONTH, nrow = 5)
Here we do the same thing we did in the temperature section. We renamed RH2M to Humidity and ANN to Annual_Humidity_percent. Next we removed the NA values to focus on the information we needed and then graphed it to start drawing conclusions.
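One detail about the NA removal is worth a short sketch. In the pipelines here a column name is passed to `na.omit()`, but base R's `na.omit()` ignores that extra argument and drops every row with an NA in any column. The result is still correct in this post because `select()` runs first. The toy data below is made up, and `drop_na()` from tidyr is shown only as a possible alternative that checks a single named column.

```r
library(tidyr)

# Made-up rows: one missing humidity value and one missing value in an
# unrelated column.
toy <- data.frame(
  YEAR     = c(1981L, 1982L, 1983L),
  Humidity = c(69.4, NA, 71.1),
  Other    = c(1, 2, NA)
)

# na.omit() looks at every column; the extra argument is ignored.
all_columns <- na.omit(toy, Humidity)

# drop_na() with a column name only checks that column.
one_column <- drop_na(toy, Humidity)

nrow(all_columns)  # 1
nrow(one_column)   # 2
```

If the `select()` step were ever moved after the NA removal, the two functions would give different row counts, so it can be safer to name the column explicitly.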
# Looking at the columns that will be used
Para_split %>%
select(YEAR, MONTH, PRECTOTCORR_SUM, ANN) %>%
na.omit(PRECTOTCORR_SUM)
# A tibble: 480 x 4
YEAR MONTH PRECTOTCORR_SUM ANN
<int> <chr> <dbl> <dbl>
1 1981 NOV 38.9 1082.
2 1981 JAN 9.95 1082.
3 1981 FEB 46.3 1082.
4 1981 MAR 84.7 1082.
5 1981 APR 77.1 1082.
6 1981 MAY 154. 1082.
7 1981 JUN 103. 1082.
8 1981 JUL 36.2 1082.
9 1981 AUG 53.8 1082.
10 1981 SEP 75.6 1082.
# ... with 470 more rows
# changing the precipitation from mm to inches and then renaming.
Precipitation_Inches <- Para_split %>%
rename(Precipitation = PRECTOTCORR_SUM) %>%
mutate(Precipitation_annual = ANN / 25.4) %>%
mutate(Precipitation_Monthly = Precipitation / 25.4)
# Placing in a clean readable order and omitting the NA values.
Final_Precipitation <- Precipitation_Inches %>%
select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
na.omit(Precipitation_Monthly)
Final_Precipitation
# A tibble: 480 x 4
YEAR MONTH Precipitation_Monthly Precipitation_annual
<int> <chr> <dbl> <dbl>
1 1981 NOV 1.53 42.6
2 1981 JAN 0.392 42.6
3 1981 FEB 1.82 42.6
4 1981 MAR 3.33 42.6
5 1981 APR 3.04 42.6
6 1981 MAY 6.08 42.6
7 1981 JUN 4.05 42.6
8 1981 JUL 1.42 42.6
9 1981 AUG 2.12 42.6
10 1981 SEP 2.98 42.6
# ... with 470 more rows
# When was precipitation the highest (OCT of 1981)
Precipitation_Inches %>%
select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
arrange(desc(Precipitation_Monthly)) %>%
na.omit(Precipitation_Monthly)
# A tibble: 480 x 4
YEAR MONTH Precipitation_Monthly Precipitation_annual
<int> <chr> <dbl> <dbl>
1 1981 OCT 15.6 42.6
2 2015 MAY 15.2 58.0
3 1982 MAY 11.3 38.5
4 1989 MAY 10.6 43.0
5 2018 OCT 10.4 40.1
6 2004 JUN 10.2 45.4
7 2007 JUN 10.2 44.8
8 1989 JUN 9.64 43.0
9 1990 APR 9.36 46.7
10 1991 DEC 8.75 47.5
# ... with 470 more rows
# When was it the lowest (Jan of 1986)
Final_Precipitation %>%
select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
arrange(Precipitation_Monthly) %>%
na.omit(Precipitation_Monthly)
# A tibble: 480 x 4
YEAR MONTH Precipitation_Monthly Precipitation_annual
<int> <chr> <dbl> <dbl>
1 1986 JAN 0.00984 39.0
2 2011 JUL 0.0130 22.5
3 2000 AUG 0.0248 32.1
4 2011 MAR 0.0787 22.5
5 1993 JUL 0.0823 36.9
6 2012 NOV 0.124 28.7
7 2018 JAN 0.170 40.1
8 2014 JAN 0.185 23.9
9 2005 NOV 0.193 18.3
10 1981 DEC 0.206 42.6
# ... with 470 more rows
# Selecting a single year (1986) so that I can plot it by month
Precipitation_1986 <- Precipitation_Inches %>%
  select(YEAR, MONTH, Precipitation_Monthly, Precipitation_annual) %>%
  filter(YEAR == 1986) %>%
  na.omit(Precipitation_Monthly)
# Graph of the change in annual precipitation over the years
ggplot(data = Final_Precipitation) +
  geom_smooth(mapping = aes(x = YEAR, y = Precipitation_annual), se = FALSE)
# Change in monthly precipitation over the years, one line per month
ggplot(data = Final_Precipitation, mapping = aes(x = YEAR, y = Precipitation_Monthly)) +
  geom_smooth(mapping = aes(color = MONTH), se = FALSE)
# Facet wrap by month of the precipitation changes over the years
ggplot(data = Final_Precipitation, mapping = aes(x = YEAR, y = Precipitation_Monthly)) +
  geom_point() +
  geom_smooth(mapping = aes(color = MONTH), se = FALSE) +
  facet_wrap(~ MONTH, nrow = 5) +
  labs(title = "Each month's change in precipitation over 40 years", x = "Year", y = "Precipitation per month in Inches")
Once more we followed a similar process of renaming and then selecting the columns that matter in order to draw conclusions. This time, however, we also needed to convert precipitation from millimeters to inches, which meant dividing both the ANN column and the monthly column by 25.4. In this analysis we looked for the months with the highest and lowest precipitation and learned that they fell within 5 years of each other (October 1981 and January 1986).
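As a quick check of the unit conversion, the sketch below uses base R on the ten 1981 monthly totals printed in the PRECTOTCORR_SUM output. The names `mm_per_inch` and `precip_1981` are hypothetical, and the printed values are already rounded, so the results may differ slightly in the last digit from the tables above.

```r
mm_per_inch <- 25.4

# Monthly totals in millimeters for 1981, copied from the PRECTOTCORR_SUM
# output above (only the ten months that are printed there).
precip_1981 <- data.frame(
  MONTH = c("NOV", "JAN", "FEB", "MAR", "APR", "MAY", "JUN", "JUL", "AUG", "SEP"),
  mm    = c(38.9, 9.95, 46.3, 84.7, 77.1, 154, 103, 36.2, 53.8, 75.6)
)
precip_1981$inches <- precip_1981$mm / mm_per_inch

# Wettest and driest of the listed months.
precip_1981[which.max(precip_1981$inches), ]
precip_1981[which.min(precip_1981$inches), ]

# The annual total converts the same way: 1082 mm is about 42.6 in.
round(1082 / mm_per_inch, 1)
```

`which.max()` and `which.min()` return a single row each, which is an alternative to sorting the whole table with `arrange()` when only the extreme value is needed.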
Towards the bottom I tried to graph a single year, 1986, when precipitation reached its low point. However, the graph did not turn out, so I need to rework it and possibly use a different type of graph to visualize this information.
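One possible rework for a single-year view is a bar chart with the months in calendar order. This is a sketch under assumptions, not necessarily the graph originally intended. It uses 1981 only because all twelve monthly values in inches are printed in the outputs above, and `one_year` and `single_year_plot` are hypothetical names. For 1986, the real data would be filtered with `YEAR == 1986` instead.

```r
library(ggplot2)

# Monthly precipitation in inches for 1981, copied from the outputs above.
one_year <- data.frame(
  MONTH = c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"),
  Precipitation_Monthly = c(0.392, 1.82, 3.33, 3.04, 6.08, 4.05,
                            1.42, 2.12, 2.98, 15.6, 1.53, 0.206)
)

# MONTH is a character column, so ggplot2 would sort it alphabetically.
# A factor with explicit levels keeps the calendar order on the axis.
one_year$MONTH <- factor(one_year$MONTH, levels = toupper(month.abb))

single_year_plot <- ggplot(one_year, aes(x = MONTH, y = Precipitation_Monthly)) +
  geom_col() +
  labs(title = "Monthly precipitation, 1981",
       x = "Month", y = "Precipitation (inches)")

single_year_plot
```

The same factor conversion would also put the legends and facets of the multi-year graphs in calendar order rather than alphabetical order.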
Precipitation has changed over the last 40 years; however, it dropped and then returned to roughly its original amount. The highest monthly value was in October 1981 and the lowest was in January 1986.
Humidity has also changed over the last 40 years, and its graph looks very similar to the one for precipitation. It started at roughly 67%, dipped into the lower 60s, and has now climbed to about 70%. There appears to be a correlation between precipitation and humidity; temperature, however, has increased since the turn of the 21st century.
Temperature has increased over the past 40 years. We see an increase of roughly 1.5 degrees and a major spike in temperature between 1990 and 2000.
This one will be for the correlation analysis.
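A minimal sketch of what that correlation check could look like is below. It uses only the eight years whose annual humidity and annual precipitation both happen to appear in the printed outputs above. Those years surfaced because they contain extreme months, so they are not a representative sample, and a real analysis would use all 40 annual values. The data frame name `annual` is hypothetical.

```r
# Annual humidity (%) and annual precipitation (inches) for the eight years
# that appear in both the humidity and precipitation outputs above.
annual <- data.frame(
  YEAR          = c(1981, 1986, 1991, 1993, 2000, 2011, 2015, 2018),
  Humidity      = c(68.8, 69.6, 71.1, 69.6, 63.0, 56.4, 71.4, 68.6),
  Precipitation = c(42.6, 39.0, 47.5, 36.9, 32.1, 22.5, 58.0, 40.1)
)

# Pearson correlation between the two annual series.
r <- cor(annual$Humidity, annual$Precipitation, method = "pearson")
r

# cor.test() adds a confidence interval and a p-value; with eight
# hand-picked years these are illustrative only.
cor.test(annual$Humidity, annual$Precipitation)
```

With the full data, the same two calls could be run on one row per year taken from Final_Humidity and Final_Precipitation after joining them on YEAR.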
(“These data were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program.”)
NOTES TO CHANGE - give each segment that shows important information its own title and R chunk to emphasize it - learn more graph types to visualize the information better
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Campbell (2022, March 6). Data Analytics and Computational Social Science: Ethan Campbell HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomethancampbell874360/
BibTeX citation
@misc{campbell2022ethan,
  author = {Campbell, Ethan},
  title = {Data Analytics and Computational Social Science: Ethan Campbell HW4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomethancampbell874360/},
  year = {2022}
}