Because I’m looking into UFO shapes and the data contains many distinct shape labels, I first removed the “shapes” that aren’t literally shapes: descriptions like “flash”, “formation”, or “changing”, as well as the not-very-specific “light”. Then I collapsed the remaining shape descriptions into 5 categories: Triangular, Cylindrical, Oval, Round (circular rather than oval), and Parallelogram.
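The filtering and collapsing described above could be sketched as follows. This is a minimal illustration on invented rows, assuming the raw labels live in a column called `UFO_shape` (a hypothetical name). The vectors carry a `_shapes` suffix, my own choice, so they don't mask base R's `round()`, and the labels "Cylindrical" and "Parallelogram" are assumed spellings.

```r
library(dplyr)

# Invented rows; `UFO_shape` is a hypothetical column name
toy <- data.frame(
  UFO_shape = c("disk", "light", "triangle", "cigar", "egg", "flash", "diamond"),
  length_of_encounter_seconds = c(120, 5, 300, 60, 45, 2, 90)
)

# Raw labels that belong to each collapsed category
round_shapes         <- c("disk", "round", "sphere", "circle", "dome")
triangular_shapes    <- c("triangle", "cone", "pyramid", "delta", "chevron")
oval_shapes          <- c("oval", "teardrop", "egg", "crescent")
parallelogram_shapes <- c("cross", "diamond", "hexagon", "rectangle")
cylinder_shapes      <- c("cylinder", "cigar")

toy <- toy %>%
  mutate(Shape_Category = case_when(
    UFO_shape %in% round_shapes         ~ "Round",
    UFO_shape %in% triangular_shapes    ~ "Triangular",
    UFO_shape %in% oval_shapes          ~ "Oval",
    UFO_shape %in% parallelogram_shapes ~ "Parallelogram",
    UFO_shape %in% cylinder_shapes      ~ "Cylindrical",
    TRUE                                ~ NA_character_  # "light", "flash", etc.
  )) %>%
  filter(!is.na(Shape_Category))  # drop the labels that are not shapes

print(toy)
```

Labels that match none of the vectors become `NA` and are filtered out, which covers both steps in one pass.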
Probability Distributions of Encounter Length
# Plot the probability & CDF for variables
probability <- ufo_sighting_data$length_of_encounter_seconds
plot(probability,
     xlab = "Length of UFO Encounter in Seconds",
     main = "Probability",
     col = "navyblue", cex = 0.8)

cum_probability <- cumsum(probability)
plot(cum_probability,
     xlab = "Length of UFO Encounter in Seconds",
     ylab = "Cumulative Probability",
     main = "Cumulative Probability Distribution",
     col = "red", cex = 0.8)
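For comparison, base R can also estimate the distribution of encounter lengths directly: `hist()` with `freq = FALSE` draws a density-scaled histogram, and `ecdf()` returns the empirical cumulative distribution function, whose values run from 0 to 1. The sketch below uses simulated durations rather than the real data, so its numbers are illustrative only.

```r
set.seed(42)

# Simulated stand-in for ufo_sighting_data$length_of_encounter_seconds
encounter_seconds <- rexp(500, rate = 1 / 300)

# Density-scaled histogram: the bar areas sum to 1
hist(encounter_seconds, freq = FALSE, breaks = 30,
     xlab = "Length of UFO Encounter in Seconds",
     main = "Estimated Probability Density",
     col = "navyblue")

# Empirical CDF: share of encounters lasting at most x seconds
encounter_cdf <- ecdf(encounter_seconds)
plot(encounter_cdf,
     xlab = "Length of UFO Encounter in Seconds",
     ylab = "Cumulative Probability",
     main = "Empirical Cumulative Distribution",
     col = "red")

# Estimated share of encounters lasting ten minutes or less
print(encounter_cdf(600))
```

Because `ecdf()` returns a function, it can be evaluated at any duration of interest, as in the last line.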
Grouping Data by Shape Category
# Group data by the 2 variables of interest. Get mean & STDEV for at least 1 variable and print results
library(dplyr)  # provides %>%, group_by(), and summarize()

round <- c("disk", "round", "sphere", "circle", "dome")
triangular <- c("triangle", "cone", "pyramid", "delta", "chevron")
oval <- c("oval", "teardrop", "egg", "crescent")
parallelogram <- c("cross", "diamond", "hexagon", "rectangle")
cylinder <- c("cylinder", "cigar")

# Shape_Category is assumed to have been assigned from the vectors above
avgs <- ufo_sighting_data %>%
  group_by(Shape_Category) %>%
  summarize(mean(length_of_encounter_seconds),
            sd(length_of_encounter_seconds),
            n())
colnames(avgs) <- c("Shape", "mean_seconds", "sd_seconds", "n_seconds")
print(avgs)
# Generate a plot
library(ggplot2)
ggplot(avgs, aes(as.factor(Shape), mean_seconds, fill = Shape)) +
  geom_bar(stat = "identity") +
  labs(y = "Average encounter length (seconds)",
       x = "UFO Shape Category",
       title = "Mean length of UFO encounter by Craft Shape") +
  scale_fill_manual(values = c("darkred", "darkorange2", "darkgreen", "darkcyan", "darkorchid4"))
Since my independent variable is nominal, I figured a bar chart would be the best way to show how mean encounter length varies across the 5 shape categories. This visualization suggests that round UFOs have the longest average encounters, but later visualizations will show that the round category contains a notable outlier. After round UFOs, triangular and oval craft have the highest average encounter lengths. I’ll focus on those 3 shape categories rather than all 5.
Regression with Dummy Variables
# Carry out simple regression analysis using lm(). Generate plot that regresses X variable on Y. Fit a line through observations.
# Create dummy variables for categorical Shape (x) variable
library(fastDummies)
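The dummy-variable step and the regression named in the comments above might look like the sketch below. It runs on invented rows, assumes `Shape_Category` holds labels such as "Triangular", and relies on `fastDummies::dummy_cols()` naming its output columns `<column>_<level>`. With a single 0/1 predictor, the intercept from `lm()` is the mean of the non-triangular group and the slope is the difference between the two group means.

```r
library(fastDummies)

# Invented example rows; the real data frame is ufo_sighting_data
toy <- data.frame(
  Shape_Category = c("Triangular", "Round", "Oval", "Triangular", "Round", "Oval"),
  length_of_encounter_seconds = c(300, 120, 60, 500, 80, 40)
)

# One 0/1 column per category, e.g. Shape_Category_Triangular
toy <- dummy_cols(toy, select_columns = "Shape_Category")

# Regress encounter length on the Triangular dummy
fit <- lm(length_of_encounter_seconds ~ Shape_Category_Triangular, data = toy)
print(coef(fit))

# Scatter the observations and draw the fitted line through them
plot(length_of_encounter_seconds ~ Shape_Category_Triangular, data = toy,
     pch = 19, col = "steelblue",
     xlab = "Triangular (1) versus Non-Triangular (0)",
     ylab = "Length of Encounter (Seconds)")
abline(fit, col = "red")
```

The fitted line passes through the two group means, the same quantities that a plot of each dummy variable can mark as individual points.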
# Plot distribution for Triangular crafts
Tri_mean <- mean(ufo_sighting_data$length_of_encounter_seconds[ufo_sighting_data$Shape_Category_Triangular == 1])
Ntri_mean <- mean(ufo_sighting_data$length_of_encounter_seconds[ufo_sighting_data$Shape_Category_Triangular == 0])
plot(length_of_encounter_seconds ~ Shape_Category_Triangular, ufo_sighting_data,
     pch = 19, col = "steelblue",
     main = "Triangular-Shaped UFOs",
     ylab = "Length of Encounter (Seconds)",
     xlab = "Triangular-Shaped Craft versus Non-Triangular Crafts")
points(y = Tri_mean, x = 1, col = "red", pch = 19)
points(y = Ntri_mean, x = 0, col = "red", pch = 19)
Plot for Oval UFOs
# Plot distribution for Oval crafts
Oval_mean <- mean(ufo_sighting_data$length_of_encounter_seconds[ufo_sighting_data$Shape_Category_Oval == 1])
Nov_mean <- mean(ufo_sighting_data$length_of_encounter_seconds[ufo_sighting_data$Shape_Category_Oval == 0])
plot(length_of_encounter_seconds ~ Shape_Category_Oval, ufo_sighting_data,
     pch = 19, col = "steelblue",
     main = "Oval-Shaped UFOs",
     xlab = "Oval Crafts versus Non-Oval Crafts",
     ylab = "Length of Encounter (Seconds)")
points(y = Oval_mean, x = 1, col = "red", pch = 19)
points(y = Nov_mean, x = 0, col = "red", pch = 19)
# Plot distribution for Round crafts
Ro_mean <- mean(ufo_sighting_data$length_of_encounter_seconds[ufo_sighting_data$Shape_Category_Round == 1])
Nro_mean <- mean(ufo_sighting_data$length_of_encounter_seconds[ufo_sighting_data$Shape_Category_Round == 0])
plot(length_of_encounter_seconds ~ Shape_Category_Round, ufo_sighting_data,
     pch = 19, col = "steelblue",
     main = "Round-Shaped UFOs",
     ylab = "Length of Encounter (Seconds)",
     xlab = "Round vs. Non-Round Crafts")
points(y = Ro_mean, x = 1, col = "red", pch = 19)
points(y = Nro_mean, x = 0, col = "red", pch = 19)
This plot reveals the outlier in the round shape category, which may be inflating that category’s mean relative to the other shape categories.
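To gauge how much a single extreme value can pull a group mean, one option is to compare the mean with the median and with the mean after dropping the maximum. The numbers below are invented for illustration and are not taken from the UFO data.

```r
# Invented durations (seconds) with one extreme value
round_seconds <- c(30, 60, 90, 120, 180, 240, 1e6)

mean_all     <- mean(round_seconds)
median_all   <- median(round_seconds)
# Mean after dropping the single largest observation
mean_trimmed <- mean(round_seconds[round_seconds < max(round_seconds)])

print(c(mean = mean_all, median = median_all, mean_without_max = mean_trimmed))
```

A large gap between the mean and the other two figures suggests the mean is being driven by the extreme observation rather than by typical encounters.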
