Visualize data
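# Read the CSV chosen interactively; the first row holds the column names
data <- read.csv(file = file.choose(), header = TRUE)
# Drop the first column
data <- data[, -1]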
names(data)
[1] "Vehicle.brand" "location" "style"
[4] "emission" "gear" "model"
[7] "time" "kilometres" "price"
[10] "original.price" "hedge.ratio" "group1"
[13] "group2"
head(data)
Vehicle.brand location style emission gear model time
1 others 外国 2011款 6.2L 自动 (进口) 2014/11/1
2 others 外国 2011款 6.2L 自动 (进口) 2014/11/1
3 others 外国 2011款 6.2L 自动 (进口) 2013/7/1
4 others 外国 2011款 6.2L 自动 (进口) 2013/12/1
5 others 外国 2010款 6.0T 自动 (进口) 2010/3/1
6 others 外国 2010款 5.7L 自动 豪华型(进口) 2012/9/1
kilometres price original.price hedge.ratio group1 group2
1 2.9 48.0 71.4 0.67 小于10万 高级轿车
2 2.9 48.0 71.4 0.67 小于10万 高级轿车
3 5.0 41.5 71.4 0.58 小于10万 高级轿车
4 5.3 42.5 71.4 0.60 小于10万 高级轿车
5 4.3 100.0 358.0 0.28 小于10万 高级轿车
6 8.0 80.0 171.5 0.47 小于10万 高级轿车
In addition to overall means, medians, and SDs, use group_by() and summarise() to compute mean/median/SD for any relevant groupings.
library(dplyr)
car_style <- group_by(data, style)
# na.rm = TRUE is passed to each summary function so that missing prices are ignored
summarise(car_style,
          mean_price = mean(price, na.rm = TRUE),
          median_price = median(price, na.rm = TRUE),
          sd_price = sd(price, na.rm = TRUE))
# A tibble: 12 x 4
style mean_price median_price sd_price
<chr> <dbl> <dbl> <dbl>
1 2005款 38.5 38.5 NA
2 2006款 11.9 11 1.95
3 2007款 12.1 8.5 9.07
4 2008款 6.99 5.8 3.60
5 2009款 8.81 8.8 5.18
6 2010款 14.6 13.6 11.4
7 2011款 16.8 12.6 12.1
8 2012款 18.4 14.5 12.6
9 2013款 23.1 19.2 13.2
10 2014款 32.4 30.8 20.7
11 2015款 30.5 18.8 28.4
12 2016款 21.7 17 14.2
# A tibble: 12 x 3
style n freq
<chr> <int> <dbl>
1 2005款 1 0.000801
2 2006款 5 0.00400
3 2007款 19 0.0152
4 2008款 79 0.0633
5 2009款 125 0.100
6 2010款 158 0.127
7 2011款 190 0.152
8 2012款 230 0.184
9 2013款 228 0.183
10 2014款 142 0.114
11 2015款 60 0.0480
12 2016款 12 0.00961
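The count and frequency table above appears without the code that produced it. A minimal sketch of one way to build such a table is given here, assuming `n` is the number of rows per style and `freq` is that count divided by the total. The `toy` data frame and its simplified style labels are hypothetical stand-ins for the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for the used-car data frame
toy <- data.frame(
  style = c("2010", "2010", "2011", "2012"),
  price = c(10.5, 12.0, 15.2, 18.4)
)

# Count rows per style, then divide by the total to get each style's share
style_freq <- toy %>%
  count(style) %>%
  mutate(freq = n / sum(n))

style_freq
```

With the real data, replacing `toy` with `data` should give one row per style, with the `freq` column summing to 1.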
What variable(s) are you visualizing?
The price of used cars.
What question(s) are you attempting to answer with the visualization?
Does the price of used cars follow a recognizable distribution?
What conclusions can you make from the visualization?
Used cars priced between 0 and 250,000 are the most common in the market, while used cars priced above 750,000 are relatively rare.
library(ggplot2)
ggplot(data, aes(price, fill = style)) +
  geom_histogram() +
  labs(title = "price") +
  theme_bw() +
  facet_wrap(vars(style), scales = "free")
What variable(s) are you visualizing?
The style (model year) of used cars and their value retention rate (hedge.ratio).
What question(s) are you attempting to answer with the visualization?
Is there a relationship between the style of a used car and its value retention rate?
What conclusions can you make from the visualization?
The figure shows that the more recent the car's style, the higher its value retention rate.
ggplot(data, aes(x = style, y = hedge.ratio)) +
  geom_boxplot() +
  geom_violin(scale = "count", fill = "lightblue", alpha = .3) +
  stat_summary(fun = "mean", geom = "point") +
  labs(title = "Value retention rate of different cars", x = "style", y = "value retention rate") +
  theme_bw()
• What questions are left unanswered with your visualizations?
No questions are left unanswered.
• What about the visualizations may be unclear to a naive viewer?
I think the visualizations are reasonably clear.
• How could you improve the visualizations for the final project?
The final project will include more visualizations for the analysis. Because the current plot shows the mean retention rate, the error around that mean will also be visualized.
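As a rough sketch of that last improvement, the error around each group mean could be drawn with a second `stat_summary()` layer. This assumes "error" means the standard error of the mean, which the text does not specify; `mean_se` is ggplot2's helper that returns the mean plus and minus one standard error. The `toy` data frame is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data; the real analysis would use `data` with its
# `style` and `hedge.ratio` columns
toy <- data.frame(
  style = rep(c("2010", "2012", "2014"), each = 4),
  hedge.ratio = c(0.28, 0.35, 0.40, 0.31,
                  0.47, 0.52, 0.58, 0.50,
                  0.60, 0.67, 0.71, 0.64)
)

p <- ggplot(toy, aes(x = style, y = hedge.ratio)) +
  # Mean of each group as a point
  stat_summary(fun = mean, geom = "point") +
  # Mean plus/minus one standard error as an error bar (assumed error measure)
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  labs(x = "style", y = "value retention rate") +
  theme_bw()

p
```

A different error measure, such as a confidence interval, would only require swapping the function passed to `fun.data`.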
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Xiaotong (2022, May 11). Data Analytics and Computational Social Science: hw_4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomtoni901233/
BibTeX citation
@misc{xiaotong2022hw_4,
  author = {Xiaotong, },
  title = {Data Analytics and Computational Social Science: hw_4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomtoni901233/},
  year = {2022}
}