Visualize data
# Read the CSV file chosen interactively, then drop its first column
data <- read.csv(file = file.choose(), header = TRUE)
data <- data[, -1]
names(data)
[1] "Vehicle.brand" "location" "style"
[4] "emission" "gear" "model"
[7] "time" "kilometres" "price"
[10] "original.price" "hedge.ratio" "group1"
[13] "group2"
head(data)
Vehicle.brand location style emission gear model time
1 others 外国 2011款 6.2L 自动 (进口) 2014/11/1
2 others 外国 2011款 6.2L 自动 (进口) 2014/11/1
3 others 外国 2011款 6.2L 自动 (进口) 2013/7/1
4 others 外国 2011款 6.2L 自动 (进口) 2013/12/1
5 others 外国 2010款 6.0T 自动 (进口) 2010/3/1
6 others 外国 2010款 5.7L 自动 豪华型(进口) 2012/9/1
kilometres price original.price hedge.ratio group1 group2
1 2.9 48.0 71.4 0.67 小于10万 高级轿车
2 2.9 48.0 71.4 0.67 小于10万 高级轿车
3 5.0 41.5 71.4 0.58 小于10万 高级轿车
4 5.3 42.5 71.4 0.60 小于10万 高级轿车
5 4.3 100.0 358.0 0.28 小于10万 高级轿车
6 8.0 80.0 171.5 0.47 小于10万 高级轿车
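The hedge.ratio column appears to be price divided by original.price, rounded to two decimals; the six rows printed above are consistent with that reading. The sketch below checks this on values copied by hand from the head(data) output. The data frame name `check` and the column `recomputed` are hypothetical, and the formula itself is an assumption inferred from the printed rows, not something the dataset documents.

```r
# Values copied from the head(data) output above.
check <- data.frame(
  price          = c(48.0, 48.0, 41.5, 42.5, 100.0, 80.0),
  original.price = c(71.4, 71.4, 71.4, 71.4, 358.0, 171.5),
  hedge.ratio    = c(0.67, 0.67, 0.58, 0.60, 0.28, 0.47)
)

# Assumption: hedge.ratio = price / original.price, rounded to 2 decimals.
check$recomputed <- round(check$price / check$original.price, 2)

# TRUE when the recomputed ratio matches the stored one in every row.
matches <- all(abs(check$recomputed - check$hedge.ratio) < 1e-8)
matches
```

If the same check holds on the full data, hedge.ratio can be read as the share of the original price that the used car still fetches.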
library(ggplot2)
# Histogram of price, one panel per model year (style)
ggplot(data, aes(price, fill = style)) +
  geom_histogram() +
  labs(title = "price") +
  theme_bw() +
  facet_wrap(vars(style), scales = "free")
ggplot(data,aes(x=hedge.ratio))+geom_density(fill="pink")+
labs(title="Hedging Rate Density Plot",x="Hedging Rate", y="Density") +
theme_bw()
# Draw the filled violin first so the narrow boxplot and the mean point stay visible on top
ggplot(data, aes(x = style, y = hedge.ratio)) +
  geom_violin(scale = "count", fill = "pink") +
  geom_boxplot(width = 0.1) +
  stat_summary(fun = "mean", geom = "point") +
  labs(title = "Value retention rate of different cars", x = "style", y = "hedging rate") +
  theme(axis.text = element_text(size = 4)) +
  facet_wrap(vars(location), scales = "free")
library(dplyr)
# Per-style count, mean and standard deviation of hedge.ratio, ignoring missing values
sd_plot <- group_by(data, style) %>%
  summarise(n = n(),
            mean = mean(hedge.ratio, na.rm = TRUE),
            sd = sd(hedge.ratio, na.rm = TRUE))
ggplot(sd_plot, aes(style, mean, fill = style)) +
  geom_col() +
  geom_errorbar(aes(style, ymin = mean - sd, ymax = mean + sd, color = style)) +
  labs(title = "Uncertainty of hedging rate", x = "style", y = "hedge.ratio") +
  theme(axis.text = element_text(size = 6))
sd_plot
# A tibble: 12 x 4
   style      n  mean      sd
   <chr>  <int> <dbl>   <dbl>
 1 2005款     1 0.25  NA
 2 2006款     5 0.576  0.230
 3 2007款    19 0.379  0.119
 4 2008款    79 0.373  0.0716
 5 2009款   125 0.396  0.0567
 6 2010款   158 0.431  0.0663
 7 2011款   190 0.493  0.0721
 8 2012款   230 0.570  0.0771
 9 2013款   228 0.619  0.0829
10 2014款   142 0.669  0.0809
11 2015款    60 0.741  0.0518
12 2016款    12 0.819  0.0844
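The 2005款 group contains a single car, so its standard deviation is NA and no error bar can be drawn for it. One possible way to handle this, sketched below, is to keep only groups with at least two cars and, if the uncertainty of the group mean is what matters, to use the standard error sd / sqrt(n) instead of the raw standard deviation. The names `sd_tbl`, `plot_tbl` and `se` are hypothetical, the rows are a hand-copied subset of the table above, and the cutoff of two observations is an assumption rather than part of the original analysis.

```r
# A few rows copied from the sd_plot output above.
sd_tbl <- data.frame(
  style = c("2005款", "2006款", "2007款", "2016款"),
  n     = c(1L, 5L, 19L, 12L),
  mean  = c(0.25, 0.576, 0.379, 0.819),
  sd    = c(NA, 0.230, 0.119, 0.0844)
)

# Keep only groups where a standard deviation is defined (n >= 2).
plot_tbl <- sd_tbl[sd_tbl$n >= 2, ]

# Standard error of the mean: shrinks as the group gets larger.
plot_tbl$se <- plot_tbl$sd / sqrt(plot_tbl$n)

plot_tbl
```

Passing `mean - se` and `mean + se` to geom_errorbar() would then show how precisely each group mean is estimated, while `mean - sd` and `mean + sd` show the spread of individual cars.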
– What is missing (if anything) in your analysis process so far?
I don’t think anything is missing from my analysis process so far.
– What conclusions can you make about your research questions at this point?
1. Most used cars are priced between 0 and 500,000, and the vast majority of those fall between 0 and 250,000.
2. Older cars are cheaper than newer ones.
3. The value retention rate (hedge.ratio) is generally in the range 0.4 to 0.7; very few used cars have a value retention rate above 0.8 or below 0.3.
4. For imported and domestic cars alike, the more recent the model year, the higher the value retention rate.
5. Most used cars are from model years 2009 to 2014.
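Conclusions 4 and 5 can be checked roughly against the per-year summary printed earlier. The sketch below uses counts and means copied by hand from that table; the data frame `yr` and the variables `share_2009_2014` and `increasing_since_2008` are hypothetical names. It pools all locations, so it only speaks to the overall trend, not to imported and domestic cars separately.

```r
# Counts and mean hedge.ratio per model year, copied from the sd_plot output.
yr <- data.frame(
  year = 2005:2016,
  n    = c(1, 5, 19, 79, 125, 158, 190, 230, 228, 142, 60, 12),
  mean = c(0.25, 0.576, 0.379, 0.373, 0.396, 0.431,
           0.493, 0.570, 0.619, 0.669, 0.741, 0.819)
)

# Share of all cars whose model year is between 2009 and 2014 (about 0.86).
in_range <- yr$year >= 2009 & yr$year <= 2014
share_2009_2014 <- sum(yr$n[in_range]) / sum(yr$n)
share_2009_2014

# Does the mean retention rate rise every year from 2008 onward?
increasing_since_2008 <- all(diff(yr$mean[yr$year >= 2008]) > 0)
increasing_since_2008
```

The years before 2008 are left out of the trend check because they hold very few cars (1, 5 and 19), so their means are less reliable.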
– What do you think a naive reader would need to fully understand your graphs?
A reader would probably need to know the basics of reading histograms, density plots, violin plots, and error bars before being able to understand these graphs.
– Is there anything you want to answer with your dataset, but can’t?
No.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Xiaotong (2022, May 11). Data Analytics and Computational Social Science: hw_5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomtoni901234/
BibTeX citation
@misc{xiaotong2022hw_5,
  author = {Xiaotong, },
  title = {Data Analytics and Computational Social Science: hw_5},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomtoni901234/},
  year = {2022}
}