Data Analytics and Computational Social Science: hw_5

Xiaotong

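# Read the used-car listings; file.choose() opens an interactive file picker
data <- read.csv(file = file.choose(), header = TRUE)
# Drop the first column, which is not used in the analysis below
data <- data[, -1]
names(data)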

 [1] "Vehicle.brand"  "location"       "style"         
 [4] "emission"       "gear"           "model"         
 [7] "time"           "kilometres"     "price"         
[10] "original.price" "hedge.ratio"    "group1"        
[13] "group2"

head(data)

  Vehicle.brand location  style emission gear        model      time
1        others     外国 2011款     6.2L 自动       (进口) 2014/11/1
2        others     外国 2011款     6.2L 自动       (进口) 2014/11/1
3        others     外国 2011款     6.2L 自动       (进口)  2013/7/1
4        others     外国 2011款     6.2L 自动       (进口) 2013/12/1
5        others     外国 2010款     6.0T 自动       (进口)  2010/3/1
6        others     外国 2010款     5.7L 自动 豪华型(进口)  2012/9/1
  kilometres price original.price hedge.ratio   group1   group2
1        2.9  48.0           71.4        0.67 小于10万 高级轿车
2        2.9  48.0           71.4        0.67 小于10万 高级轿车
3        5.0  41.5           71.4        0.58 小于10万 高级轿车
4        5.3  42.5           71.4        0.60 小于10万 高级轿车
5        4.3 100.0          358.0        0.28 小于10万 高级轿车
6        8.0  80.0          171.5        0.47 小于10万 高级轿车

Translation of the Chinese values in this output: 外国 = foreign; 自动 = automatic (transmission); 款 = model year, so 2011款 is the 2011 model; (进口) = imported; 豪华型 = luxury trim; 小于10万 = less than 100,000; 高级轿车 = premium sedan.

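library(ggplot2)

# Price histogram with one panel per model year (style);
# all layers must be chained with + to form a single plot
ggplot(data, aes(price, fill = style)) +
  geom_histogram(bins = 30) +
  labs(title = "Price distribution by model year", x = "Price", y = "Count") +
  theme_bw() +
  facet_wrap(vars(style), scales = "free")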

# Density of the value retention rate (column hedge.ratio)
ggplot(data, aes(x = hedge.ratio)) +
  geom_density(fill = "pink") +
  labs(title = "Value Retention Rate Density Plot", x = "Value retention rate", y = "Density") +
  theme_bw()
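
The density is easier to interpret once hedge.ratio is pinned down. In the six rows printed by head(data), hedge.ratio equals price / original.price rounded to two decimals. The sketch below checks only those six printed rows (values copied from the output); treating this as the definition for the whole file is an assumption.

```r
# Values copied from the six rows printed by head(data) above
price          <- c(48.0, 48.0, 41.5, 42.5, 100.0, 80.0)
original_price <- c(71.4, 71.4, 71.4, 71.4, 358.0, 171.5)
hedge_ratio    <- c(0.67, 0.67, 0.58, 0.60, 0.28, 0.47)

# Candidate definition: share of the original price retained at resale
retention <- round(price / original_price, 2)

# TRUE if every printed hedge.ratio equals the rounded price ratio
matches <- isTRUE(all.equal(retention, hedge_ratio))
print(matches)
```

If the rule holds for the whole file, all.equal(round(data$price / data$original.price, 2), data$hedge.ratio) should also return TRUE.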

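# Value retention rate by model year, split by origin (location).
# The violin is drawn first so the narrow boxplot stays visible on top of it.
ggplot(data, aes(x = style, y = hedge.ratio)) +
  geom_violin(scale = "count", fill = "pink") +
  geom_boxplot(width = 0.1) +
  stat_summary(fun = mean, geom = "point") +
  labs(title = "Value retention rate by model year", x = "Model year", y = "Value retention rate") +
  theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1)) +
  facet_wrap(vars(location), scales = "free")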

library(dplyr)

# Per-model-year summary of the value retention rate.
# na.rm = TRUE belongs inside mean() and sd(); passed to summarise() directly,
# it only creates an extra constant column.
sd_plot <- data %>%
  group_by(style) %>%
  summarise(n = n(),
            mean = mean(hedge.ratio, na.rm = TRUE),
            sd = sd(hedge.ratio, na.rm = TRUE))

# Mean retention rate per model year with +/- 1 SD error bars.
# 2005 has a single car, so its sd is NA and ggplot drops that error bar.
ggplot(sd_plot, aes(style, mean, fill = style)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd, color = style)) +
  labs(title = "Uncertainty of value retention rate", x = "Model year", y = "Mean hedge.ratio (+/- 1 SD)") +
  theme(axis.text = element_text(size = 6))

sd_plot

# A tibble: 12 x 4
   style      n  mean      sd
   <chr>  <int> <dbl>   <dbl>
 1 2005款     1 0.25  NA
 2 2006款     5 0.576  0.230
 3 2007款    19 0.379  0.119
 4 2008款    79 0.373  0.0716
 5 2009款   125 0.396  0.0567
 6 2010款   158 0.431  0.0663
 7 2011款   190 0.493  0.0721
 8 2012款   230 0.570  0.0771
 9 2013款   228 0.619  0.0829
10 2014款   142 0.669  0.0809
11 2015款    60 0.741  0.0518
12 2016款    12 0.819  0.0844
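
The error bars in this plot span plus or minus one standard deviation, which describes how much individual cars vary within a model year rather than how precisely each year's mean is known. If the goal is the uncertainty of the mean, one common alternative is the standard error, sd / sqrt(n). A minimal sketch using the n and sd values printed above (2005 is left out because its single car has no sd):

```r
# n and sd per model year (2006-2016), copied from the sd_plot output above
year   <- 2006:2016
n      <- c(5, 19, 79, 125, 158, 190, 230, 228, 142, 60, 12)
sd_val <- c(0.230, 0.119, 0.0716, 0.0567, 0.0663, 0.0721,
            0.0771, 0.0829, 0.0809, 0.0518, 0.0844)

# Standard error of the mean: the within-year spread divided by sqrt(n)
se <- sd_val / sqrt(n)

# Side-by-side comparison of the two possible error-bar half-widths
print(data.frame(year, n, sd = sd_val, se = round(se, 4)))
```

On the real summary, the same idea would be sd_plot %>% mutate(se = sd / sqrt(n)), with ymin = mean - se and ymax = mean + se in geom_errorbar(). Which bar is appropriate depends on whether the plot is meant to show spread or precision.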

– What is missing (if anything) in your analysis process so far?

I don't think anything is missing from my analysis process so far.

– What conclusions can you make about your research questions at this point?

1. Most used cars are priced between 0 and 500,000, and the vast majority fall between 0 and 250,000.

2. Older cars are cheaper than newer ones.

3. The value retention rate generally lies between 0.4 and 0.7; very few used cars have a rate above 0.8 or below 0.3.

4. For both imported and domestic cars, the more recent the model year, the higher the value retention rate.

5. Most of the used cars are 2009 to 2014 models.
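
Conclusions 4 and 5 can be checked directly against the sd_plot summary printed earlier. A sketch with the counts and means copied from that output; note it pools both locations, so on its own it does not separate imported from domestic cars.

```r
# Model years, counts and mean hedge.ratio copied from the sd_plot output
year      <- 2005:2016
n         <- c(1, 5, 19, 79, 125, 158, 190, 230, 228, 142, 60, 12)
mean_rate <- c(0.25, 0.576, 0.379, 0.373, 0.396, 0.431,
               0.493, 0.570, 0.619, 0.669, 0.741, 0.819)

# Conclusion 5: share of all listings that are 2009-2014 models
share_09_14 <- sum(n[year >= 2009 & year <= 2014]) / sum(n)
print(round(share_09_14, 3))

# Conclusion 4 (pooled): does the mean retention rate rise every year from 2008 on?
rising_since_2008 <- all(diff(mean_rate[year >= 2008]) > 0)
print(rising_since_2008)
```

With these numbers the share is 1,073 of 1,249 cars, about 0.859, and the trend check returns TRUE. Before 2008 the pattern is not monotone (the five 2006 cars average 0.576), which matters little given how few cars those years contain.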

– What do you think a naive reader would need to fully understand your graphs?

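A reader would need some basic chart literacy: how to read a histogram, a density plot, box and violin plots, and error bars. They would also need to know that the value retention rate (hedge.ratio) is the resale price expressed as a share of the original price.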

– Is there anything you want to answer with your dataset, but can’t?

No.
