DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Final Project

  • Course information
    • Overview
    • Instructional Team
    • Course Schedule
  • Weekly materials
    • Fall 2022 posts
    • final posts

On this page

  • Introduction
  • Data & Methodology:
  • Visualizations
    • Bonus Section
  • Reflection
  • Conclusion
  • Bibliography
  • Appendix
    • Appendix A:
    • Appendix B:
    • Appendix C:
    • Appendix D:
    • Appendix E:
    • Appendix F:

Final Project

  • Show All Code
  • Hide All Code

  • View Source
Final Project (Second Submission)
Used Vehicle Data
Owen Tibby
Author

Owen Tibby

Published

December 20, 2022

Code
library(distill)
library(tidyverse)
library(tidyr)
library(summarytools)
library(dplyr)
library(readr)
library(readxl)
library(stringr)
library(lubridate)
library(vindecodr)
Error in library(vindecodr): there is no package called 'vindecodr'
Code
library(plotrix)
library(sparklyr)
Error in library(sparklyr): there is no package called 'sparklyr'
Code
library(purrr)
library(rmarkdown)
library(knitr)
library(psych) 


knitr::opts_chunk$set(echo = TRUE, warning = FALSE, paged.print= TRUE)

Introduction

Introduction & Purpose of the Study:

The automotive sector is a major part of the US economy and is among the hardest hit by the ripple effects of the COVID-19 pandemic. Factory closures and lockdowns led to a supply shortage of critical vehicle components necessary for new car production. Consequently, this has significantly increased the demand and value of used cars. However, some experts believe the market will return to ‘normal’ by the end of 2023. But what did the market look like before the pandemic?

The purpose of this study is to explore the realm of the used car market, specifically, the wholesale sector. The primary data was gathered from a public domain and represents Manheim auction sales between wholesalers/ remarketers and wholesale buyers/ dealers of used vehicles. I am interested in exploring the retained value of vehicles based on their segment, body type, and age. Correlations between vehicle condition and mileage will also be analyzed.

Research Question: How do age, mileage, body style, and condition affect the retained value of a used vehicle?

Data & Methodology:

Reading in the Data

Code
# # #RStudio Desktop
carprices <-read_excel("_data/car_prices2.xlsx", col_types = c("text", "text", "text", "text", "text", "text", "text", "text", "numeric", "numeric", "text", "text", 
        "text", "numeric", "numeric", "text",  "skip"))
df <- carprices
exchange_rate <- read_excel("_data/Historical_Prices.xlsx", col_types = c("numeric", "text", "skip", "skip", "skip", "numeric"))

#Second Read-In of ca_catalog (to make data fields private)
ca_catalog <- read_csv("_data/ca_catalog2.csv")

#Read-In from code chunk
decoded_keys <- read_csv("_data/decoded.keys1.csv")

Description of Data:

The primary data used in the analysis was the ‘carprices.csv’ data set from Kaggle (see appendix A). This data set contains historical used vehicle auction transactions that took place between 2014 and 2015. The data was collected from Manheim auction houses across the 34 states in the USA. Each observation details a wholesale auction transaction that includes: the vehicle information, state/province, selling price, sale date, and the market average price i.e., the Manheim Market Report price (MMR) and other specifics about the sale.

Primary Objectives

Data Cleansing:

  1. Have consistency in the names of vehicle makes and models by using appropriate packages and joins.

  2. Obtain MSRP from by joining with a catalog

Data Analysis:

  1. Calculate vehicle age as a variable in both years and months.

  2. Compute retained value as a percentage of each vehicle’s MSRP

  3. Compute standard error as a percentage of the mean MSRP for each model and filter transactions with a low error rating for deeper analysis.

  4. Analyze the impact of age, mileage, condition, and segment on retained value

  5. Determine the segments (eg. compact cars, midsize cars, etc.) and vehicles with the highest retained value percentage.

Methods:

Retained Value (RV) is calculated by dividing the selling price (auction value) by the vehicle’s MSRP.

Retained Value% = Wholesale Price/ MSRP 

I was able to source MSRPs from a Canadian vehicle catalog (Appendix B) . To protect the data confidentiality, the MSRPs in the Canadian dollar catalog were averaged for each year, make, and model by trim level. In other words, I aggregated and derived the mean of the MSRPs of all trims belonging to a specific year, make and model, or YMM to reduce the level of detail. Therefore, MSRPs for trims belonging to a particular year, make and model shows the mean MSRP for the entire YMM group. 

Using the YMM as a key common to the ‘carprices’ data as well as the ca_catalog, I can combine both and fetch the MSRP. Unsurprisingly, this left one more challenge, the MSRPs are listed in Canadian Dollars. In order to convert it to US Dollars, I will have to find and divide each vehicle MSRP by the USD: CAD exchange rate at the time of release. I was able to source the FX rate from 2006 and onwards using ‘exchange_rate.csv’ data obtained from WSJ Markets. Considering the below assumptions, I joined the exchange_rate.csv to carprices.csv using launch year as the common key between both data frames. Each MSRP was then divided by the daily close rate of the USD-CAD. 

Important assumptions to get MSRP in USD: 

  1. Canadian vehicles & trims are priced similarly with US trims. 

  2. All vehicles were released in September of the previous year. i.e., a 2013 vehicle was available for sale in September 2012. 

  3. The exchange rate at the time of release was the closing FX rate at Aug-31st 

Data Cleansing:

Observing the data
Code
summary(df)

Table 1.0: Auction Data (Preview)

Code
print(kable(head(carprices)))


|year |make   |model               |trim       |body  |transmission |vin               |state | condition| odometer|color |interior |seller                                             |   mmr| sellingprice|saledate                                |
|:----|:------|:-------------------|:----------|:-----|:------------|:-----------------|:-----|---------:|--------:|:-----|:--------|:--------------------------------------------------|-----:|------------:|:---------------------------------------|
|2015 |Kia    |Sorento             |LX         |SUV   |automatic    |5xyktca69fg566472 |ca    |       5.0|    16639|white |black    |kia motors america, inc                            | 20500|        21500|Tue Dec 16 2014 12:30:00 GMT-0800 (PST) |
|2015 |Kia    |Sorento             |LX         |SUV   |automatic    |5xyktca69fg561319 |ca    |       5.0|     9393|white |beige    |kia motors america, inc                            | 20800|        21500|Tue Dec 16 2014 12:30:00 GMT-0800 (PST) |
|2014 |BMW    |3 Series            |328i SULEV |Sedan |automatic    |wba3c1c51ek116351 |ca    |       4.5|     1331|gray  |black    |financial services remarketing (lease)             | 31900|        30000|Thu Jan 15 2015 04:30:00 GMT-0800 (PST) |
|2015 |Volvo  |S60                 |T5         |Sedan |automatic    |yv1612tb4f1310987 |ca    |       4.1|    14282|white |black    |volvo na rep/world omni                            | 27500|        27750|Thu Jan 29 2015 04:30:00 GMT-0800 (PST) |
|2014 |BMW    |6 Series Gran Coupe |650i       |Sedan |automatic    |wba6b2c57ed129731 |ca    |       4.3|     2641|gray  |black    |financial services remarketing (lease)             | 66000|        67000|Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |
|2015 |Nissan |Altima              |2.5 S      |Sedan |automatic    |1n4al3ap1fn326013 |ca    |       1.0|     5554|gray  |black    |enterprise vehicle exchange / tra / rental / tulsa | 15350|        10900|Tue Dec 30 2014 12:00:00 GMT-0800 (PST) |
Code
#Observing and removing all records for model year 2006 and older & all erroneous prices
carprices<- carprices %>% filter(`year`> 2006, sellingprice>100)

#Checking the consistency in the names of states and body types
carprices %>%  select(state) %>% distinct()
carprices %>% select(body) %>% distinct() 

carprices$state <- toupper(carprices$state)
carprices$body <- toupper(carprices$body)
Cleaning the Data: Mapping and Joining
Code
#Checking the consistency in the names of Body types}
carprices %>% select(body) %>% unique()

#Creating a Body Type column
carprices <- mutate(carprices, Body_Type = case_when(str_detect(body, "EDAN")~ "Sedan", str_detect(body, "SUV") ~ "SUV", str_detect(body, "AB") ~ "Pickup", str_detect(body, "OUP")~ "Coupe", str_detect(`body`, "VAN")~ "Minivan", str_detect(`body`,"ONV")~ "Convertible", str_detect(`body`,"AGO") ~ "Wagon", str_detect(`body`,"ATCH")~ "Hatchback", str_detect(`body`,"CREW")~ "Pickup" ))

Joining multiple files to get the MSRP from a historical catalog. The code below shows what I did to write a new CSV file (that does not include the original MSRP but rather the mean MSRP and the other useful fields like trim count and Error.

Code
#Creating a Key in the CDN catalog
ca_catalog <-  ca_catalog %>% mutate(YearOfLaunch= `Model Year`-1)

#Creating Catalog with MSRP in CAD
ca_catalog <- ca_catalog %>% group_by(`Combined`, `YearOfLaunch`, `Segment` ) %>% 
summarise( Trim_Count=n(), AVG_MSRP_CAD= mean(`MSRP`, na.rm= TRUE), MSRP_Range_CAD= (max(`MSRP`)- min(`MSRP`)), StdDev_MSRP_CAD= sd(`MSRP`, na.rm=FALSE), Error.MSRP_CAD= std.error(`MSRP`, na.rm=FALSE))%>% 
ungroup()

#Writing CSV to exclude confidential MSRPs and only include the AVG_MSRP field
write.csv(ca_catalog, "C:\\Users\\Owen-Matthew\\Documents\\Github\\601_Fall_2022\\posts\\ca_catalog.csv", row.names=FALSE)

Join #1

Code
#Joining the data to convert CAD MSRP to USD MSRP later
ca_catalog <- left_join(ca_catalog, exchange_rate, c("YearOfLaunch"= "Year"))

#Creating US$ Catalog and Converting to MSRP from CAD to USD
us_catalog <- ca_catalog %>% 
  mutate(AVG_MSRP= round((`AVG_MSRP_CAD` / `Close`), digits = 0), MSRP_Range= `MSRP_Range_CAD` / `Close`, StdDev_MSRP= `StdDev_MSRP_CAD`/ `Close`, Error.MSRP= `Error.MSRP_CAD`/ `Close` )

#Finalizing US Catalog for Joining
us_catalog <- us_catalog %>% 
select(Combined, Segment, AVG_MSRP, MSRP_Range, Trim_Count, StdDev_MSRP, Error.MSRP) %>% 
  mutate(`Error %`= round((`Error.MSRP`/ `AVG_MSRP` * 100), digits = 1), `%MSRP_Range` = round((`MSRP_Range`/ `AVG_MSRP` *100), digits=1))

#Creating a common key to join carprices and us_catalog
carprices<- carprices %>% mutate(year2=year, make2=make, model2=model)
carprices <- carprices %>%  unite(`YMM`, `year2`, `make2`, `model2`, sep=" ")

Join #2

Code
#Joining Data with us_catalog
carprices <- left_join(carprices, us_catalog, c("YMM"= "Combined" )) 

#saving our raw records before filtering
raw_carprices <- carprices

#Filtering NAs out the keys that were not mapped to an MSRP using YMM
#(Will run these false keys through a vin decoder to get exact matches)
false_keys <- carprices %>% filter(is.na(AVG_MSRP))

#Removing the false_keys from main dataset
carprices <- carprices %>% filter(!is.na(AVG_MSRP)) 

us_catalog$Combined <- toupper(us_catalog$Combined)
false_keys$vin <- toupper(false_keys$vin)
false_keys$make <- toupper(false_keys$make)
false_keys$model <- toupper(false_keys$model)
Cleaning the Data: Decoding & More Joining

Vin deocder function used to decode all VINs that were not matched. This was available from the vindecodr package in R contributed by the NHTSA.

The code below shows how I created the ‘decoded_keys’ csv file that I read in earlier. I have set eval=FALSE because of the lengthy run time.

Code
# To clean Year, make, Model fields in false_keys (code run time = 3hrs + 13mins)

VinIn <-false_keys %>% select(vin) 
VinIn <- VinIn$vin

decode <- function(x) {
 VIN<- decode_vin(x)}

decoded_keys <- map_dfr(VinIn,  decode)

#Saving the above output as a CSV
write.csv(decoded_keys, "C:\\Users\\Owen-Matthew\\Documents\\Github\\601_Fall_2022\\posts\\decoded_keys.csv", row.names=FALSE)
Code
# # #R Studio Desktop Read In
##Cleaned False keys (Imported at read in so I set eval= FALSE)
decoded_keys <- read_csv("decoded.keys.csv")

#R Studio Cloud Read In
# decoded_keys <- read_csv("/cloud/project/Final_project/decoded.keys1.csv")

Join #3

Code
#Selecting useful columns in false_keys
f_keys <- false_keys
false_keys <- false_keys %>%
select(trim, body, Body_Type, transmission, vin, state , condition, odometer, color, interior,seller, mmr, sellingprice, saledate)

#Join with false keys to get sale records
#cleaned_keys= inner_join(a_keys, false_keys, c("VIN" = "vin")) %>% select(1:26) %>% rename("vin" = "VIN")
cleaned_keys <-  inner_join(false_keys, a_keys, c("vin" = "VIN"))

#saving unmatched NAs for future wrangling
action_keys <- action_keys %>%   select(model_year, make,model, VIN) %>%  mutate(year= model_year, make2=make, model2=model) %>% unite (YMM, model_year, make2, model2, sep = " ")

#saving unmatched keys
unmatched_keys <- anti_join(f_keys, cleaned_keys, c("vin"= "vin"))

Join #4-5

Code
#Attempting to clean the most common unmatched keys manually
unmatched_keys_clean <- mutate(unmatched_keys, year2=year, year3=year, make2=make, make3=make, model2 = case_when(str_detect(model,"6 SERIES")~ "6 SERIES", str_detect(model, "3 SERIES")~ "3 SERIES", str_detect(model, "4 SERIES")~ "4 SERIES",str_detect(model, "5 SERIES")~ "5 SERIES", str_detect(model, "CAPTIVA")~ "CAPTIVA",str_detect(model, "OUTLANDER")~ "OUTLANDER", str_detect(model, "RAM PICKUP 1500")~ "RAM 1500", str_detect(model, "TOWN AND COUNTRY")~ "TOWN & COUNTRY",  str_detect(model, c("750I", "7 SERIES",  "750LI"))~ "7 SERIES" ), model3=model2)
Error in `mutate()`:
! Problem while computing `model2 = case_when(...)`.
Caused by error in `str_detect()`:
! Can't recycle `string` (size 20826) to match `pattern` (size 3).
Code
unmatched_keys_clean <-  unmatched_keys_clean %>%  unite(YMM2, year2, make2, model2, sep= " ")
Error in unite(., YMM2, year2, make2, model2, sep = " "): object 'unmatched_keys_clean' not found
Code
unmatched_keys_clean <- unmatched_keys_clean %>%  
  select(year3, make3, model3, trim, body, Body_Type, transmission, vin, state , condition, odometer, color, interior,seller, mmr, sellingprice, saledate, YMM2) %>%
  filter(!is.na(make3), !is.na(model3)) 
Error in select(., year3, make3, model3, trim, body, Body_Type, transmission, : object 'unmatched_keys_clean' not found
Code
#An additional 6860 records were cleaned
cleaned_keys2 <- inner_join(unmatched_keys_clean, us_catalog, c("YMM2"= "Combined" )) %>% rename(year= year3, make=make3, model=model3, YMM=YMM2)
Error in inner_join(unmatched_keys_clean, us_catalog, c(YMM2 = "Combined")): object 'unmatched_keys_clean' not found
Code
#-------
unmatched_keys <- anti_join(unmatched_keys, cleaned_keys2, c("vin"= "vin"))
Error in is.data.frame(y): object 'cleaned_keys2' not found
Code
unmatched_unique <- unmatched_keys %>% select(vin) %>% distinct()
a <- (str_c(count(unmatched_unique), " ", "unique vehicle VINs; "))

unmatched_makes <- unmatched_keys %>% select(make) %>% distinct()
unmatched_models <- unmatched_keys %>% select(model) %>% distinct()
unmatched_YMM <- unmatched_keys %>% select(YMM) %>% distinct()
b <- (str_c(count(unmatched_makes), " ", "unmatched makes; "))
c <- (str_c(count(unmatched_models), " ", "unmatched models: "))
d <- (str_c(count(unmatched_YMM), " ", "unmatched YMM keys (Year-Make-Model)."))

print(str_c(a,b,c,d))
[1] "20507 unique vehicle VINs; 42 unmatched makes; 136 unmatched models: 349 unmatched YMM keys (Year-Make-Model)."

Binding cleaned data sets

Code
#---------
#get column names & order from carprices
check_cols <-  colnames(carprices) %>% as.factor()


#checking to see if the column order is the same
cleaned_keys <- cleaned_keys[, check_cols]
cleaned_keys2 <- cleaned_keys2[, check_cols]
Error in eval(expr, envir, enclos): object 'cleaned_keys2' not found
Code
#Merging both sets of cleaned data (Auto and manually cleaned)
cleaned_keys <-  rbind(cleaned_keys, cleaned_keys2)
Error in rbind(deparse.level, ...): object 'cleaned_keys2' not found
Code
print(kable(head(cleaned_keys), caption = "Table 1.1: Displaying cleaned keys"))


Table: Table 1.1: Displaying cleaned keys

| year|make     |model |trim        |body    |transmission |vin               |state | condition| odometer|color |interior |seller                   |   mmr| sellingprice|saledate                                |Body_Type |YMM               |Segment               | AVG_MSRP| MSRP_Range| Trim_Count| StdDev_MSRP| Error.MSRP| Error %| %MSRP_Range|
|----:|:--------|:-----|:-----------|:-------|:------------|:-----------------|:-----|---------:|--------:|:-----|:--------|:------------------------|-----:|------------:|:---------------------------------------|:---------|:-----------------|:---------------------|--------:|----------:|----------:|-----------:|----------:|-------:|-----------:|
| 2015|LEXUS    |RX    |Base        |SUV     |automatic    |2T2ZK1BA7FC155782 |CA    |       4.3|     7770|white |black    |lexus financial services | 41200|        38400|Tue Dec 30 2014 00:00:00 GMT-0800 (PST) |SUV       |2015 LEXUS RX     |Compact SUV/Crossover |    53276|   11216.33|          3|    5728.452|   3307.324|     6.2|        21.1|
| 2013|INFINITI |G37   |G37 Sport   |G COUPE |automatic    |JN1CV6EK4DM921604 |CA    |       4.5|    14716|white |black    |nissan infiniti lt       | 25900|        25000|Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |Coupe     |2013 INFINITI G37 |Mid-Size Car          |    56018|   23618.85|         17|    7296.869|   1769.751|     3.2|        42.2|
| 2013|INFINITI |G37   |G37 Journey |G SEDAN |automatic    |JN1CV6AP3DM726059 |CA    |       4.6|       68|black |black    |nissan infiniti lt       | 25700|        27250|Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |Sedan     |2013 INFINITI G37 |Mid-Size Car          |    56018|   23618.85|         17|    7296.869|   1769.751|     3.2|        42.2|
| 2013|INFINITI |G37   |G37 Journey |G SEDAN |automatic    |JN1CV6APXDM714541 |CA    |       4.5|    24112|white |black    |nissan infiniti lt       | 22800|        22500|Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |Sedan     |2013 INFINITI G37 |Mid-Size Car          |    56018|   23618.85|         17|    7296.869|   1769.751|     3.2|        42.2|
| 2013|INFINITI |G37   |G37 Sport   |G COUPE |automatic    |JN1CV6EK8DM920763 |CA    |       4.7|     4859|black |black    |nissan infiniti lt       | 27200|        26750|Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |Coupe     |2013 INFINITI G37 |Mid-Size Car          |    56018|   23618.85|         17|    7296.869|   1769.751|     3.2|        42.2|
| 2013|INFINITI |G37   |G37 Sport   |G COUPE |automatic    |JN1CV6EK8DM900321 |CA    |       4.5|    21126|black |beige    |nissan infiniti lt       | 25100|        26500|Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |Coupe     |2013 INFINITI G37 |Mid-Size Car          |    56018|   23618.85|         17|    7296.869|   1769.751|     3.2|        42.2|
Code
print(kable(head(cleaned_keys2), caption = "Table 1.2: Displaying cleaned keys2"))
Error in head(cleaned_keys2): object 'cleaned_keys2' not found

1st iteration of cleaning through decoder: 67% of the previous NA records were automatically cleaned (38208 of 57328 records) 2nd iteration of cleaning (Manually): 6860 records were manually cleaned. In total, 79% of the NAs were cleaned in the Year, Make, and Model fields

Code
#reorganize cleaned keys to rbind with carprices data
carprices <-  rbind(carprices, cleaned_keys)

Considerations:

This approximation of the MSRP using the YMM formed the basis for a lot of the analysis:

  1. Computation was narrowed down to model level. By using the mean MSRP for all trim levels, I can only analyze the retained value on the Year, Make, Model level i.e., not specific to the trim. For example, I can only analyze how a 3-year-old Toyota Camry retains its value but not a Toyota Camry XSE. 

  2. Normal distribution of mean price. The mean of the MSRP assumes that the purchase of trims/packages above the base price mimics a normal distribution. That means that most customers will buy at or around the mean priced trim. Additionally, because trims often follow interval pricing, the mean MSRP of most passenger vehicles tends to be close to the median. Therefore, for the standard vehicle, there is generally not much variation between the median trim price and the mean price of all trims. 

  3. Useful statistics: it was valuable to assess the number of trims for each model, the standard deviation, and the standard error in my approximation of MSRP. To further the analysis, I also included the Range in MSRP (price difference between top and base trim) as a percentage of mean MSRP, along with Error % in MSRP- which is the standard error divided by the mean MSRP. Since MSRP is the denominator in our RV equation, these metrics provide an idea of how accurate our retained value (RV) percentage is. They also serve as appropriate means of filtering out low-quality data and as a sanity check to statistical tests that were conducted on correlations and regressions.

The next critical variable is age. For this analysis, I will treat age as continuous in Age_months and Age_years. Like the above assumptions, I will assume that all vehicles of model year x are released in the Fall of the previous year (x-1). This will be done by creating the launch date variable and finding the difference from the sale date in months and years, to obtain the age of the vehicle at the time of sale. A deeper analysis will be placed on 2-year- to 6-year-old vehicle transactions. See the Appendix for more details about the variables.

Calculations:

Code
dim(carprices)

#Checking for NA values in AVG_MSRP and YMM
# carprices %>% filter(is.na(AVG_MSRP)) 
# carprices %>% filter(is.na(YMM)) 

#Replacing na records in Error%
carprices$`Error %` <- carprices$`Error %` %>% replace_na(0)
carprices <- carprices# %>% filter(`Error %` <7)

#Observing new NA records for Bodytype
na.body <- carprices %>%  filter(is.na(Body_Type))
Code
#Creating Launch Date Step 1
carprices$year <- as.numeric(as.character(carprices$year))

carprices<- carprices %>%  mutate(launch_month= "9", launch_day= "1", launch_year= (`year`-1))

#Creating Launch Date Step 2
carprices <- carprices %>% mutate(launch_date= make_date(year= `launch_year`, month = `launch_month`, day=`launch_day`))

#Converting saledate to a date variable
carprices <- carprices %>% separate(saledate, into= c("ddd", "sale_month", "sale_day", "sale_year", "sale_time", "GMT", "TMZ"), sep=" ")   

#Recoding the format of the sale month
carprices <- mutate(carprices, sale_month2= recode(sale_month, `Jan` ="1", `Feb`= "2", `Mar`="3", `Apr`="4", `May`="5", `Jun`="6", `Jul`="7", `Aug`="8", `Sep`="9", `Oct`="10", `Nov`="11", `Dec`="12" ))

#Creating Date_Sold
#carprices <- mutate(carprices, sale_year2=sale_year)
carprices <- mutate(carprices, sale_year2=sale_year, Date_Sold = make_date(year=sale_year2, month = sale_month2, day = sale_day))

# Tidyiing up: selecting important columns
carprices <- carprices %>% 
select(year, make, model, trim, transmission, Body_Type, state, condition, odometer, color, mmr, sellingprice, sale_year, launch_date, Date_Sold, AVG_MSRP, Trim_Count, `Error %`, `%MSRP_Range`, Segment ) 

#Age of vehicle
carprices <- mutate(carprices, Age_months= round(as.numeric(difftime(Date_Sold, launch_date, units= "days"))/30))
Code
#Checking for negatives in Vehicle Age
carprices %>%  select(Age_months) %>%  filter(Age_months<0)

#Recoding Vehicle Age
carprices<-carprices %>% mutate(Age_months=recode(Age_months, `-8`=0))

#Creating Age in years
carprices<-carprices %>% mutate(Age_years=round(Age_months/12, 0))

Below is the calculation of Retained Value:

Code
#Retained value and removing irrational RV%
carprices <- carprices %>% 
mutate(RV_percent= round(`sellingprice` / `AVG_MSRP` *100, digits = 1))# %>% filter(RV_percent<125)
print(kable(head(carprices)))


| year|make   |model    |trim       |transmission |Body_Type |state | condition| odometer|color |   mmr| sellingprice|sale_year |launch_date |Date_Sold  | AVG_MSRP| Trim_Count| Error %| %MSRP_Range|Segment                | Age_months| Age_years| RV_percent|
|----:|:------|:--------|:----------|:------------|:---------|:-----|---------:|--------:|:-----|-----:|------------:|:---------|:-----------|:----------|--------:|----------:|-------:|-----------:|:----------------------|----------:|---------:|----------:|
| 2015|Kia    |Sorento  |LX         |automatic    |SUV       |CA    |       5.0|    16639|white | 20500|        21500|2014      |2014-09-01  |2014-12-16 |    30886|         10|     4.6|        44.9|Mid-Size SUV/Crossover |          4|         0|       69.6|
| 2015|Kia    |Sorento  |LX         |automatic    |SUV       |CA    |       5.0|     9393|white | 20800|        21500|2014      |2014-09-01  |2014-12-16 |    30886|         10|     4.6|        44.9|Mid-Size SUV/Crossover |          4|         0|       69.6|
| 2014|BMW    |3 Series |328i SULEV |automatic    |Sedan     |CA    |       4.5|     1331|gray  | 31900|        30000|2015      |2013-09-01  |2015-01-15 |    43928|         17|     3.2|        48.2|Compact Car            |         17|         1|       68.3|
| 2015|Volvo  |S60      |T5         |automatic    |Sedan     |CA    |       4.1|    14282|white | 27500|        27750|2015      |2014-09-01  |2015-01-29 |    41472|         32|     1.6|        37.2|Compact Car            |          5|         0|       66.9|
| 2015|Nissan |Altima   |2.5 S      |automatic    |Sedan     |CA    |       1.0|     5554|gray  | 15350|        10900|2014      |2014-09-01  |2014-12-30 |    25683|          5|     6.6|        36.9|Mid-Size Car           |          4|         0|       42.4|
| 2014|BMW    |M5       |Base       |automatic    |Sedan     |CA    |       3.4|    14943|black | 69000|        65000|2014      |2013-09-01  |2014-12-17 |    96332|          2|     0.0|         0.0|Mid-Size Car           |         16|         1|       67.5|
Code
#How does the distribution of RV% look?
#summary(carprices$RV_percent)


#Annual KMs
carprices <- carprices %>% 
  mutate(Annual_miles= round(odometer/Age_months * 12, digits=0))

#Removing Salvage title and damaged vehicles
carprices <- carprices %>% filter(condition>1.4999999)

Visualizations

Before analyzing the impact of the independent variables on the retained value, I will first look at the relationship that each of the IVs have on each other using multiple dimensions.

Code
#Establishing vehicle groups
#Error in Nissan records
mainstream <- c("Toyota", "Honda", "Hyundai", "Nissan", "Hyundai", "Kia", "Mazda", "Subaru", "Volkswagen")
domestic <- c("Buick", "Cadillac", "Chevrolet", "Chrysler", "Dodge", "Ford", "GMC", "Jeep", "Lincoln", "Ram")
luxury <- c("BMW", "Audi", "Mercedes-Benz", "Porsche", "Lexus", "INFINITI", "Acura", "Genesis", "Volvo", "Cadillac", "Lincoln")
passenger <-  c("Sedan", "SUV", "Coupe", "Convertible", "Wagon", "Hatchback", "Minivan")
pickup <- c("Pickup")
Japanese <-  c("Toyota", "Honda", "Nissan", "Mazda", "Subaru", "Scion", "Mitsubishi", "Suzuki")
Korean <- c("Kia", "Hyundai", "Genesis")
Pop.European <- c("Audi", "BMW", "Mercedes-Benz", "Jaguar", "Land Rover" )

comp.seg <- c(mainstream, domestic, luxury)
comp.makes <- c(Japanese, Korean, Pop.European)

Correlation: Mileage, Age, Body, and Condition

Code
#Correlation between mileage and age (in months and years)
MLG.AGE1 <- round(cor(carprices$odometer, carprices$Age_months, use= "pairwise.complete.obs")  , digits=3)
MLG.AGE2 <- round(cor(carprices$odometer, carprices$Age_years, use= "pairwise.complete.obs") , digits=3)

#Correlation between mileage and condition
MLG.CON1. <- round(cor(carprices$odometer, carprices$condition, use= "pairwise.complete.obs"), digits=3)

#Correlation between Age and Condition
AGE.CON1 <- round(cor(carprices$Age_months, carprices$condition, use= "pairwise.complete.obs"), digits=3)
AGE.CON2 <- round(cor(carprices$Age_years, carprices$condition, use= "pairwise.complete.obs"), digits=3)

#Correlation between RV_percent and mileage
RV.MLG <- round(var(carprices$RV_percent, carprices$odometer, use= "pairwise.complete.obs"), digits= 3)
#Correlation between RV_percent and condition
RV.CON <- round(cor(carprices$RV_percent, carprices$condition, use= "pairwise.complete.obs"), digits= 3) 
Code
plot1.1 <- carprices %>% 
  filter(make== c(`mainstream`, `luxury`, `domestic`), Body_Type== c(passenger, pickup), condition>0)
  
ggplot(data=plot1.1) + 
aes(y =condition , x = (odometer), color= Age_years) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  xlim(0,210000)+
  ylim(2,5)+
  #facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Mileage", color= "Age (Years)")+
  ggtitle("Graph 1.1: Relationship between Mileage, Condition and Age", subtitle = str_c("Correlation between Mileage and Condition ", " = ", MLG.AGE1))

Code
plot1.2 <- plot1.1
ggplot(data=plot1.2) + 
aes(y =condition , x = (Age_years), color= odometer) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  ylim(2,5)+
  #facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Age", color= "Mileage")+
  ggtitle("Graph 1.2: Relationship between Mileage, Condition and Age", subtitle = str_c("Correlation between Age and Condition ", " = ", AGE.CON1))

Graphs 1.1 and 1.2 above show the inverse relationships across the condition, mileage, and age. While this relationship is very intuitive and almost trivial, a closer look will highlight that a vehicle’s condition is more sensitive to mileage at lower ages. In other words, As vehicles age, the impact of mileage on condition decreases. This was illustrated in Graph 1.1 where the slope of the curve steadily decreased over time.

This may also be caused by the fact that older high-mileage vehicles tend not to be driven as much as younger low-mileage vehicles, due to cost and several other factors like reliability issues. An exception to this would be the rare case where older vehicles (or antiques) already in very good condition are intentionally driven less to preserve their condition. Therefore, aging but maintaining a higher condition rating. Fortunately, I am only analyzing models from 2006 and newer, so this “antique effect” is of no significance but is still worthy of mention for future research purposes.

Code
plot1.3 <- carprices %>% 
  filter(make== c(`mainstream`, `luxury`, `domestic`), Body_Type== c(passenger, pickup))
  
ggplot(data=plot1.3) + 
aes(y =condition , x = (odometer), color= Age_years) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  xlim(0, 180000)+
  ylim(1,5)+
  facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Mileage", color= "Age (Years)")+
  ggtitle("Graph 1.3: Relationship between Mileage, Condition, Age and Body Type")  
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Code
          # ,subtitle = str_c("Correlation between Mileage and Condition ", " = ", MLG.AGE1))

plot1.4 <- plot1.3
ggplot(data=plot1.4) + 
aes(y =condition , x = Age_years, color=odometer) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  #xlim(0, 200000)+
  ylim(1,5)+
  facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Age", color= "Mileage")+
  ggtitle("Graph 1.4: Relationship between Mileage, Age, Body Type and Condition")  
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Code
          # ,subtitle = str_c("Correlation between Mileage and Condition ", " = ", MLG.AGE1))

Keeping in mind the previous takeaways, graphs 1.3 and 1.4 illustrate the correlations by body type.

Unsurprisingly, among all body types, pickups and SUVs are least sensitive to mileage and age. This may be because most pickups and many SUVs are used for a wide range of purposes. They are engineered and designed to be driven more than other body types.

Insufficient data points to make conclusions about wagons and convertibles.

Correlation: Mileage, Age, Body, Condition and Retained Value

Code
#Correlation between RV_percent and mileage
plot2.1 <- carprices %>% filter(make== c(`mainstream`, `luxury`), Body_Type==c(passenger, pickup), RV_percent >15) %>% arrange(desc(RV_percent)) 


#Correlation between RV_percent and mileage
RV.MLG <- round(var(plot2.1$RV_percent, plot2.1$odometer, use= "pairwise.complete.obs"), digits= 3)


ggplot(data = plot2.1) + aes(y = RV_percent, x = (odometer) , color= Age_years) +
geom_point(position = "jitter")+
geom_smooth(color="orange")+
labs(x=" Mileage", y= "Retained Value %", color= "Age (Years)")+
  ggtitle("Graph 2.1: Correlation between Retained Value and Mileage", subtitle = str_c("Correlation", " = ", RV.MLG))
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Code
#Correlation between RV_percent and condition
plot2.2 <- carprices %>% 
  filter(Body_Type== c(passenger, pickup)) %>% 
group_by(Age_months)

RV.CON <- round(cor(plot2.2$RV_percent, plot2.2$condition, use= "pairwise.complete.obs"), digits= 3)

ggplot(data = plot2.2) + aes(x =condition, y = (RV_percent)) +
 geom_smooth()+
  labs(x=" Condition", y= "Retained Value %")+
  ggtitle("Graph 2.2: Correlation between Retained Value and Condition", subtitle = str_c("Correlation", " = ", RV.CON)) 

Graph 2.1 shows the inverse relationship and diminishing effect that mileage and age have on retained value. However, as seen in graph 2.2, on aggregate, retained value is more strongly correlated to the condition of the car than its mileage.

Auction Transactions by High-Volume States

This section aims to take a more comprehensive look at our data. I’ll analyze the states with the most transactions to see if there are any unique patterns to observe.

Code
#Analyzing the number of transactions by state by the age 
vol.states<- carprices %>% 
  filter(Age_years < 7, Age_years >0) %>% 
  group_by(Age_years, state) %>% tally() %>% 
  arrange(desc(n)) %>% slice(1:5)

vol.states %>% 
  ggplot()+ aes(x=reorder(state, -n), y= n) +
  geom_bar(stat = "identity", fill= "dark blue") +
labs(x= "State", y="Number of Transactions") +
facet_wrap(vars(Age_years))+
ggtitle("Graph 3.1: Number of Transactions by State (Faceted by Age in Years)")

From Graph 3.1 above, we see that most vehicles are auctioned between ages 2 and 4 years. This corresponds to the popular lease terms of 24 to 48 months.

Code
top.states <- vol.states %>% ungroup() %>%   group_by(state, Age_years) %>%  arrange(desc(n)) %>% slice(1:3)

plot3.2 <- carprices %>%  
filter(state==vol.states$state, Body_Type== c(passenger, pickup), RV_percent> 15, Annual_miles<25000 ) 

#Correlation between RV_percent and age
RV.AGE <- round(cor(plot3.2$RV_percent, plot3.2$Age_months, use= "pairwise.complete.obs"), digits= 3)

ggplot(data = plot3.2) + aes(y = RV_percent, x = (Age_months),  color= Annual_miles) +
 geom_point(position = "jitter")+
geom_smooth(color="green")+
  labs(x=" Age (Months)", y= "Retained Value %", color= "Mileage")+
  facet_wrap(~state)+
  xlim(0,90)+
  ggtitle("Graph 3.2: Retained Value vs Age in High Volume States", subtitle = str_c("Correlation", " = ", RV.AGE))
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Graph 3.2 shows that generally, across the states with the most auction transactions, vehicles experience similar depreciation curves and correlations between age and mileage.

I filtered for the annual miles driven AnnualMiles to be less than 25,000/ year to filter out vehicles used for commercial purposes. This was done because commercial vehicles (eg. rental cars, taxis, fleet vehicles, etc. ) add another layer of complexity that is beyond the scope of this paper.

Comparative Analysis

This section will delve into more specific analyses with the aim of providing additional insights.

Code
#Analysis by segment
by_segment <- carprices %>% 
  filter(Age_years < 8, !is.na(Segment)) %>% #, make==c(mainstream, luxury)) %>% 
  group_by(Age_years, Segment, sale_year, Age_months) %>% 
  summarise(Segment_RV= round(mean(RV_percent), digits=1)) %>% 
  arrange(desc(Segment_RV)) %>%arrange((Age_years)) %>% arrange((sale_year)) %>% slice(1:5) 

ggplot(data = by_segment) + 
geom_smooth(mapping = aes(x = Age_months, y = Segment_RV, color = `Segment`, position= "jitter"), se= F)+
  xlim(10,75)+
  labs(x= "Age (Months)", y= "Retained Value %", fill= "Segment")+
  ggtitle("Graph 4.1 Retained Value % by Segment")

Code
#Finding the Segments with the highest RV
best_value_segments <- carprices %>%
  filter(Age_years> 1, Age_years <6, Body_Type== c(passenger, "Pickup"), `Error %` < 10, !is.na(Segment), Segment !="HD Full-Size Pickup")%>% 
  group_by(Segment,Age_years,  RV_percent, odometer, condition, model) %>% tally() %>% 
  summarise(RV_percentage = mean(RV_percent), odometer=mean(odometer),condition=mean(condition), .groups = "keep") %>% tally() %>% 
  summarise(Segment=Segment, Age_years= Age_years, RV_percent= mean(RV_percent), odometer=mean(odometer),condition=mean(condition),  n = sum(n), .groups = "keep") %>%
ungroup() %>% 
   group_by(Segment, Age_years) %>% 
  summarise(Segment=Segment, Age_years= Age_years, RV_percent= mean(RV_percent),Mileage=mean(odometer), Condition=mean(condition),  Sample_Size = sum(n), .groups = "keep") %>%
  slice(1:1) %>% 
  arrange(desc(RV_percent)) %>% arrange((Age_years))
     
print(kable(best_value_segments))


|Segment                  | Age_years| RV_percent|  Mileage| Condition| Sample_Size|
|:------------------------|---------:|----------:|--------:|---------:|-----------:|
|Mid-Size Pickup          |         2|   84.49062| 23863.53|  4.235938|          64|
|LD Full-Size Pickup      |         2|   65.05410| 24103.60|  4.165395|         708|
|Mid-Size SUV/Crossover   |         2|   62.48915| 31570.73|  4.113248|        1170|
|Subcompact SUV/Crossover |         2|   62.45069| 23784.00|  4.109972|         361|
|Compact SUV/Crossover    |         2|   60.55754| 31623.59|  4.147182|        1526|
|Full-Size SUV/Crossover  |         2|   59.17824| 37386.29|  3.983682|         239|
|Subcompact Car           |         2|   58.65251| 26016.39|  3.857011|         756|
|Compact Car              |         2|   57.62080| 28570.99|  3.850645|        2636|
|Sports Car               |         2|   51.65524| 26379.09|  3.810995|         382|
|Mid-Size Car             |         2|   50.26370| 30218.83|  3.895602|        3479|
|Minivan                  |         2|   48.67544| 38333.18|  3.810142|         562|
|Full-Size Car            |         2|   46.37824| 33590.26|  3.802490|         763|
|Mid-Size Pickup          |         3|   69.97907| 37121.28|  3.669767|          43|
|LD Full-Size Pickup      |         3|   61.14357| 43369.34|  3.918182|         319|
|Mid-Size SUV/Crossover   |         3|   53.53922| 40163.03|  3.974729|         645|
|Subcompact SUV/Crossover |         3|   52.79717| 38716.22|  3.824528|         106|
|Compact SUV/Crossover    |         3|   50.99491| 43130.34|  3.882127|         884|
|Compact Car              |         3|   49.83264| 36667.74|  3.636541|        1590|
|Full-Size SUV/Crossover  |         3|   48.60702| 48826.74|  3.700000|         114|
|Subcompact Car           |         3|   44.57835| 36470.44|  3.654464|         448|
|Sports Car               |         3|   44.01447| 37149.31|  3.594340|         159|
|Mid-Size Car             |         3|   41.90520| 38922.92|  3.649016|        2540|
|Minivan                  |         3|   38.73265| 59047.21|  3.515510|         247|
|Full-Size Car            |         3|   36.99537| 48094.70|  3.482927|         410|
|Mid-Size Pickup          |         4|   63.71803| 57009.02|  3.825410|         122|
|LD Full-Size Pickup      |         4|   57.18272| 57539.25|  3.756790|         648|
|Subcompact SUV/Crossover |         4|   49.24941| 49667.81|  3.642941|         170|
|Compact SUV/Crossover    |         4|   48.86856| 46028.75|  3.832457|        1673|
|Mid-Size SUV/Crossover   |         4|   48.09937| 48797.56|  3.858866|        1111|
|Full-Size SUV/Crossover  |         4|   46.27714| 61289.26|  3.845714|         140|
|Compact Car              |         4|   46.23107| 41537.70|  3.549270|        2330|
|Sports Car               |         4|   45.71592| 44484.68|  3.571144|         201|
|Subcompact Car           |         4|   43.42934| 46042.44|  3.575000|         484|
|Minivan                  |         4|   40.06356| 61850.17|  3.389894|         376|
|Mid-Size Car             |         4|   39.91684| 43051.56|  3.608307|        4059|
|Full-Size Car            |         4|   38.91142| 47993.60|  3.608391|         727|
|Mid-Size Pickup          |         5|   58.94933| 62382.53|  3.728000|          75|
|LD Full-Size Pickup      |         5|   56.89715| 68722.36|  3.613415|         246|
|Compact SUV/Crossover    |         5|   48.49789| 61335.65|  3.696677|         331|
|Sports Car               |         5|   44.80435| 62525.55|  3.401449|          69|
|Subcompact SUV/Crossover |         5|   42.87000| 65213.82|  3.363333|          60|
|Mid-Size SUV/Crossover   |         5|   42.75421| 69558.88|  3.680135|         297|
|Full-Size SUV/Crossover  |         5|   42.50333| 75887.97|  3.683333|          90|
|Compact Car              |         5|   40.57613| 61687.34|  3.242661|         513|
|Subcompact Car           |         5|   38.78800| 68058.28|  3.123000|         100|
|Mid-Size Car             |         5|   34.83214| 65151.75|  3.364083|         529|
|Minivan                  |         5|   34.31500| 84112.60|  3.111250|          80|
|Full-Size Car            |         5|   34.07561| 66903.28|  3.385366|         164|
Code
#Removing a few missing fields in Segments that were missing in Catalog
BVS <- best_value_segments %>%  filter(!is.na(Segment)) 


BVS %>% 
ggplot() + aes(x = Age_years, y =(RV_percent)) +
  geom_bar(stat= "identity", fill="darkred") +
  facet_wrap(~reorder(Segment, -RV_percent))+
    labs(x= "Age (Years)", y= "Retained Value %")+
  ggtitle("Graph 4.2: Retained Value % vs. Age Ranked by Segment")

The graphical representation of data in Graphs 4.1 and 4.2 suggests that there is a hierarchy that exists among body types. Pickups have generally retained the best value across all years followed by SUVs and cars (sedans, coupes, convertibles, etc).

Bonus Section

How does RV% compare among volume models?

Code
#Volume models

vol.models<- carprices %>% 
  filter( Age_years >0) %>% 
  group_by(model) %>% 
  tally() %>% 
  arrange(desc(n)) %>% 
  ungroup() %>% 
  slice(1:10)

vol.models.passenger <- carprices %>% 
  filter(Age_years >0, Body_Type== passenger) %>% 
  group_by(model) %>% 
  tally() %>% 
  arrange(desc(n)) %>% 
  ungroup() %>% 
  slice(1:10)

popular.models <-  vol.models.passenger$model
Code
vol.models.passenger %>% 
  ggplot()+ aes(x=reorder(model, -n), y= n) +
  geom_bar(stat = "identity", fill= "blue") +
  labs(x= "Model", y="Number of Transactions")+
  ggtitle("Graph 5.1: Most Auctioned Models")

From Graph 5.1, it is easy to assume that the Nissan Altima is a highly sought-after vehicle and may therefore hold its value well. However, that may not be entirely true. Graph 5.2 will explore this further.

Do the most auctioned vehicles retain the best value ?

Code
#Finding the models with the highest RV

BVM <- carprices %>% filter(model== vol.models.passenger$model) %>% 
  filter(Age_years> 1, Age_years <6, Body_Type== passenger) %>% 
  group_by(make, model,Age_years,  RV_percent, `Error %`, odometer, condition, Segment) %>% tally() %>% 
  summarise(RV_percentage = mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE),  odometer=mean(odometer), condition=mean(condition), .groups = "keep") %>% tally() %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE), odometer=mean(odometer),condition=mean(condition),  n = sum(n), .groups = "keep") %>% ungroup() %>% 
   group_by(make, model, Age_years) %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent),`Error %`=mean(`Error %`, na.rm=TRUE), mileage=mean(odometer), condition=mean(condition),  n = sum(n), .groups = "keep") %>%
  slice(1:1) %>% arrange(desc(RV_percent)) %>%
   ungroup() %>% 
  #Filtering a sample size 20 or more
  filter(n>19) %>%
arrange((Age_years))
(kable(BVM))
make model Age_years RV_percent Error % mileage condition n
Ford Escape 2 59.80893 5.371429 32660.32 4.105357 56
Ford Focus 2 54.93243 5.670270 30007.46 4.068919 74
Ford Fusion 2 53.94091 5.018182 26895.30 4.066667 66
Dodge Charger 2 52.56486 4.002703 27400.24 3.859459 37
Toyota Camry 2 51.63448 5.879310 32260.52 3.751724 29
Nissan Altima 2 50.76418 4.607463 32063.15 3.797015 67
Dodge Grand Caravan 2 43.03421 6.200000 43253.37 3.742105 38
INFINITI G37 2 40.27436 3.200000 21123.33 4.069231 39
Chevrolet Impala 2 37.74390 3.300000 45675.93 3.465854 41
Ford Escape 3 51.96250 5.818750 39425.44 3.896875 32
Ford Focus 3 45.75909 4.818182 41420.00 3.945454 22
Toyota Camry 3 45.49048 5.928571 41978.05 3.519048 21
Ford Fusion 3 44.69697 5.978788 45612.97 3.896970 33
Nissan Altima 3 41.21406 4.793750 38131.33 3.482813 64
INFINITI G37 3 36.94400 3.144000 23883.64 3.896000 25
Chevrolet Impala 3 30.49630 2.703704 56683.44 3.311111 27
Ford Focus 4 44.11364 3.872727 51600.68 3.709091 22
Toyota Camry 4 43.64286 5.735714 39723.24 3.280952 42
Honda Accord 4 40.65000 3.453125 40321.59 3.281250 32
Ford Escape 4 40.50000 6.360606 48104.48 3.906061 33
Nissan Altima 4 39.35612 5.263265 40394.43 3.583674 98
Ford Fusion 4 38.64286 6.714286 54026.71 3.592857 56
INFINITI G37 4 38.01042 3.000000 35770.04 3.814583 48
Chevrolet Impala 4 28.63704 2.137037 62979.44 3.307407 27

Graph 5.2 shows that the most auctioned vehicles do not necessarily retain the best value. This may be due to an oversupply of these vehicles at auction, and with a higher supply, a lower price often follows.

Another reason could be low consumer sentiment toward these vehicles. After looking at the nature of the sellers, one can conclude that most of these vehicles are lease returns. Vehicles with high lease returns can imply a low consumer sentiment, therefore decreasing the long-term resale value.

Code
vol.p.models <- vol.models.passenger$model %>% as.factor()
vol.vs.best.passenger <- BVM %>% filter(model== vol.p.models)

BVM %>%
  ggplot()+ aes(x=reorder(model, -RV_percent), y= RV_percent) +
  geom_bar(stat = "identity",position = "dodge", fill= "dark orange" ) +
  labs(x= "Model", y="Mean Retained Value %")+
  ggtitle("Graph 5.2: Retained values of the Most Auctioned Models" , subtitle = "Mean Retained Value Percentage from Year 2 to 5")

Best mainstream cars to buy post-2015

Code
#Finding the models with the highest RV
best_value_models <- carprices %>%
  filter(Age_years> 1, Age_years <6, Body_Type== passenger, `Error %` < 5) %>% 
  group_by(make, model,Age_years,  RV_percent, `Error %`, odometer, condition, Segment) %>% tally() %>% 
  summarise(RV_percentage = mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE),  odometer=mean(odometer), condition=mean(condition), .groups = "keep") %>% tally() %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE), odometer=mean(odometer),condition=mean(condition),  n = sum(n), .groups = "keep") %>% ungroup() %>% 
   group_by(make, model, Age_years) %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent),`Error %`=mean(`Error %`, na.rm=TRUE), mileage=mean(odometer), condition=mean(condition),  n = sum(n), .groups = "keep") %>%
  slice(1:1) %>% arrange(desc(RV_percent)) %>%
   ungroup() %>% 
  #Filtering a sample size 20 or more
  filter(n>19) %>%
arrange((Age_years))
print(kable(best_value_models, caption= "Passengers Cars with Best RV"))


Table: Passengers Cars with Best RV

|make          |model             | Age_years| RV_percent|   Error %|  mileage| condition|   n|
|:-------------|:-----------------|---------:|----------:|---------:|--------:|---------:|---:|
|Jeep          |Wrangler          |         2|   86.01758| 4.3000000| 20580.66|  4.551648|  91|
|BMW           |4 Series          |         2|   77.63913| 3.4000000| 10463.70|  4.413043|  23|
|Jeep          |Cherokee          |         2|   75.13750| 2.5000000| 19183.85|  4.295833|  48|
|Toyota        |Corolla           |         2|   73.25143| 3.7000000| 22960.35|  3.816190| 105|
|GMC           |Acadia            |         2|   70.34545| 4.5000000| 24056.27|  4.281818|  22|
|Nissan        |Versa Note        |         2|   70.17021| 4.7000000| 17958.79|  4.031915|  47|
|Subaru        |Impreza           |         2|   70.00741| 2.9000000| 19390.85|  4.137037|  27|
|Subaru        |Forester          |         2|   69.92857| 3.7178571| 29487.36|  4.185714|  28|
|Subaru        |Outback           |         2|   69.26308| 3.8615385| 27320.11|  4.324615|  65|
|Mercedes-Benz |GLK-Class         |         2|   68.75676| 1.7945946| 24537.11|  4.237838|  37|
|Subaru        |Legacy            |         2|   68.14815| 4.5000000| 18589.44|  4.437037|  27|
|Dodge         |Durango           |         2|   67.41754| 3.8701754| 28284.00|  4.330702| 114|
|Honda         |CR-V              |         2|   66.99825| 4.6421053| 23874.30|  4.161403|  57|
|Ford          |Escape            |         2|   66.94753| 4.8000000| 28125.73|  4.154321| 162|
|Nissan        |Sentra            |         2|   66.43500| 4.9000000| 27945.42|  3.745000| 120|
|INFINITI      |Q50               |         2|   66.01951| 4.0000000| 16612.83|  4.075610|  41|
|Ford          |Expedition        |         2|   66.01094| 2.7125000| 30222.42|  4.381250|  64|
|FORD          |FUSION            |         2|   65.81795| 4.7000000| 32849.51|  4.120513|  39|
|INFINITI      |JX35              |         2|   65.78182| 0.0000000| 37006.95|  3.827273|  22|
|Honda         |Civic             |         2|   65.74870| 4.4000000| 20880.17|  3.703478| 115|
|Chevrolet     |Tahoe             |         2|   65.53418| 3.9341772| 43381.96|  3.941772|  79|
|Honda         |Pilot             |         2|   65.27895| 4.8000000| 26278.13|  4.405263|  38|
|Ford          |Explorer          |         2|   65.15935| 4.9000000| 37372.12|  4.013415| 248|
|Toyota        |RAV4              |         2|   64.64468| 4.9000000| 43028.23|  3.844681|  47|
|Jeep          |Compass           |         2|   63.02941| 3.1705882| 25850.37|  3.990196|  51|
|Toyota        |Venza             |         2|   62.92727| 2.3181818| 37372.82|  3.795454|  22|
|Lincoln       |MKX               |         2|   62.45714| 0.0000000| 24495.98|  4.382143|  56|
|Jeep          |Patriot           |         2|   61.94444| 3.2875000| 27918.83|  3.979167|  72|
|Honda         |Accord            |         2|   61.91981| 3.3000000| 21560.34|  3.904717| 106|
|HYUNDAI       |SANTA FE          |         2|   61.59630| 4.3000000| 22150.04|  4.066667|  27|
|Hyundai       |Accent            |         2|   60.31239| 2.7982301| 32131.84|  3.645133| 113|
|Hyundai       |Santa Fe          |         2|   60.24667| 2.9500000| 34383.43|  4.243333|  30|
|Ford          |Edge              |         2|   60.19865| 4.3407407| 29632.14|  4.168350| 297|
|BMW           |3 Series          |         2|   59.97794| 3.5441176| 21352.10|  4.235294|  68|
|Hyundai       |Veloster          |         2|   59.82917| 3.4250000| 19199.08|  3.962500|  24|
|Ford          |Fiesta            |         2|   59.62087| 4.0404348| 28080.17|  3.996957| 230|
|Chevrolet     |Cruze             |         2|   59.39677| 4.3705069| 32445.41|  3.726728| 434|
|Lincoln       |MKZ               |         2|   59.17288| 3.6135593| 27934.53|  3.993220|  59|
|Audi          |A4                |         2|   59.09655| 3.1000000| 23538.52|  4.113793|  29|
|Chevrolet     |Suburban          |         2|   58.27467| 3.4400000| 45680.87|  3.742667|  75|
|GMC           |Terrain           |         2|   58.20926| 2.9629630| 29662.24|  4.175926|  54|
|Toyota        |Camry Hybrid      |         2|   58.01724| 2.4551724| 52920.24|  3.827586|  29|
|Toyota        |Prius             |         2|   57.60000| 0.0000000| 39000.53|  3.828889|  45|
|Kia           |Rio               |         2|   57.48706| 3.3482353| 24315.29|  3.796471|  85|
|Ford          |Fusion            |         2|   56.95585| 4.7000000| 29392.94|  4.018729| 299|
|Kia           |Forte             |         2|   56.76382| 4.5197368| 25516.53|  3.965790| 152|
|Kia           |Soul              |         2|   56.64366| 4.4415493| 26490.04|  4.057042| 142|
|Nissan        |Xterra            |         2|   56.34545| 2.6060606| 29998.09|  4.275758|  33|
|Hyundai       |Elantra           |         2|   56.19311| 3.2101639| 32625.45|  3.751148| 305|
|Jeep          |Grand Cherokee    |         2|   56.14301| 4.6709677| 28248.74|  4.338710|  93|
|FIAT          |500L              |         2|   55.99667| 3.7000000| 15683.33|  3.946667|  30|
|Nissan        |Pathfinder        |         2|   55.98182| 4.9000000| 38281.15|  3.878788|  33|
|GMC           |SAVANA            |         2|   55.83457| 2.8000000|  9982.42|  4.019753|  81|
|Dodge         |Dart              |         2|   55.76623| 3.0870130| 23011.52|  3.833766|  77|
|Kia           |Sorento           |         2|   54.32987| 4.2000000| 37149.03|  4.176623|  77|
|Toyota        |Avalon            |         2|   53.99524| 2.7714286| 28262.24|  3.676191|  21|
|Chrysler      |300               |         2|   53.95328| 2.2934426| 20879.19|  4.071312| 122|
|Toyota        |Sienna            |         2|   52.97969| 4.9000000| 50011.92|  3.328125|  64|
|Dodge         |Charger           |         2|   52.67958| 3.9783784| 25396.25|  3.830931| 333|
|Mazda         |Mazda5            |         2|   52.32812| 3.7000000| 42385.16|  3.881250|  32|
|Lincoln       |Navigator         |         2|   52.25294| 0.0000000| 32941.73|  4.147059|  51|
|FIAT          |500               |         2|   51.46765| 4.7735294| 22999.85|  4.108823|  34|
|Chevrolet     |Sonic             |         2|   51.28548| 4.5612903| 30371.71|  3.675000| 124|
|Volkswagen    |Jetta             |         2|   51.22667| 4.8391667| 32209.28|  3.727500| 120|
|Kia           |Optima            |         2|   50.68600| 4.1000000| 36728.58|  3.728000| 100|
|Chevrolet     |Traverse          |         2|   49.00820| 4.8000000| 41554.28|  3.914754|  61|
|Chevrolet     |Equinox           |         2|   48.76236| 2.9528090| 47249.62|  3.961236| 178|
|GMC           |Yukon XL          |         2|   48.65833| 3.1750000| 51680.88|  3.704167|  24|
|Buick         |LaCrosse          |         2|   48.16296| 2.7444444| 37040.11|  3.759259|  27|
|Nissan        |Altima            |         2|   47.72500| 4.1000000| 35259.59|  3.787393| 468|
|Volkswagen    |Passat            |         2|   47.06605| 3.0715596| 39397.33|  3.747706| 109|
|FORD          |E-350             |         2|   47.00460| 1.5275862| 28985.15|  4.006897|  87|
|Volkswagen    |Beetle            |         2|   46.64167| 3.1666667| 25872.67|  3.904167|  24|
|CHEVROLET     |IMPALA            |         2|   45.22981| 4.1000000| 20998.53|  3.725961| 104|
|Chevrolet     |Camaro            |         2|   44.90741| 4.6000000| 34273.46|  3.477778|  54|
|Lincoln       |MKS               |         2|   44.73000| 4.5000000| 28769.10|  4.015000|  20|
|Nissan        |Maxima            |         2|   43.70693| 0.0000000| 40495.78|  3.629703| 101|
|Buick         |Regal             |         2|   43.63333| 2.2428571| 30861.29|  3.838095|  21|
|Chevrolet     |Malibu            |         2|   43.43846| 3.0000000| 41455.15|  3.652308| 130|
|Hyundai       |Sonata            |         2|   43.27382| 3.2000000| 36382.62|  3.816754| 191|
|Chevrolet     |Impala            |         2|   41.39091| 1.2263158| 41388.05|  3.441627| 211|
|INFINITI      |G37               |         2|   40.44035| 3.2000000| 21670.57|  4.022807| 342|
|Ford          |C-Max Energi      |         2|   33.81351| 0.0000000| 54144.97|  3.905405|  37|
|Jeep          |Wrangler          |         3|   71.61875| 3.7750000| 35712.94|  4.085417|  48|
|INFINITI      |JX35              |         3|   65.75714| 0.0000000| 38125.10|  3.980952|  21|
|Nissan        |Sentra            |         3|   64.52836| 4.9000000| 25291.78|  3.770149|  67|
|Toyota        |RAV4              |         3|   63.29111| 4.9000000| 43456.78|  3.857778|  45|
|Honda         |CR-V              |         3|   60.71282| 4.5358974| 36615.13|  4.100000|  39|
|Ford          |Explorer          |         3|   60.43046| 4.8072848| 43170.23|  3.885431| 151|
|Mercedes-Benz |GLK-Class         |         3|   58.55172| 1.8758621| 35145.90|  3.989655|  29|
|Toyota        |Sienna            |         3|   56.21071| 4.8571429| 47409.21|  3.385714|  28|
|Hyundai       |Accent            |         3|   55.46000| 2.6250000| 36618.82|  3.385000|  40|
|Hyundai       |Santa Fe          |         3|   55.39130| 3.4521739| 37180.09|  4.026087|  23|
|Chevrolet     |Tahoe             |         3|   55.35128| 4.1153846| 48663.49|  4.015385|  39|
|Lincoln       |MKX               |         3|   54.81290| 0.0000000| 29539.19|  3.925807|  31|
|Ford          |Edge              |         3|   54.52848| 4.6000000| 36691.99|  4.040000| 165|
|Honda         |Pilot             |         3|   54.51569| 4.8000000| 37825.63|  3.729412|  51|
|Honda         |Civic             |         3|   54.50256| 4.4807692| 32930.03|  3.392949| 156|
|Hyundai       |Elantra           |         3|   54.36608| 3.1699647| 30497.44|  3.709541| 283|
|Nissan        |Pathfinder        |         3|   52.74167| 4.9000000| 43492.21|  3.433333|  24|
|Dodge         |Dart              |         3|   52.73830| 2.8000000| 34579.32|  3.455319|  47|
|LEXUS         |ES                |         3|   51.63871| 0.0000000| 27737.00|  3.664516|  31|
|GMC           |Acadia            |         3|   50.96364| 4.5000000| 47060.68|  3.940909|  22|
|Dodge         |Durango           |         3|   50.93488| 3.8930233| 42862.72|  3.904651|  43|
|Mazda         |Mazda3            |         3|   49.81905| 4.3000000| 34426.67|  3.547619|  21|
|Kia           |Optima            |         3|   49.03590| 4.1000000| 37632.88|  3.642308|  78|
|Nissan        |Versa             |         3|   48.90952| 3.8000000| 47725.57|  3.247619|  21|
|Chevrolet     |Cruze             |         3|   48.42838| 4.4891892| 49285.97|  3.312162| 148|
|Ford          |Fiesta            |         3|   48.33125| 4.1500000| 43942.94|  3.900000|  48|
|Audi          |A4                |         3|   48.30625| 2.8500000| 31768.03|  3.518750|  32|
|Jeep          |Grand Cherokee    |         3|   48.27143| 4.0714286| 38348.69|  4.261905|  42|
|BMW           |3 Series          |         3|   47.76333| 3.9633333| 31383.45|  4.036667|  60|
|Acura         |TSX               |         3|   47.29722| 3.1972222| 33591.17|  3.686111|  36|
|Toyota        |Prius             |         3|   46.66500| 0.0000000| 48570.85|  3.675000|  20|
|Kia           |Soul              |         3|   46.65000| 4.3640000| 39954.70|  3.710000|  50|
|Chrysler      |300               |         3|   46.54146| 2.4268293| 40909.68|  3.963415|  41|
|Kia           |Sorento           |         3|   46.07667| 4.4166667| 42597.77|  4.010000|  30|
|Chevrolet     |Camaro            |         3|   45.67059| 4.4823529| 36956.50|  3.438235|  34|
|Volkswagen    |Jetta             |         3|   45.37532| 4.4883117| 38001.36|  3.358442|  77|
|Lincoln       |MKZ               |         3|   45.22394| 2.4661972| 28236.54|  3.898592|  71|
|Honda         |Accord            |         3|   44.64216| 3.4686486| 33459.25|  3.463243| 185|
|Dodge         |Charger           |         3|   44.60811| 3.7945946| 44637.21|  3.473874| 111|
|Nissan        |Altima            |         3|   44.50635| 4.1000000| 39511.74|  3.561905| 252|
|Cadillac      |SRX               |         3|   44.38500| 3.3000000| 43314.55|  4.205000|  20|
|Kia           |Forte             |         3|   43.51000| 4.6000000| 48393.67|  3.600000|  30|
|Hyundai       |Sonata            |         3|   43.47830| 3.2000000| 32270.52|  3.860377| 106|
|Acura         |TL                |         3|   43.32791| 3.1976744| 35469.05|  3.548837|  43|
|Chevrolet     |Equinox           |         3|   43.06573| 2.7825175| 55362.20|  3.783916| 143|
|Ford          |Focus             |         3|   42.85327| 3.4000000| 46549.72|  3.690654| 107|
|Mazda         |Mazda5            |         3|   42.20000| 3.6500000| 50315.96|  3.291667|  24|
|Chevrolet     |Malibu            |         3|   41.72375| 3.0000000| 46994.26|  3.345000|  80|
|Chevrolet     |Sonic             |         3|   41.24321| 4.7962963| 46250.40|  3.301235|  81|
|Nissan        |Maxima            |         3|   40.47798| 0.0000000| 44073.43|  3.494495| 109|
|Volkswagen    |Passat            |         3|   39.28133| 3.2573333| 44069.43|  3.406667|  75|
|INFINITI      |G37               |         3|   39.24766| 3.1437500| 24226.81|  3.978906| 256|
|Jeep          |Liberty           |         3|   39.12692| 3.6000000| 52681.38|  3.861538|  26|
|Chevrolet     |Traverse          |         3|   36.26923| 4.3615385| 55017.27|  3.776923|  26|
|Chevrolet     |Impala            |         3|   30.75497| 0.2251462| 55432.18|  3.361988| 171|
|Chevrolet     |Volt              |         3|   30.53810| 0.0000000| 54729.71|  3.857143|  21|
|Jeep          |Wrangler          |         4|   75.20741| 2.9888889| 42801.36|  4.300000|  81|
|Subaru        |Impreza           |         4|   67.69259| 3.6962963| 39555.85|  3.507407|  27|
|Toyota        |Venza             |         4|   61.15909| 2.0000000| 39317.82|  4.054545|  22|
|Ford          |Explorer          |         4|   58.60196| 4.5000000| 50376.91|  3.871569| 102|
|Lexus         |CT 200h           |         4|   57.79583| 0.0000000| 35932.88|  3.712500|  24|
|Audi          |Q5                |         4|   57.32059| 3.9000000| 51596.26|  4.023529|  34|
|Honda         |CR-V              |         4|   56.23788| 4.5431818| 40221.67|  3.818182| 132|
|Mercedes-Benz |GLK-Class         |         4|   55.13077| 2.2000000| 37288.62|  3.896154|  26|
|Subaru        |Forester          |         4|   54.71481| 3.8444444| 48844.67|  3.725926|  27|
|MINI          |Cooper Countryman |         4|   53.85217| 3.8347826| 44235.48|  4.069565|  23|
|Toyota        |RAV4              |         4|   53.72235| 3.3000000| 36436.60|  3.804706|  85|
|Ford          |Expedition        |         4|   53.00937| 2.8000000| 60793.34|  3.925000|  32|
|Toyota        |Sienna            |         4|   52.66667| 4.8000000| 53710.91|  3.372727|  33|
|Honda         |Civic             |         4|   52.57509| 4.4762264| 37476.50|  3.323774| 265|
|Scion         |tC                |         4|   52.50000| 2.5100000| 46759.65|  3.345000|  20|
|Hyundai       |Elantra           |         4|   52.22784| 3.4350515| 40491.87|  3.551546|  97|
|Toyota        |Prius             |         4|   51.12903| 0.0000000| 45233.87|  3.717742|  62|
|Ford          |Edge              |         4|   50.97045| 3.4761364| 43969.42|  3.969886| 176|
|INFINITI      |QX56              |         4|   50.61034| 0.0000000| 58183.31|  3.817241|  29|
|Jeep          |Compass           |         4|   50.58182| 2.7181818| 52089.86|  3.672727|  22|
|Toyota        |Yaris             |         4|   50.53226| 4.8000000| 50187.90|  3.235484|  31|
|Dodge         |Durango           |         4|   50.33409| 3.7113636| 52097.09|  3.777273|  44|
|Honda         |Pilot             |         4|   50.17882| 4.8000000| 47525.02|  3.737647|  85|
|Chevrolet     |Camaro            |         4|   49.75714| 3.9500000| 41388.69|  3.828571|  70|
|Chevrolet     |Tahoe             |         4|   49.31053| 4.0657895| 68148.87|  3.665789|  38|
|Jeep          |Grand Cherokee    |         4|   49.24959| 4.0024390| 47589.85|  4.069919| 123|
|LEXUS         |ES                |         4|   48.72388| 0.0000000| 33727.04|  3.425373|  67|
|Lincoln       |MKX               |         4|   48.10857| 0.0000000| 44200.06|  3.877143|  35|
|Subaru        |Outback           |         4|   48.09375| 3.7468750| 63853.06|  3.696875|  32|
|Ford          |Fiesta            |         4|   48.03623| 4.4405797| 48355.91|  3.775362|  69|
|Mazda         |Mazda3            |         4|   47.47419| 4.3000000| 44949.55|  3.351613|  62|
|GMC           |Acadia            |         4|   47.40779| 4.5441558| 48458.53|  3.864935|  77|
|Kia           |Optima            |         4|   47.14811| 4.1000000| 38191.49|  3.704717| 106|
|Hyundai       |Santa Fe          |         4|   46.72692| 4.3000000| 54209.92|  3.776923|  26|
|Acura         |TSX               |         4|   46.55806| 3.2338710| 35527.16|  3.601613|  62|
|Kia           |Soul              |         4|   46.52222| 3.7698413| 47183.29|  3.549206|  63|
|Chevrolet     |Cruze             |         4|   46.50952| 4.2000000| 50090.81|  3.493651| 126|
|GMC           |Terrain           |         4|   46.19615| 2.4673077| 59465.44|  4.036538|  52|
|Dodge         |Charger           |         4|   45.67246| 3.5434783| 57199.70|  3.581159|  69|
|Cadillac      |SRX               |         4|   45.38431| 3.6764706| 39831.75|  3.803922|  51|
|Hyundai       |Accent            |         4|   45.34324| 3.0621622| 52956.81|  3.121622|  37|
|Audi          |A4                |         4|   44.86735| 2.5204082| 38563.92|  3.542857|  49|
|BMW           |3 Series          |         4|   44.27928| 3.4432432| 36599.33|  3.978679| 333|
|Jeep          |Patriot           |         4|   44.02500| 3.0250000| 58410.16|  3.381250|  32|
|Nissan        |Versa             |         4|   43.88049| 3.8878049| 55532.85|  3.086585|  82|
|Nissan        |Maxima            |         4|   43.59048| 0.0000000| 38026.68|  3.641604| 399|
|Buick         |Enclave           |         4|   43.54242| 3.6969697| 56386.21|  3.875758|  33|
|Ford          |Focus             |         4|   43.50830| 3.4000000| 45173.65|  3.722707| 229|
|Acura         |TL                |         4|   43.22727| 3.2136364| 36481.30|  3.693939|  66|
|BMW           |5 Series          |         4|   42.88000| 4.2000000| 48081.27|  3.886667|  30|
|Chrysler      |300               |         4|   42.42273| 3.1000000| 49107.95|  3.527273|  22|
|Honda         |Accord            |         4|   42.41246| 3.4483283| 38273.13|  3.450456| 329|
|Volkswagen    |Jetta             |         4|   42.33014| 4.2506849| 51149.84|  3.064384| 146|
|Kia           |Rio               |         4|   42.14500| 3.1300000| 52014.50|  3.360000|  20|
|Chevrolet     |Equinox           |         4|   40.78000| 2.6700000| 63188.88|  3.845625| 160|
|Cadillac      |CTS               |         4|   40.33673| 2.5346939| 38945.00|  3.934694|  49|
|Buick         |LaCrosse          |         4|   40.19730| 3.1729730| 52565.89|  3.881081|  37|
|BMW           |6 Series          |         4|   40.18837| 1.6000000| 32316.05|  4.048837|  43|
|Jeep          |Liberty           |         4|   39.92308| 3.4153846| 57545.36|  3.898718|  78|
|Kia           |Forte             |         4|   39.80517| 4.5120690| 53322.69|  3.232759|  58|
|Kia           |Sorento           |         4|   39.73827| 3.6876543| 63941.74|  3.790123|  81|
|INFINITI      |G37               |         4|   37.71002| 3.0000000| 35357.35|  3.795918| 539|
|Volkswagen    |Passat            |         4|   37.59808| 3.3000000| 39239.24|  3.173077| 104|
|Chevrolet     |Sonic             |         4|   36.81333| 4.2000000| 52144.40|  3.526667|  30|
|Lincoln       |MKZ               |         4|   36.78070| 1.7210526| 30884.04|  3.722807|  57|
|Chevrolet     |Traverse          |         4|   36.68434| 4.2000000| 56128.76|  3.781928|  83|
|Lincoln       |MKS               |         4|   35.97826| 3.3652174| 48471.30|  3.721739|  23|
|Hyundai       |Sonata            |         4|   35.90896| 4.7000000| 67052.13|  3.322388|  67|
|Nissan        |Altima            |         4|   35.44706| 4.9000000| 57169.67|  3.390588|  85|
|Buick         |Regal             |         4|   35.15000| 2.2681818| 40792.77|  3.490909|  22|
|Chevrolet     |Aveo              |         4|   29.87000| 3.5000000| 71829.75|  2.920000|  20|
|Chevrolet     |Volt              |         4|   29.78148| 0.0000000| 52625.59|  3.785185|  27|
|Chevrolet     |Impala            |         4|   27.49082| 1.6990338| 71288.84|  3.237198| 207|
|Chevrolet     |HHR               |         4|   26.44483| 3.3000000| 83317.52|  3.172414|  29|
|NISSAN        |LEAF              |         4|   26.13492| 2.0000000| 20135.89|  4.234921|  63|
|Jeep          |Wrangler          |         5|   78.32105| 2.9947368| 55195.53|  4.057895|  38|
|Chevrolet     |Camaro            |         5|   52.46944| 4.0666667| 51603.28|  3.494444|  36|
|Chevrolet     |Tahoe             |         5|   51.20909| 3.9318182| 81026.32|  3.740909|  22|
|Honda         |CR-V              |         5|   50.10714| 4.6000000| 62472.29|  3.685714|  28|
|Toyota        |RAV4              |         5|   50.08667| 3.2333333| 49581.80|  3.553333|  30|
|Honda         |Civic             |         5|   45.74762| 3.6761905| 67440.90|  2.952381|  21|
|Jeep          |Grand Cherokee    |         5|   43.58000| 3.9100000| 71344.35|  3.775000|  20|
|Nissan        |Maxima            |         5|   41.72857| 0.3857143| 45851.00|  3.307936|  63|
|BMW           |3 Series          |         5|   39.89468| 2.8148936| 46153.87|  3.747872|  94|
|Chevrolet     |Equinox           |         5|   39.25789| 2.8657895| 80041.71|  3.647368|  38|
|INFINITI      |G37               |         5|   39.16471| 3.0191176| 37705.79|  3.848529|  68|
|Kia           |Soul              |         5|   39.04412| 3.3000000| 80797.09|  3.264706|  34|
|Honda         |Accord            |         5|   37.88077| 2.8615385| 70439.73|  3.084615|  26|
|Volkswagen    |Jetta             |         5|   37.68667| 3.5800000| 68689.33|  2.860000|  30|
|Ford          |Edge              |         5|   36.96667| 0.9800000| 71057.47|  3.653333|  30|
|Kia           |Sorento           |         5|   36.25862| 2.7000000| 88477.97|  3.703448|  29|
|BMW           |5 Series          |         5|   35.42000| 4.1800000| 58319.00|  3.495000|  20|
|Chevrolet     |Traverse          |         5|   34.70000| 4.2000000| 70663.95|  3.463636|  22|
|Hyundai       |Sonata            |         5|   34.47755| 4.6510204| 76378.59|  3.234694|  49|
|Dodge         |Charger           |         5|   33.86207| 2.7344828| 78959.48|  2.924138|  29|
|Nissan        |Altima            |         5|   33.25417| 4.8500000| 86557.52|  3.175000|  48|
|Chevrolet     |HHR               |         5|   29.40500| 3.3000000| 77872.90|  3.275000|  20|
|Chevrolet     |Impala            |         5|   25.73364| 2.0439252| 85397.18|  3.039252| 107|

While I won’t be making any formal recommendations in this paper, the above table provides a guide on which vehicles can be expected to retain good value in the future. Please pay close attention to the error rating and the sample size.

Reflection

This project, though very time-consuming, has truly been worth every moment spent working on it. Possessing a solid foundation in MS Excel (my daily tool for analysis at work), it took some getting used to, and at times felt counterproductive trying to accomplish the same things in R. Although I’m new to programming, I intentionally chose to do as challenging a project as I possibly could with the knowledge I’ve learned in this course. The data that I used is almost identical to the type of data I analyze at work and is an area of great interest for me- the used car market. Despite the intrigue, I was a bit overwhelmed with the amount of tidying that was needed to complete the analysis. In truth, more time was spent on cleaning the data than anything else.

An issue I had to contend with was that Canadian vehicle trims do not always have the same naming conventions as US trims. For example, a 2021 VW Jetta Highline in Canada may be listed as a 2021 VW Jetta SE/ SEL in the US. This, however, simplified the joining of the datasets because I was better able to match the YMM in both the auction data (carprices.csv) and the catalog now in USD (us_catalog).

However, after joining the catalog I had 50,342 unmatched records that would be later passed through an iteration map function using the vindecoder() package. Subsequently, I still had to contend with 20,826 (out of 50,342 records) missing unmatched keys/MSRP values. Observing the distribution of vehicles in unmatched_keys (see appendix E), I was able to identify which models had a low map rate in the VIN decoder. After about 12 lines of code, I was able to clean an additional 6,860 records, leaving the total unmatched at 13,965.

Instead of deleting them entirely, I saved them in a CSV file named ‘unmatched_keys.csv’. As I progress through the DACSS program, I aim to practice more advanced ways to handle these situations. If time permits, my strategy to handle these untidy fields and unmatched_keys would be to use functions str_detect() and advanced mapping functions to clean the data as best as possible.

Despite 13,965 out of 457,588 records (3%) of records not being matched, I do not believe that it will have a significant impact on the results. The unmatched records were due to the unavailability in the Canadian market of some US models as well as naming inconsistencies across YMM fields. Due to the fact I am mainly conducting my analysis on high-volume vehicle segments and models, this will only have a marginal to negligible effect on the quality of results.

There are far more insights that can be drawn from this data. For example, comparing the average yearly mileage by state, comparing resale values for vehicles in colder states vs warmer states, as well as the popularity and distribution of various makes across the US. Those are just a few of the burning questions that I have not been able to address in this project. What is certain is that I will be continuing to work on this data with the hope to manipulate similar vehicle data for the Canadian wholesale marketplace for more work-related reasons.

Conclusion

The data analyzed was able to substantially answer the research question. Indeed factors like body type, vehicle condition, mileage, and age affect the retained value of vehicles. Excluding age for trivial reasons, overall, the condition had the largest impact (positive) followed by mileage (negative) of all the variables on RV. However, some segments were more sensitive to this effect than others. For example, convertibles were slightly more sensitive to mileage than any other body type; a stark contrast when compared to pickups and SUVs.

Bearing in mind the key assumptions made when converting the MSRP from CAD to USD, I was quite surprised at how accurate the results were compared to actual public residual value data. It seems as though the pricing for Canadian trims coincides greatly with their US counterparts. Another observation was that the most auctioned/sold vehicles do not necessarily retain the best value. Despite, the higher error rating among various segments, I believe the estimation of RV% (based on dividing the auction price by mean MSRP), provides a solid understanding, once analyzed on aggregate and in context.

The data suggests that except for pickups, SUVs- subcompact, compact and midsize, hold better value than standard passenger car body types. There seems to also be a resistance to price as vehicles with higher MSRPs tend to retain less value over time. This was evident when looking at graphs 4.1 and 4.2, where a clear hierarchy among body types and segments is observed.

Bibliography

Data Sources:

Kaggle: Used Car Auction Prices | Kaggle

WSJ Markets: USD to CAD | Canadian Dollar Historical Prices - WSJ

Methodology:

DACSS Faculty, University of Massachusetts, Amherst

Grolemund, G., & Wickham, H. (2017). R for Data Science. O’Reilly Media.

The National Highway Traffic Safety Administration: VIN Decoder | NHTSA

Yan Holtz The R Graph Gallery

Supporting Tools & Publications:

Auto Trader Used Cars for Sale - Autotrader

Edmonds Top 10 Best-Selling Vehicles for 2013 on Edmunds.com

J.D. Power: 2018 Resale Awards and Ratings | J.D. Power

Kelley Blue Book Kelley Blue Book

VIN Decoder: Decode This

Appendix

Appendix A:

Original data set: car_prices.csv

Code
dfSummary(df)
Data Frame Summary  
df  
Dimensions: 514613 x 16  
Duplicates: 0  

-----------------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values                  Freqs (% of Valid)       Graph                 Valid      Missing  
---- -------------- ------------------------------- ------------------------ --------------------- ---------- ---------
1    year           1. 2012                         102315 (19.9%)           III                   514613     0        
     [character]    2. 2013                          98168 (19.1%)           III                   (100.0%)   (0.0%)   
                    3. 2014                          81070 (15.8%)           III                                       
                    4. 2011                          48548 ( 9.4%)           I                                         
                    5. 2008                          31502 ( 6.1%)           I                                         
                    6. 2007                          30845 ( 6.0%)           I                                         
                    7. 2006                          26913 ( 5.2%)           I                                         
                    8. 2010                          26485 ( 5.1%)           I                                         
                    9. 2005                          21394 ( 4.2%)                                                     
                    10. 2009                         20594 ( 4.0%)                                                     
                    [ 2 others ]                     26779 ( 5.2%)           I                                         

2    make           1. Ford                          87687 (17.3%)           III                   506671     7942     
     [character]    2. Chevrolet                     56022 (11.1%)           II                    (98.5%)    (1.5%)   
                    3. Nissan                        51594 (10.2%)           II                                        
                    4. Toyota                        36029 ( 7.1%)           I                                         
                    5. Dodge                         29544 ( 5.8%)           I                                         
                    6. Honda                         23220 ( 4.6%)                                                     
                    7. Hyundai                       21155 ( 4.2%)                                                     
                    8. BMW                           18983 ( 3.7%)                                                     
                    9. Kia                           17786 ( 3.5%)                                                     
                    10. Chrysler                     16613 ( 3.3%)                                                     
                    [ 76 others ]                   148038 (29.2%)           IIIII                                     

3    model          1. Altima                        18638 ( 3.7%)                                 506659     7954     
     [character]    2. F-150                         13584 ( 2.7%)                                 (98.5%)    (1.5%)   
                    3. Fusion                        12946 ( 2.6%)                                                     
                    4. Escape                        11578 ( 2.3%)                                                     
                    5. Camry                         11226 ( 2.2%)                                                     
                    6. Focus                          9999 ( 2.0%)                                                     
                    7. Grand Caravan                  7769 ( 1.5%)                                                     
                    8. Impala                         7505 ( 1.5%)                                                     
                    9. G Sedan                        7417 ( 1.5%)                                                     
                    10. 3 Series                      7394 ( 1.5%)                                                     
                    [ 782 others ]                  398603 (78.7%)           IIIIIIIIIIIIIII                           

4    trim           1. Base                          48584 ( 9.6%)           I                     506394     8219     
     [character]    2. SE                            41841 ( 8.3%)           I                     (98.4%)    (1.6%)   
                    3. LX                            18863 ( 3.7%)                                                     
                    4. Limited                       17396 ( 3.4%)                                                     
                    5. LT                            16636 ( 3.3%)                                                     
                    6. XLT                           15169 ( 3.0%)                                                     
                    7. LE                            11748 ( 2.3%)                                                     
                    8. S                             11647 ( 2.3%)                                                     
                    9. GLS                           11587 ( 2.3%)                                                     
                    10. LS                           11360 ( 2.2%)                                                     
                    [ 1524 others ]                 301563 (59.6%)           IIIIIIIIIII                               

5    body           1. Sedan                        183305 (36.3%)           IIIIIII               504678     9935     
     [character]    2. SUV                          110102 (21.8%)           IIII                  (98.1%)    (1.9%)   
                    3. sedan                         39476 ( 7.8%)           I                                         
                    4. suv                           23120 ( 4.6%)                                                     
                    5. Hatchback                     20382 ( 4.0%)                                                     
                    6. Minivan                       20051 ( 4.0%)                                                     
                    7. Crew Cab                      12699 ( 2.5%)                                                     
                    8. Wagon                         12646 ( 2.5%)                                                     
                    9. Coupe                         12572 ( 2.5%)                                                     
                    10. Convertible                   7372 ( 1.5%)                                                     
                    [ 74 others ]                    62953 (12.5%)           II                                        

6    transmission   1. automatic                    440307 (96.8%)           IIIIIIIIIIIIIIIIIII   454932     59681    
     [character]    2. manual                        14599 ( 3.2%)                                 (88.4%)    (11.6%)  
                    3. sedan                            15 ( 0.0%)                                                     
                    4. Sedan                            11 ( 0.0%)                                                     

7    vin            1. automatic                        22 (  0.0%)                                514609     4        
     [character]    2. wbanv13588cz57827                 5 (  0.0%)                                (100.0%)   (0.0%)   
                    3. 1ftfw1cv5afb30053                 4 (  0.0%)                                                    
                    4. 5n1ar1nn2bc632869                 4 (  0.0%)                                                    
                    5. 5uxfe43579l274932                 4 (  0.0%)                                                    
                    6. trusc28n241022003                 4 (  0.0%)                                                    
                    7. wddgf56x78f009940                 4 (  0.0%)                                                    
                    8. 1c3bcbeb9dn526808                 3 (  0.0%)                                                    
                    9. 1c4pjlak8cw134088                 3 (  0.0%)                                                    
                    10. 1c4pjlak9cw151871                3 (  0.0%)                                                    
                    [ 506479 others ]               514553 (100.0%)          IIIIIIIIIIIIIIIIIII                       

8    state          1. fl                            76001 (14.8%)           II                    514613     0        
     [character]    2. ca                            66121 (12.8%)           II                    (100.0%)   (0.0%)   
                    3. pa                            51776 (10.1%)           II                                        
                    4. tx                            43185 ( 8.4%)           I                                         
                    5. ga                            31731 ( 6.2%)           I                                         
                    6. nj                            25933 ( 5.0%)           I                                         
                    7. il                            22648 ( 4.4%)                                                     
                    8. tn                            20312 ( 3.9%)                                                     
                    9. oh                            20028 ( 3.9%)                                                     
                    10. nc                           18478 ( 3.6%)                                                     
                    [ 54 others ]                   138400 (26.9%)           IIIII                                     

9    condition      Mean (sd) : 3.5 (0.9)           41 distinct values                   : .       506371     8242     
     [numeric]      min < med < max:                                                     : : :     (98.4%)    (1.6%)   
                    1 < 3.7 < 5                                                  :   : . : : : .                       
                    IQR (CV) : 1.4 (0.3)                                         : : : : : : : :                       
                                                                             .   : : : : : : : :                       

10   odometer       Mean (sd) : 61323 (46816.3)     154189 distinct values   :                     514573     40       
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    1 < 47172 < 999999                                       :                                         
                    IQR (CV) : 61684 (0.8)                                   :                                         
                                                                             : :                                       

11   color          1. black                        104598 (20.4%)           IIII                  513921     692      
     [character]    2. white                         99846 (19.4%)           III                   (99.9%)    (0.1%)   
                    3. gray                          78945 (15.4%)           III                                       
                    4. silver                        75627 (14.7%)           II                                        
                    5. blue                          46293 ( 9.0%)           I                                         
                    6. red                           39932 ( 7.8%)           I                                         
                    7. —                           24323 ( 4.7%)                                                     
                    8. gold                           8547 ( 1.7%)                                                     
                    9. burgundy                       7947 ( 1.5%)                                                     
                    10. green                         7723 ( 1.5%)                                                     
                    [ 36 others ]                    20140 ( 3.9%)                                                     

12   interior       1. black                        237563 (46.2%)           IIIIIIIII             513921     692      
     [character]    2. gray                         159295 (31.0%)           IIIIII                (99.9%)    (0.1%)   
                    3. beige                         53104 (10.3%)           II                                        
                    4. tan                           37782 ( 7.4%)           I                                         
                    5. —                           13575 ( 2.6%)                                                     
                    6. brown                          7925 ( 1.5%)                                                     
                    7. red                            1256 ( 0.2%)                                                     
                    8. silver                          972 ( 0.2%)                                                     
                    9. blue                            648 ( 0.1%)                                                     
                    10. off-white                      478 ( 0.1%)                                                     
                    [ 7 others ]                      1323 ( 0.3%)                                                     

13   seller         1. nissan-infiniti lt            19693 ( 3.8%)                                 514613     0        
     [character]    2. ford motor credit company     19160 ( 3.7%)                                 (100.0%)   (0.0%)   
                    3. the hertz corporation         18236 ( 3.5%)                                                     
                    4. santander consumer            15078 ( 2.9%)                                                     
                    5. avis corporation              12540 ( 2.4%)                                                     
                    6. nissan infiniti lt             9962 ( 1.9%)                                                     
                    7. wells fargo dealer servic      8457 ( 1.6%)                                                     
                    8. tdaf remarketing               6897 ( 1.3%)                                                     
                    9. enterprise veh exchange/r      6810 ( 1.3%)                                                     
                    10. hyundai motor finance         6660 ( 1.3%)                                                     
                    [ 13125 others ]                391120 (76.0%)           IIIIIIIIIIIIIII                           

14   mmr            Mean (sd) : 14740.9 (9452.6)    1101 distinct values     :                     514587     26       
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    25 < 12900 < 182000                                      :                                         
                    IQR (CV) : 10375 (0.6)                                   : .                                       
                                                                             : :                                       

15   sellingprice   Mean (sd) : 14573.8 (9537.2)    1869 distinct values     :                     514613     0        
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    1 < 12900 < 230000                                       :                                         
                    IQR (CV) : 10500 (0.7)                                   :                                         
                                                                             : :                                       

16   saledate       1. Tue Feb 10 2015 01:30:00       4984 ( 1.0%)                                 514613     0        
     [character]    2. Tue Feb 17 2015 01:30:00       4760 ( 0.9%)                                 (100.0%)   (0.0%)   
                    3. Tue Jan 27 2015 01:30:00       4729 ( 0.9%)                                                     
                    4. Tue Jan 20 2015 01:30:00       4464 ( 0.9%)                                                     
                    5. Tue Mar 03 2015 01:30:00       4409 ( 0.9%)                                                     
                    6. Tue Feb 03 2015 01:30:00       4301 ( 0.8%)                                                     
                    7. Tue Jun 02 2015 02:30:00       4114 ( 0.8%)                                                     
                    8. Tue Jun 16 2015 02:30:00       3983 ( 0.8%)                                                     
                    9. Tue Mar 10 2015 02:30:00       3806 ( 0.7%)                                                     
                    10. Tue Feb 24 2015 01:30:00      3448 ( 0.7%)                                                     
                    [ 3619 others ]                 471615 (91.6%)           IIIIIIIIIIIIIIIIII                        
-----------------------------------------------------------------------------------------------------------------------

Appendix B:

Canadian vehicle catalog

Code
dfSummary(ca_catalog)
Data Frame Summary  
ca_catalog  
Dimensions: 5268 x 10  
Duplicates: 0  

------------------------------------------------------------------------------------------------------------------------
No   Variable          Stats / Values                  Freqs (% of Valid)     Graph                 Valid      Missing  
---- ----------------- ------------------------------- ---------------------- --------------------- ---------- ---------
1    Combined          1. 2002 Cadillac DeVille           2 ( 0.0%)                                 5268       0        
     [character]       2. 2002 Chevrolet Impala           2 ( 0.0%)                                 (100.0%)   (0.0%)   
                       3. 2002 Chevrolet Silverado        2 ( 0.0%)                                                     
                       4. 2002 Chevrolet Silverado        2 ( 0.0%)                                                     
                       5. 2002 Chrysler Intrepid          2 ( 0.0%)                                                     
                       6. 2002 Ford Excursion             2 ( 0.0%)                                                     
                       7. 2002 Ford Super Duty F-35       2 ( 0.0%)                                                     
                       8. 2002 Ford Super Duty F-35       2 ( 0.0%)                                                     
                       9. 2002 GMC Savana Passenger       2 ( 0.0%)                                                     
                       10. 2002 GMC Sierra 2500           2 ( 0.0%)                                                     
                       [ 5037 others ]                 5248 (99.6%)           IIIIIIIIIIIIIIIIIII                       

2    YearOfLaunch      Mean (sd) : 2007.6 (4.8)        17 distinct values           . .   .   . :   5268       0        
     [numeric]         min < med < max:                                         :   : :   :   : :   (100.0%)   (0.0%)   
                       1999 < 2008 < 2015                                     : :   : : . : . : :                       
                       IQR (CV) : 8 (0)                                       : : : : : : : : : :                       
                                                                              : : : : : : : : : :                       

3    Segment           1. Compact Car                  721 (16.8%)            III                   4295       973      
     [character]       2. Mid-Size Car                 638 (14.9%)            II                    (81.5%)    (18.5%)  
                       3. Mid-Size SUV/Crossover       501 (11.7%)            II                                        
                       4. Compact SUV/Crossover        372 ( 8.7%)            I                                         
                       5. Minivan                      354 ( 8.2%)            I                                         
                       6. Full-Size Car                338 ( 7.9%)            I                                         
                       7. Subcompact Car               324 ( 7.5%)            I                                         
                       8. Sports Car                   266 ( 6.2%)            I                                         
                       9. Full-Size SUV/Crossover      260 ( 6.1%)            I                                         
                       10. HD Full-Size Pickup         155 ( 3.6%)                                                      
                       [ 3 others ]                    366 ( 8.5%)            I                                         

4    Trim_Count        Mean (sd) : 16.2 (58.3)         172 distinct values    :                     5268       0        
     [numeric]         min < med < max:                                       :                     (100.0%)   (0.0%)   
                       1 < 5 < 2524                                           :                                         
                       IQR (CV) : 8 (3.6)                                     :                                         
                                                                              :                                         

5    AVG_MSRP_CAD      Mean (sd) : 46772.7 (27047)     3695 distinct values     :                   5268       0        
     [numeric]         min < med < max:                                         :                   (100.0%)   (0.0%)   
                       15 < 39895 < 251165                                      :                                       
                       IQR (CV) : 25285.8 (0.6)                                 : .                                     
                                                                              : : : .                                   

6    MSRP_Range_CAD    Mean (sd) : 13307.5 (16252.5)   554 distinct values    :                     5268       0        
     [numeric]         min < med < max:                                       :                     (100.0%)   (0.0%)   
                       0 < 9600 < 211900                                      :                                         
                       IQR (CV) : 12200 (1.2)                                 :                                         
                                                                              : :                                       

7    StdDev_MSRP_CAD   Mean (sd) : 5559.8 (6429.4)     3764 distinct values   :                     4674       594      
     [numeric]         min < med < max:                                       :                     (88.7%)    (11.3%)  
                       0 < 3988.3 < 63828                                     :                                         
                       IQR (CV) : 3664.6 (1.2)                                :                                         
                                                                              : :                                       

8    Error.MSRP_CAD    Mean (sd) : 2290.2 (3396.2)     3695 distinct values   :                     4674       594      
     [numeric]         min < med < max:                                       :                     (88.7%)    (11.3%)  
                       0 < 1396.9 < 36851.1                                   :                                         
                       IQR (CV) : 1443.9 (1.5)                                :                                         
                                                                              : .                                       

9    Date              1. 08/29/08                     345 (10.2%)            II                    3370       1898     
     [character]       2. 08/29/14                     354 (10.5%)            II                    (64.0%)    (36.0%)  
                       3. 08/30/13                     342 (10.1%)            II                                        
                       4. 08/31/06                     340 (10.1%)            II                                        
                       5. 08/31/07                     335 ( 9.9%)            I                                         
                       6. 08/31/09                     319 ( 9.5%)            I                                         
                       7. 08/31/10                     318 ( 9.4%)            I                                         
                       8. 08/31/11                     327 ( 9.7%)            I                                         
                       9. 08/31/12                     332 ( 9.9%)            I                                         
                       10. 08/31/15                    358 (10.6%)            II                                        

10   Close             Mean (sd) : 1.1 (0.1)           10 distinct values     I                     3370       1898     
     [numeric]         min < med < max:                                       I                     (64.0%)    (36.0%)  
                       1 < 1.1 < 1.3                                          II                                        
                       IQR (CV) : 0 (0.1)                                     I                                         
                                                                              II                                        
                                                                              I                                         
                                                                              II                                        
                                                                              I                                         
                                                                              II                                        
                                                                              II                                        
------------------------------------------------------------------------------------------------------------------------

Appendix C:

First iteration of unmatched records labelled as false_keys

Code
dfSummary(false_keys)
Data Frame Summary  
false_keys  
Dimensions: 57328 x 14  
Duplicates: 0  

----------------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values                  Freqs (% of Valid)      Graph                 Valid      Missing  
---- -------------- ------------------------------- ----------------------- --------------------- ---------- ---------
1    trim           1. Base                         13031 (24.7%)           IIII                  52679      4649     
     [character]    2. Touring                       4393 ( 8.3%)           I                     (91.9%)    (8.1%)   
                    3. G37x                          4172 ( 7.9%)           I                                         
                    4. G37 Journey                   2902 ( 5.5%)           I                                         
                    5. E-250                         1557 ( 3.0%)                                                     
                    6. 2500                          1097 ( 2.1%)                                                     
                    7. Lariat                        1064 ( 2.0%)                                                     
                    8. S                             1003 ( 1.9%)                                                     
                    9. SL                             993 ( 1.9%)                                                     
                    10. LT Fleet                      963 ( 1.8%)                                                     
                    [ 418 others ]                  21504 (40.8%)           IIIIIIII                                  

2    body           1. SEDAN                        11677 (22.5%)           IIII                  51866      5462     
     [character]    2. SUV                           8030 (15.5%)           III                   (90.5%)    (9.5%)   
                    3. G SEDAN                       7417 (14.3%)           II                                        
                    4. MINIVAN                       6020 (11.6%)           II                                        
                    5. HATCHBACK                     4448 ( 8.6%)           I                                         
                    6. VAN                           3291 ( 6.3%)           I                                         
                    7. CREW CAB                      2614 ( 5.0%)           I                                         
                    8. E-SERIES VAN                  1822 ( 3.5%)                                                     
                    9. G COUPE                       1593 ( 3.1%)                                                     
                    10. QUAD CAB                      974 ( 1.9%)                                                     
                    [ 25 others ]                    3980 ( 7.7%)           I                                         

3    Body_Type      1. Convertible                   1293 ( 2.5%)                                 51866      5462     
     [character]    2. Coupe                         2744 ( 5.3%)           I                     (90.5%)    (9.5%)   
                    3. Hatchback                     4448 ( 8.6%)           I                                         
                    4. Minivan                      11211 (21.6%)           IIII                                      
                    5. Pickup                        4294 ( 8.3%)           I                                         
                    6. Sedan                        19094 (36.8%)           IIIIIII                                   
                    7. SUV                           8030 (15.5%)           III                                       
                    8. Wagon                          752 ( 1.4%)                                                     

4    transmission   1. automatic                    48340 (96.6%)           IIIIIIIIIIIIIIIIIII   50066      7262     
     [character]    2. manual                        1726 ( 3.4%)                                 (87.3%)    (12.7%)  

5    vin            1. 1FBSS3BL7CDA01868                3 ( 0.0%)                                 57328      0        
     [character]    2. 1FT7W2BTXDEA03416                3 ( 0.0%)                                 (100.0%)   (0.0%)   
                    3. 1FT8W3DT1CEB32666                3 ( 0.0%)                                                     
                    4. 1FTSW21568EA33810                3 ( 0.0%)                                                     
                    5. 1FTSW21R48EB47680                3 ( 0.0%)                                                     
                    6. 1FTXW43R28EA88031                3 ( 0.0%)                                                     
                    7. 1GYEE637780184619                3 ( 0.0%)                                                     
                    8. 1ZVBP8JZ2E5263940                3 ( 0.0%)                                                     
                    9. 2C4RC1BG2DR821210                3 ( 0.0%)                                                     
                    10. 2FABP7BV5AX113795               3 ( 0.0%)                                                     
                    [ 56475 others ]                57298 (99.9%)           IIIIIIIIIIIIIIIIIII                       

6    state          1. FL                           10682 (18.6%)           III                   57328      0        
     [character]    2. CA                            8762 (15.3%)           III                   (100.0%)   (0.0%)   
                    3. PA                            4972 ( 8.7%)           I                                         
                    4. TX                            4545 ( 7.9%)           I                                         
                    5. GA                            3274 ( 5.7%)           I                                         
                    6. IL                            3051 ( 5.3%)           I                                         
                    7. TN                            2925 ( 5.1%)           I                                         
                    8. NJ                            2816 ( 4.9%)                                                     
                    9. OH                            1667 ( 2.9%)                                                     
                    10. MO                           1518 ( 2.6%)                                                     
                    [ 28 others ]                   13116 (22.9%)           IIII                                      

7    condition      Mean (sd) : 3.7 (0.9)           41 distinct values                  . : .     56763      565      
     [numeric]      min < med < max:                                                    : : :     (99.0%)    (1.0%)   
                    1 < 3.8 < 5                                                     .   : : : .                       
                    IQR (CV) : 1.3 (0.2)                                        : . : : : : : :                       
                                                                                : : : : : : : :                       

8    odometer       Mean (sd) : 51493.7 (44181.5)   43772 distinct values   :                     57325      3        
     [numeric]      min < med < max:                                        :                     (100.0%)   (0.0%)   
                    1 < 35757 < 999999                                      :                                         
                    IQR (CV) : 50951 (0.9)                                  :                                         
                                                                            : .                                       

9    color          1. white                        16292 (28.4%)           IIIII                 57267      61       
     [character]    2. black                        11733 (20.5%)           IIII                  (99.9%)    (0.1%)   
                    3. gray                          8879 (15.5%)           III                                       
                    4. silver                        6166 (10.8%)           II                                        
                    5. blue                          4576 ( 8.0%)           I                                         
                    6. red                           3244 ( 5.7%)           I                                         
                    7. —                           2798 ( 4.9%)                                                     
                    8. gold                           817 ( 1.4%)                                                     
                    9. burgundy                       656 ( 1.1%)                                                     
                    10. beige                         569 ( 1.0%)                                                     
                    [ 9 others ]                     1537 ( 2.7%)                                                     

10   interior       1. black                        27397 (47.8%)           IIIIIIIII             57267      61       
     [character]    2. gray                         17084 (29.8%)           IIIII                 (99.9%)    (0.1%)   
                    3. beige                         6468 (11.3%)           II                                        
                    4. tan                           3547 ( 6.2%)           I                                         
                    5. brown                         1155 ( 2.0%)                                                     
                    6. —                           1062 ( 1.9%)                                                     
                    7. red                            191 ( 0.3%)                                                     
                    8. silver                         116 ( 0.2%)                                                     
                    9. blue                            68 ( 0.1%)                                                     
                    10. white                          46 ( 0.1%)                                                     
                    [ 7 others ]                      133 ( 0.2%)                                                     

11   seller         1. nissan infiniti lt            9958 (17.4%)           III                   57328      0        
     [character]    2. lexus financial services      3627 ( 6.3%)           I                     (100.0%)   (0.0%)   
                    3. u-haul                        1439 ( 2.5%)                                                     
                    4. nissan-infiniti lt            1420 ( 2.5%)                                                     
                    5. gm remarketing                1241 ( 2.2%)                                                     
                    6. the hertz corporation         1053 ( 1.8%)                                                     
                    7. ge fleet services for its     1015 ( 1.8%)                                                     
                    8. enterprise veh exchange/r     1009 ( 1.8%)                                                     
                    9. santander consumer             874 ( 1.5%)                                                     
                    10. avis corporation              830 ( 1.4%)                                                     
                    [ 4601 others ]                 34862 (60.8%)           IIIIIIIIIIII                              

12   mmr            Mean (sd) : 19003.7 (10836)     1062 distinct values    : :                   57328      0        
     [numeric]      min < med < max:                                        : :                   (100.0%)   (0.0%)   
                    175 < 18600 < 182000                                    : :                                       
                    IQR (CV) : 10600 (0.6)                                  : :                                       
                                                                            : : .                                     

13   sellingprice   Mean (sd) : 18860 (10893.5)     1083 distinct values    : .                   57328      0        
     [numeric]      min < med < max:                                        : :                   (100.0%)   (0.0%)   
                    300 < 18400 < 183000                                    : :                                       
                    IQR (CV) : 11100 (0.6)                                  : :                                       
                                                                            : : .                                     

14   saledate       1. Tue Jan 27 2015 01:30:00       633 ( 1.1%)                                 57328      0        
     [character]    2. Tue Feb 17 2015 01:30:00       628 ( 1.1%)                                 (100.0%)   (0.0%)   
                    3. Tue Feb 03 2015 01:30:00       604 ( 1.1%)                                                     
                    4. Tue Mar 03 2015 01:30:00       577 ( 1.0%)                                                     
                    5. Tue Jan 20 2015 01:30:00       570 ( 1.0%)                                                     
                    6. Tue Jun 16 2015 02:30:00       562 ( 1.0%)                                                     
                    7. Tue Feb 10 2015 01:30:00       550 ( 1.0%)                                                     
                    8. Tue Jun 02 2015 02:30:00       476 ( 0.8%)                                                     
                    9. Thu Jan 15 2015 04:30:00       452 ( 0.8%)                                                     
                    10. Tue Jan 06 2015 01:30:00      436 ( 0.8%)                                                     
                    [ 2131 others ]                 51840 (90.4%)           IIIIIIIIIIIIIIIIII                        
----------------------------------------------------------------------------------------------------------------------

This dataset contains all the YMM keys from carprices that failed to map to the us_catalog.

Appendix D:

All records cleaned through NHTSA’s vindecodr package lablled as cleaned_keys

Code
dfSummary(cleaned_keys)
Data Frame Summary  
cleaned_keys  
Dimensions: 38208 x 26  
Duplicates: 1090  

----------------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values                  Freqs (% of Valid)      Graph                 Valid      Missing  
---- -------------- ------------------------------- ----------------------- --------------------- ---------- ---------
1    year           Mean (sd) : 2011.5 (2.2)        2007 : 3684 ( 9.6%)     I                     38208      0        
     [numeric]      min < med < max:                2008 : 2518 ( 6.6%)     I                     (100.0%)   (0.0%)   
                    2007 < 2012 < 2015              2009 : 1194 ( 3.1%)                                               
                    IQR (CV) : 2 (0)                2010 : 1856 ( 4.9%)                                               
                                                    2011 : 4979 (13.0%)     II                                        
                                                    2012 : 8787 (23.0%)     IIII                                      
                                                    2013 : 8609 (22.5%)     IIII                                      
                                                    2014 : 6340 (16.6%)     III                                       
                                                    2015 :  241 ( 0.6%)                                               

2    make           1. INFINITI                     13977 (36.6%)           IIIIIII               38208      0        
     [character]    2. LEXUS                         9007 (23.6%)           IIII                  (100.0%)   (0.0%)   
                    3. FORD                          4012 (10.5%)           II                                        
                    4. CHEVROLET                     3208 ( 8.4%)           I                                         
                    5. NISSAN                        2272 ( 5.9%)           I                                         
                    6. GMC                            686 ( 1.8%)                                                     
                    7. MAZDA                          661 ( 1.7%)                                                     
                    8. HYUNDAI                        538 ( 1.4%)                                                     
                    9. VOLKSWAGEN                     505 ( 1.3%)                                                     
                    10. DODGE                         474 ( 1.2%)                                                     
                    [ 25 others ]                    2868 ( 7.5%)           I                                         

3    model          1. G37                           8779 (23.0%)           IIII                  38208      0        
     [character]    2. RX                            2727 ( 7.1%)           I                     (100.0%)   (0.0%)   
                    3. IS                            2213 ( 5.8%)           I                                         
                    4. ES                            2055 ( 5.4%)           I                                         
                    5. E-350                         1651 ( 4.3%)                                                     
                    6. IMPALA                        1406 ( 3.7%)                                                     
                    7. LEAF                          1359 ( 3.6%)                                                     
                    8. G25                            995 ( 2.6%)                                                     
                    9. M37                            964 ( 2.5%)                                                     
                    10. EXPRESS                       885 ( 2.3%)                                                     
                    [ 170 others ]                  15174 (39.7%)           IIIIIII                                   

4    trim           1. Base                         11399 (32.9%)           IIIIII                34646      3562     
     [character]    2. G37x                          4250 (12.3%)           II                    (90.7%)    (9.3%)   
                    3. G37 Journey                   2956 ( 8.5%)           I                                         
                    4. 2500                          1108 ( 3.2%)                                                     
                    5. E-350 Super Duty XLT           956 ( 2.8%)                                                     
                    6. G37                            906 ( 2.6%)                                                     
                    7. SL                             843 ( 2.4%)                                                     
                    8. SV                             607 ( 1.8%)                                                     
                    9. LT Fleet                       583 ( 1.7%)                                                     
                    10. SE                            579 ( 1.7%)                                                     
                    [ 276 others ]                  10459 (30.2%)           IIIIII                                    

5    body           1. SEDAN                        10160 (30.0%)           IIIII                 33919      4289     
     [character]    2. G SEDAN                       7567 (22.3%)           IIII                  (88.8%)    (11.2%)  
                    3. SUV                           6266 (18.5%)           III                                       
                    4. VAN                           3128 ( 9.2%)           I                                         
                    5. HATCHBACK                     2423 ( 7.1%)           I                                         
                    6. G COUPE                       1617 ( 4.8%)                                                     
                    7. CONVERTIBLE                    593 ( 1.7%)                                                     
                    8. E-SERIES VAN                   454 ( 1.3%)                                                     
                    9. COUPE                          403 ( 1.2%)                                                     
                    10. G CONVERTIBLE                 329 ( 1.0%)                                                     
                    [ 15 others ]                     979 ( 2.9%)                                                     

6    transmission   1. automatic                    32662 (97.0%)           IIIIIIIIIIIIIIIIIII   33672      4536     
     [character]    2. manual                        1010 ( 3.0%)                                 (88.1%)    (11.9%)  

7    vin            1. 1FBSS3BL7CDA01868                9 ( 0.0%)                                 38208      0        
     [character]    2. 1GYEE637780184619                9 ( 0.0%)                                 (100.0%)   (0.0%)   
                    3. 1ZVBP8JZ2E5263940                9 ( 0.0%)                                                     
                    4. 2FABP7BV5AX113795                9 ( 0.0%)                                                     
                    5. 2T2HK31U67C016215                9 ( 0.0%)                                                     
                    6. JN1CV6AP0DM715990                9 ( 0.0%)                                                     
                    7. JN1CV6APXCM936687                9 ( 0.0%)                                                     
                    8. JN1DV6AR6CM860083                9 ( 0.0%)                                                     
                    9. JNKBV61F67M805602                9 ( 0.0%)                                                     
                    10. JTHBK1EG6B2428322               9 ( 0.0%)                                                     
                    [ 35968 others ]                38118 (99.8%)           IIIIIIIIIIIIIIIIIII                       

8    state          1. FL                           8073 (21.1%)            IIII                  38208      0        
     [character]    2. CA                           6531 (17.1%)            III                   (100.0%)   (0.0%)   
                    3. PA                           2950 ( 7.7%)            I                                         
                    4. TX                           2498 ( 6.5%)            I                                         
                    5. TN                           2482 ( 6.5%)            I                                         
                    6. GA                           2374 ( 6.2%)            I                                         
                    7. NJ                           2087 ( 5.5%)            I                                         
                    8. IL                           2039 ( 5.3%)            I                                         
                    9. MO                           1119 ( 2.9%)                                                      
                    10. OH                          1040 ( 2.7%)                                                      
                    [ 28 others ]                   7015 (18.4%)            III                                       

9    condition      Mean (sd) : 3.7 (0.8)           41 distinct values                  . : :     37891      317      
     [numeric]      min < med < max:                                                    : : :     (99.2%)    (0.8%)   
                    1 < 3.9 < 5                                                     .   : : : :                       
                    IQR (CV) : 1.1 (0.2)                                        : . : : : : : :                       
                                                                                : : : : : : : :                       

10   odometer       Mean (sd) : 48049.3 (42571.8)   29571 distinct values   :                     38203      5        
     [numeric]      min < med < max:                                        :                     (100.0%)   (0.0%)   
                    1 < 32961 < 999999                                      :                                         
                    IQR (CV) : 41625 (0.9)                                  :                                         
                                                                            : .                                       

11   color          1. white                        10119 (26.5%)           IIIII                 38176      32       
     [character]    2. black                         8876 (23.3%)           IIII                  (99.9%)    (0.1%)   
                    3. gray                          6824 (17.9%)           III                                       
                    4. silver                        3821 (10.0%)           II                                        
                    5. blue                          2699 ( 7.1%)           I                                         
                    6. —                           2108 ( 5.5%)           I                                         
                    7. red                           1671 ( 4.4%)                                                     
                    8. gold                           390 ( 1.0%)                                                     
                    9. burgundy                       351 ( 0.9%)                                                     
                    10. beige                         294 ( 0.8%)                                                     
                    [ 9 others ]                     1023 ( 2.7%)                                                     

12   interior       1. black                        18588 (48.7%)           IIIIIIIII             38176      32       
     [character]    2. gray                         10169 (26.6%)           IIIII                 (99.9%)    (0.1%)   
                    3. beige                         5328 (14.0%)           II                                        
                    4. tan                           2450 ( 6.4%)           I                                         
                    5. brown                          647 ( 1.7%)                                                     
                    6. —                            614 ( 1.6%)                                                     
                    7. red                            136 ( 0.4%)                                                     
                    8. silver                          80 ( 0.2%)                                                     
                    9. blue                            37 ( 0.1%)                                                     
                    10. white                          36 ( 0.1%)                                                     
                    [ 7 others ]                       91 ( 0.2%)                                                     

13   seller         1. nissan infiniti lt           10064 (26.3%)           IIIII                 38208      0        
     [character]    2. lexus financial services      3674 ( 9.6%)           I                     (100.0%)   (0.0%)   
                    3. nissan-infiniti lt            1268 ( 3.3%)                                                     
                    4. gm remarketing                 885 ( 2.3%)                                                     
                    5. avis budget group              688 ( 1.8%)                                                     
                    6. infiniti financial servic      683 ( 1.8%)                                                     
                    7. nissan north america inc.      630 ( 1.6%)                                                     
                    8. ge fleet services for its      594 ( 1.6%)                                                     
                    9. r hollenshead auto sales       518 ( 1.4%)                                                     
                    10. u-haul                        486 ( 1.3%)                                                     
                    [ 3213 others ]                 18718 (49.0%)           IIIIIIIII                                 

14   mmr            Mean (sd) : 19735.9 (9262.3)    989 distinct values       :                   38208      0        
     [numeric]      min < med < max:                                          : .                 (100.0%)   (0.0%)   
                    250 < 19700 < 106000                                      : :                                     
                    IQR (CV) : 10400 (0.5)                                  : : :                                     
                                                                            : : : .                                   

15   sellingprice   Mean (sd) : 19619.4 (9364.1)    878 distinct values       :                   38208      0        
     [numeric]      min < med < max:                                          :                   (100.0%)   (0.0%)   
                    325 < 19500 < 113000                                      : .                                     
                    IQR (CV) : 10600 (0.5)                                  : : :                                     
                                                                            : : : .                                   

16   saledate       1. Tue Jan 27 2015 01:30:00       507 ( 1.3%)                                 38208      0        
     [character]    2. Tue Feb 17 2015 01:30:00       492 ( 1.3%)                                 (100.0%)   (0.0%)   
                    3. Tue Jan 20 2015 01:30:00       487 ( 1.3%)                                                     
                    4. Tue Feb 03 2015 01:30:00       471 ( 1.2%)                                                     
                    5. Tue Jun 16 2015 02:30:00       444 ( 1.2%)                                                     
                    6. Tue Mar 03 2015 01:30:00       430 ( 1.1%)                                                     
                    7. Thu Jan 15 2015 04:30:00       412 ( 1.1%)                                                     
                    8. Tue Feb 10 2015 01:30:00       404 ( 1.1%)                                                     
                    9. Tue Jan 06 2015 01:30:00       367 ( 1.0%)                                                     
                    10. Tue Jun 02 2015 02:30:00      365 ( 1.0%)                                                     
                    [ 1756 others ]                 33829 (88.5%)           IIIIIIIIIIIIIIIII                         

17   Body_Type      1. Convertible                   1043 ( 3.1%)                                 33919      4289     
     [character]    2. Coupe                         2374 ( 7.0%)           I                     (88.8%)    (11.2%)  
                    3. Hatchback                     2423 ( 7.1%)           I                                         
                    4. Minivan                       3614 (10.7%)           II                                        
                    5. Pickup                          95 ( 0.3%)                                                     
                    6. Sedan                        17727 (52.3%)           IIIIIIIIII                                
                    7. SUV                           6266 (18.5%)           III                                       
                    8. Wagon                          377 ( 1.1%)                                                     

18   YMM            1. 2013 INFINITI G37             3669 ( 9.6%)           I                     38208      0        
     [character]    2. 2012 INFINITI G37             2812 ( 7.4%)           I                     (100.0%)   (0.0%)   
                    3. 2011 INFINITI G37             2037 ( 5.3%)           I                                         
                    4. 2014 CHEVROLET IMPALA         1116 ( 2.9%)                                                     
                    5. 2012 LEXUS RX                 1000 ( 2.6%)                                                     
                    6. 2014 FORD E-350                859 ( 2.2%)                                                     
                    7. 2012 LEXUS IS                  802 ( 2.1%)                                                     
                    8. 2013 NISSAN LEAF               802 ( 2.1%)                                                     
                    9. 2012 INFINITI M37              766 ( 2.0%)                                                     
                    10. 2013 LEXUS RX                 705 ( 1.8%)                                                     
                    [ 483 others ]                  23640 (61.9%)           IIIIIIIIIIII                              

19   Segment        1. Mid-Size Car                 14266 (39.6%)           IIIIIII               36048      2160     
     [character]    2. Compact Car                   6068 (16.8%)           III                   (94.3%)    (5.7%)   
                    3. Compact SUV/Crossover         4214 (11.7%)           II                                        
                    4. Minivan                       2715 ( 7.5%)           I                                         
                    5. Full-Size Car                 2180 ( 6.0%)           I                                         
                    6. Subcompact Car                1715 ( 4.8%)                                                     
                    7. Mid-Size SUV/Crossover        1669 ( 4.6%)                                                     
                    8. Subcompact SUV/Crossover      1200 ( 3.3%)                                                     
                    9. Full-Size SUV/Crossover       1145 ( 3.2%)                                                     
                    10. Sports Car                    404 ( 1.1%)                                                     
                    [ 2 others ]                      472 ( 1.3%)                                                     

20   AVG_MSRP       Mean (sd) : 45858.3 (14383)     511 distinct values         . :               38208      0        
     [numeric]      min < med < max:                                            : :               (100.0%)   (0.0%)   
                    14 < 45530 < 151762                                         : :                                   
                    IQR (CV) : 18086 (0.3)                                    . : :                                   
                                                                              : : : .                                 

21   MSRP_Range     Mean (sd) : 12627 (8787.1)      415 distinct values     :                     38208      0        
     [numeric]      min < med < max:                                        : .                   (100.0%)   (0.0%)   
                    0 < 11672 < 116984                                      : :                                       
                    IQR (CV) : 12425 (0.7)                                  : : .                                     
                                                                            : : :                                     

22   Trim_Count     Mean (sd) : 8.3 (8.4)           30 distinct values      :                     38208      0        
     [numeric]      min < med < max:                                        :                     (100.0%)   (0.0%)   
                    1 < 7 < 131                                             :                                         
                    IQR (CV) : 11 (1)                                       :                                         
                                                                            : :                                       

23   StdDev_MSRP    Mean (sd) : 5388.7 (3148.9)     453 distinct values       :                   35115      3093     
     [numeric]      min < med < max:                                        : :                   (91.9%)    (8.1%)   
                    0 < 5693.7 < 47165.7                                    : :                                       
                    IQR (CV) : 3900.9 (0.6)                                 : :                                       
                                                                            : : .                                     

24   Error.MSRP     Mean (sd) : 2234.8 (1857.4)     453 distinct values     :                     35115      3093     
     [numeric]      min < med < max:                                        :                     (91.9%)    (8.1%)   
                    0 < 1683.4 < 21093.1                                    :                                         
                    IQR (CV) : 1199 (0.8)                                   : .                                       
                                                                            : : . .                                   

25   Error %        Mean (sd) : 4.7 (2.8)           103 distinct values       : :                 35115      3093     
     [numeric]      min < med < max:                                          : : .               (91.9%)    (8.1%)   
                    0 < 4.1 < 15.1                                            : : : .                                 
                    IQR (CV) : 3.1 (0.6)                                      : : : :                                 
                                                                            : : : : : : . .   .                       

26   %MSRP_Range    Mean (sd) : 28.1 (17.1)         310 distinct values           :               38208      0        
     [numeric]      min < med < max:                                              :               (100.0%)   (0.0%)   
                    0 < 30.8 < 107.2                                        :   . :                                   
                    IQR (CV) : 25.9 (0.6)                                   : : : :                                   
                                                                            : : : : : .                               
----------------------------------------------------------------------------------------------------------------------

Appendix E:

Remaining records from false_keys that failed to map after decoding VINs labelled as unmatched_keys.

Code
dfSummary(unmatched_keys)
Data Frame Summary  
unmatched_keys  
Dimensions: 20826 x 26  
Duplicates: 0  

-----------------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values                  Freqs (% of Valid)      Graph                 Valid      Missing   
---- -------------- ------------------------------- ----------------------- --------------------- ---------- ----------
1    year           1. 2007                         1612 ( 7.7%)            I                     20826      0         
     [character]    2. 2008                         3207 (15.4%)            III                   (100.0%)   (0.0%)    
                    3. 2009                          803 ( 3.9%)                                                       
                    4. 2010                         1817 ( 8.7%)            I                                          
                    5. 2011                         1847 ( 8.9%)            I                                          
                    6. 2012                         2605 (12.5%)            II                                         
                    7. 2013                         2676 (12.8%)            II                                         
                    8. 2014                         5654 (27.1%)            IIIII                                      
                    9. 2015                          605 ( 2.9%)                                                       

2    make           1. CHRYSLER                     5334 (27.7%)            IIIII                 19287      1539      
     [character]    2. FORD                         5026 (26.1%)            IIIII                 (92.6%)    (7.4%)    
                    3. DODGE                        1343 ( 7.0%)            I                                          
                    4. CHEVROLET                    1066 ( 5.5%)            I                                          
                    5. MITSUBISHI                    705 ( 3.7%)                                                       
                    6. MINI                          663 ( 3.4%)                                                       
                    7. MERCURY                       646 ( 3.3%)                                                       
                    8. VOLKSWAGEN                    593 ( 3.1%)                                                       
                    9. NISSAN                        565 ( 2.9%)                                                       
                    10. SCION                        540 ( 2.8%)                                                       
                    [ 31 others ]                   2806 (14.5%)            II                                         

3    model          1. TOWN AND COUNTRY             5319 (27.6%)            IIIII                 19287      1539      
     [character]    2. F-250 SUPER DUTY             1946 (10.1%)            II                    (92.6%)    (7.4%)    
                    3. E-SERIES VAN                 1372 ( 7.1%)            I                                          
                    4. CAPTIVA SPORT                1016 ( 5.3%)            I                                          
                    5. RAM PICKUP 1500               996 ( 5.2%)            I                                          
                    6. F-350 SUPER DUTY              853 ( 4.4%)                                                       
                    7. COOPER                        663 ( 3.4%)                                                       
                    8. TRANSIT CONNECT               555 ( 2.9%)                                                       
                    9. OUTLANDER SPORT               381 ( 2.0%)                                                       
                    10. TC                           327 ( 1.7%)                                                       
                    [ 125 others ]                  5859 (30.4%)            IIIIII                                     

4    trim           1. Touring                      4333 (22.5%)            IIII                  19273      1553      
     [character]    2. Base                         2131 (11.1%)            II                    (92.5%)    (7.5%)    
                    3. E-250                        1550 ( 8.0%)            I                                          
                    4. Lariat                       1064 ( 5.5%)            I                                          
                    5. XLT                           815 ( 4.2%)                                                       
                    6. SLT                           760 ( 3.9%)                                                       
                    7. XL                            715 ( 3.7%)                                                       
                    8. S                             584 ( 3.0%)                                                       
                    9. Touring-L                     564 ( 2.9%)                                                       
                    10. LX                           411 ( 2.1%)                                                       
                    [ 171 others ]                  6346 (32.9%)            IIIIII                                     

5    body           1. MINIVAN                      5988 (31.4%)            IIIIII                19085      1741      
     [character]    2. CREW CAB                     2550 (13.4%)            II                    (91.6%)    (8.4%)    
                    3. HATCHBACK                    2129 (11.2%)            II                                         
                    4. SUV                          2052 (10.8%)            II                                         
                    5. SEDAN                        1816 ( 9.5%)            I                                          
                    6. E-SERIES VAN                 1372 ( 7.2%)            I                                          
                    7. QUAD CAB                      974 ( 5.1%)            I                                          
                    8. SUPERCAB                      438 ( 2.3%)                                                       
                    9. WAGON                         383 ( 2.0%)                                                       
                    10. VAN                          322 ( 1.7%)                                                       
                    [ 10 others ]                   1061 ( 5.6%)            I                                          

6    transmission   1. automatic                    17115 (95.6%)           IIIIIIIIIIIIIIIIIII   17912      2914      
     [character]    2. manual                         797 ( 4.4%)                                 (86.0%)    (14.0%)   

7    vin            1. 1FT7W2BTXDEA03416                3 ( 0.0%)                                 20826      0         
     [character]    2. 1FT8W3DT1CEB32666                3 ( 0.0%)                                 (100.0%)   (0.0%)    
                    3. 1FTSW21568EA33810                3 ( 0.0%)                                                      
                    4. 1FTSW21R48EB47680                3 ( 0.0%)                                                      
                    5. 1FTXW43R28EA88031                3 ( 0.0%)                                                      
                    6. 2C4RC1BG2DR821210                3 ( 0.0%)                                                      
                    7. 2G4WD582571126964                3 ( 0.0%)                                                      
                    8. 1D3HU18268J106114                2 ( 0.0%)                                                      
                    9. 1D7HA16N28J150504                2 ( 0.0%)                                                      
                    10. 1D7HA18278J186420               2 ( 0.0%)                                                      
                    [ 20497 others ]                20799 (99.9%)           IIIIIIIIIIIIIIIIIII                        

8    state          1. FL                           2926 (14.0%)            II                    20826      0         
     [character]    2. CA                           2470 (11.9%)            II                    (100.0%)   (0.0%)    
                    3. TX                           2208 (10.6%)            II                                         
                    4. PA                           2134 (10.2%)            II                                         
                    5. IL                           1063 ( 5.1%)            I                                          
                    6. GA                           1027 ( 4.9%)                                                       
                    7. MI                            959 ( 4.6%)                                                       
                    8. NJ                            800 ( 3.8%)                                                       
                    9. OH                            670 ( 3.2%)                                                       
                    10. MN                           621 ( 3.0%)                                                       
                    [ 27 others ]                   5948 (28.6%)            IIIII                                      

9    condition      Mean (sd) : 3.5 (0.9)           40 distinct values                    :       20543      283       
     [numeric]      min < med < max:                                                    : :       (98.6%)    (1.4%)    
                    1 < 3.7 < 5                                                 .   :   : : : .                        
                    IQR (CV) : 1.4 (0.3)                                        : . : : : : : :                        
                                                                                : : : : : : : :                        

10   odometer       Mean (sd) : 59768.7 (47714.6)   19162 distinct values   :                     20825      1         
     [numeric]      min < med < max:                                        :                     (100.0%)   (0.0%)    
                    1 < 46884 < 397857                                      : :                                        
                    IQR (CV) : 64874 (0.8)                                  : : :                                      
                                                                            : : : : .                                  

11   color          1. white                        6646 (32.0%)            IIIIII                20796      30        
     [character]    2. black                        3286 (15.8%)            III                   (99.9%)    (0.1%)    
                    3. silver                       2522 (12.1%)            II                                         
                    4. gray                         2247 (10.8%)            II                                         
                    5. blue                         1991 ( 9.6%)            I                                          
                    6. red                          1653 ( 7.9%)            I                                          
                    7. —                           772 ( 3.7%)                                                       
                    8. gold                          454 ( 2.2%)                                                       
                    9. burgundy                      328 ( 1.6%)                                                       
                    10. beige                        301 ( 1.4%)                                                       
                    [ 8 others ]                     596 ( 2.9%)                                                       

12   interior       1. black                        9610 (46.2%)            IIIIIIIII             20796      30        
     [character]    2. gray                         7364 (35.4%)            IIIIIII               (99.9%)    (0.1%)    
                    3. beige                        1356 ( 6.5%)            I                                          
                    4. tan                          1259 ( 6.1%)            I                                          
                    5. brown                         527 ( 2.5%)                                                       
                    6. —                           489 ( 2.4%)                                                       
                    7. red                            57 ( 0.3%)                                                       
                    8. silver                         44 ( 0.2%)                                                       
                    9. blue                           34 ( 0.2%)                                                       
                    10. purple                        15 ( 0.1%)                                                       
                    [ 6 others ]                      41 ( 0.2%)                                                       

13   seller         1. u-haul                         958 ( 4.6%)                                 20826      0         
     [character]    2. enterprise veh exchange/r      679 ( 3.3%)                                 (100.0%)   (0.0%)    
                    3. the hertz corporation          630 ( 3.0%)                                                      
                    4. avis corporation               544 ( 2.6%)                                                      
                    5. santander consumer             517 ( 2.5%)                                                      
                    6. enterprise holdings/gdp        503 ( 2.4%)                                                      
                    7. chrysler capital               497 ( 2.4%)                                                      
                    8. hertz corporation/gdp          485 ( 2.3%)                                                      
                    9. pv holding inc/gdp             454 ( 2.2%)                                                      
                    10. ge fleet services for its     432 ( 2.1%)                                                      
                    [ 2945 others ]                 15127 (72.6%)           IIIIIIIIIIIIII                             

14   mmr            Mean (sd) : 17493.5 (13122.6)   1032 distinct values    :                     20826      0         
     [numeric]      min < med < max:                                        :                     (100.0%)   (0.0%)    
                    175 < 16050 < 182000                                    : .                                        
                    IQR (CV) : 11100 (0.8)                                  : :                                        
                                                                            : : .                                      

15   sellingprice   Mean (sd) : 17296.7 (13121.9)   907 distinct values     :                     20826      0         
     [numeric]      min < med < max:                                        :                     (100.0%)   (0.0%)    
                    300 < 15900 < 183000                                    : .                                        
                    IQR (CV) : 11200 (0.8)                                  : :                                        
                                                                            : : .                                      

16   saledate       1. Tue Feb 10 2015 01:30:00       165 ( 0.8%)                                 20826      0         
     [character]    2. Tue Mar 03 2015 01:30:00       164 ( 0.8%)                                 (100.0%)   (0.0%)    
                    3. Tue Feb 17 2015 01:30:00       151 ( 0.7%)                                                      
                    4. Tue Feb 03 2015 01:30:00       149 ( 0.7%)                                                      
                    5. Thu Feb 05 2015 01:30:00       148 ( 0.7%)                                                      
                    6. Thu Feb 26 2015 03:00:00       142 ( 0.7%)                                                      
                    7. Thu Jan 22 2015 01:30:00       141 ( 0.7%)                                                      
                    8. Tue Jan 27 2015 01:30:00       139 ( 0.7%)                                                      
                    9. Tue Jun 16 2015 02:30:00       127 ( 0.6%)                                                      
                    10. Thu Mar 05 2015 01:30:00      125 ( 0.6%)                                                      
                    [ 1675 others ]                 19375 (93.0%)           IIIIIIIIIIIIIIIIII                         

17   Body_Type      1. Convertible                   282 ( 1.5%)                                  19085      1741      
     [character]    2. Coupe                         438 ( 2.3%)                                  (91.6%)    (8.4%)    
                    3. Hatchback                    2129 (11.2%)            II                                         
                    4. Minivan                      7760 (40.7%)            IIIIIIII                                   
                    5. Pickup                       4225 (22.1%)            IIII                                       
                    6. Sedan                        1816 ( 9.5%)            I                                          
                    7. SUV                          2052 (10.8%)            II                                         
                    8. Wagon                         383 ( 2.0%)                                                       

18   YMM            1. 2014 Chrysler Town and Co     2829 (13.6%)           II                    20826      0         
     [character]    2. 2014 Ford E-Series Van        1114 ( 5.3%)           I                     (100.0%)   (0.0%)    
                    3. 2013 Chrysler Town and Co      897 ( 4.3%)                                                      
                    4. 2012 NA NA                     781 ( 3.8%)                                                      
                    5. 2014 Chevrolet Captiva Sp      509 ( 2.4%)                                                      
                    6. 2008 Ford F-250 Super Dut      474 ( 2.3%)                                                      
                    7. 2008 Dodge Ram Pickup 150      473 ( 2.3%)                                                      
                    8. 2013 Chevrolet Captiva Sp      406 ( 1.9%)                                                      
                    9. 2008 Chrysler Town and Co      384 ( 1.8%)                                                      
                    10. 2010 Dodge Ram Pickup 150     379 ( 1.8%)                                                      
                    [ 339 others ]                  12580 (60.4%)           IIIIIIIIIIII                               

19   Segment        All NA's                                                                      0          20826     
     [character]                                                                                  (0.0%)     (100.0%)  

20   AVG_MSRP       All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  

21   MSRP_Range     All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  

22   Trim_Count     All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  

23   StdDev_MSRP    All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  

24   Error.MSRP     All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  

25   Error %        All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  

26   %MSRP_Range    All NA's                                                                      0          20826     
     [numeric]                                                                                    (0.0%)     (100.0%)  
-----------------------------------------------------------------------------------------------------------------------

Appendix F:

Finalized and cleaned auction data carprices.

Code
dfSummary(carprices)
Data Frame Summary  
carprices  
Dimensions: 440582 x 24  
Duplicates: 1043  

-----------------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values                  Freqs (% of Valid)       Graph                 Valid      Missing  
---- -------------- ------------------------------- ------------------------ --------------------- ---------- ---------
1    year           Mean (sd) : 2011.7 (2.1)        2007 :  29503 ( 6.7%)    I                     440582     0        
     [numeric]      min < med < max:                2008 :  28910 ( 6.6%)    I                     (100.0%)   (0.0%)   
                    2007 < 2012 < 2015              2009 :  20172 ( 4.6%)                                              
                    IQR (CV) : 2 (0)                2010 :  25510 ( 5.8%)    I                                         
                                                    2011 :  48099 (10.9%)    II                                        
                                                    2012 : 100873 (22.9%)    IIII                                      
                                                    2013 :  97936 (22.2%)    IIII                                      
                                                    2014 :  80986 (18.4%)    III                                       
                                                    2015 :   8593 ( 2.0%)                                              

2    make           1. Ford                          76396 (17.3%)           III                   440582     0        
     [character]    2. Chevrolet                     51555 (11.7%)           II                    (100.0%)   (0.0%)   
                    3. Nissan                        43409 ( 9.9%)           I                                         
                    4. Toyota                        31100 ( 7.1%)           I                                         
                    5. Dodge                         28416 ( 6.4%)           I                                         
                    6. Honda                         19364 ( 4.4%)                                                     
                    7. Hyundai                       18239 ( 4.1%)                                                     
                    8. Kia                           16219 ( 3.7%)                                                     
                    9. BMW                           15552 ( 3.5%)                                                     
                    10. INFINITI                     13916 ( 3.2%)                                                     
                    [ 59 others ]                   126416 (28.7%)           IIIII                                     

3    model          1. Altima                        17283 ( 3.9%)                                 440582     0        
     [character]    2. Fusion                        12607 ( 2.9%)                                 (100.0%)   (0.0%)   
                    3. Impala                        12084 ( 2.7%)                                                     
                    4. Focus                         11924 ( 2.7%)                                                     
                    5. F-150                         11780 ( 2.7%)                                                     
                    6. Escape                        10868 ( 2.5%)                                                     
                    7. Camry                         10407 ( 2.4%)                                                     
                    8. G37                            8765 ( 2.0%)                                                     
                    9. Charger                        7902 ( 1.8%)                                                     
                    10. Grand Caravan                 7134 ( 1.6%)                                                     
                    [ 508 others ]                  329828 (74.9%)           IIIIIIIIIIIIII                            

4    trim           1. SE                            43220 ( 9.9%)           I                     437241     3341     
     [character]    2. Base                          38248 ( 8.7%)           I                     (99.2%)    (0.8%)   
                    3. LX                            16546 ( 3.8%)                                                     
                    4. LT                            16386 ( 3.7%)                                                     
                    5. Limited                       15506 ( 3.5%)                                                     
                    6. XLT                           12476 ( 2.9%)                                                     
                    7. SXT                           10820 ( 2.5%)                                                     
                    8. GLS                           10553 ( 2.4%)                                                     
                    9. LE                            10456 ( 2.4%)                                                     
                    10. 2.5 S                        10374 ( 2.4%)                                                     
                    [ 1116 others ]                 252656 (57.8%)           IIIIIIIIIII                               

5    transmission   1. automatic                    380465 (97.3%)           IIIIIIIIIIIIIIIIIII   391040     49542    
     [character]    2. manual                        10575 ( 2.7%)                                 (88.8%)    (11.2%)  

6    Body_Type      1. Convertible                    7080 ( 1.6%)                                 436601     3981     
     [character]    2. Coupe                         14742 ( 3.4%)                                 (99.1%)    (0.9%)   
                    3. Hatchback                     22578 ( 5.2%)           I                                         
                    4. Minivan                       17672 ( 4.0%)                                                     
                    5. Pickup                        32990 ( 7.6%)           I                                         
                    6. Sedan                        213307 (48.9%)           IIIIIIIII                                 
                    7. SUV                          115740 (26.5%)           IIIII                                     
                    8. Wagon                         12492 ( 2.9%)                                                     

7    state          1. FL                            66449 (15.1%)           III                   440582     0        
     [character]    2. CA                            53754 (12.2%)           II                    (100.0%)   (0.0%)   
                    3. PA                            46643 (10.6%)           II                                        
                    4. TX                            37667 ( 8.5%)           I                                         
                    5. GA                            27253 ( 6.2%)           I                                         
                    6. NJ                            22365 ( 5.1%)           I                                         
                    7. IL                            20557 ( 4.7%)                                                     
                    8. TN                            19572 ( 4.4%)                                                     
                    9. OH                            17293 ( 3.9%)                                                     
                    10. MO                           14683 ( 3.3%)                                                     
                    [ 28 others ]                   114346 (26.0%)           IIIII                                     

8    condition      Mean (sd) : 3.6 (0.8)           36 distinct values                   . :       440582     0        
     [numeric]      min < med < max:                                                     : :       (100.0%)   (0.0%)   
                    1.5 < 3.8 < 5                                                  .   . : : : .                       
                    IQR (CV) : 1.3 (0.2)                                       :   : . : : : : :                       
                                                                               : : : : : : : : :                       

9    odometer       Mean (sd) : 51566.9 (37924.8)   125674 distinct values   :                     440569     13       
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    1 < 40632 < 999999                                       :                                         
                    IQR (CV) : 45156 (0.7)                                   :                                         
                                                                             : .                                       

10   color          1. black                        93179 (21.2%)            IIII                  439981     601      
     [character]    2. white                        86178 (19.6%)            III                   (99.9%)    (0.1%)   
                    3. gray                         69021 (15.7%)            III                                       
                    4. silver                       62762 (14.3%)            II                                        
                    5. blue                         36722 ( 8.3%)            I                                         
                    6. red                          34815 ( 7.9%)            I                                         
                    7. —                          24059 ( 5.5%)            I                                         
                    8. burgundy                      6431 ( 1.5%)                                                      
                    9. brown                         5583 ( 1.3%)                                                      
                    10. gold                         5469 ( 1.2%)                                                      
                    [ 10 others ]                   15762 ( 3.6%)                                                      

11   mmr            Mean (sd) : 16064.2 (9019.3)    1070 distinct values     :                     440582     0        
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    75 < 13950 < 176000                                      :                                         
                    IQR (CV) : 9900 (0.6)                                    : :                                       
                                                                             : :                                       

12   sellingprice   Mean (sd) : 15926 (9096.5)      1733 distinct values     :                     440582     0        
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    275 < 13900 < 173000                                     :                                         
                    IQR (CV) : 9900 (0.6)                                    : :                                       
                                                                             : :                                       

13   sale_year      1. 2014                          32075 ( 7.3%)           I                     440582     0        
     [character]    2. 2015                         408507 (92.7%)           IIIIIIIIIIIIIIIIII    (100.0%)   (0.0%)   

14   launch_date    1. 2006-09-01                   29503 ( 6.7%)            I                     440582     0        
     [Date]         2. 2007-09-01                   28910 ( 6.6%)            I                     (100.0%)   (0.0%)   
                    3. 2008-09-01                   20172 ( 4.6%)                                                      
                    4. 2009-09-01                   25510 ( 5.8%)            I                                         
                    5. 2010-09-01                   48099 (10.9%)            II                                        
                    6. 2011-09-01                   100873 (22.9%)           IIII                                      
                    7. 2012-09-01                   97936 (22.2%)            IIII                                      
                    8. 2013-09-01                   80986 (18.4%)            III                                       
                    9. 2014-09-01                   8593 ( 2.0%)                                                       

15   Date_Sold      min : 2014-01-01                168 distinct values                  . :       440582     0        
     [Date]         med : 2015-02-17                                                     : :   .   (100.0%)   (0.0%)   
                    max : 2015-07-21                                                     : :   :                       
                    range : 1y 6m 20d                                                    : :   :                       
                                                                                         : : . :                       

16   AVG_MSRP       Mean (sd) : 35480.9 (16141.4)   2166 distinct values     :                     440582     0        
     [numeric]      min < med < max:                                         : :                   (100.0%)   (0.0%)   
                    10478 < 31287 < 215813                                   : :                                       
                    IQR (CV) : 13329 (0.5)                                   : :                                       
                                                                             : : :                                     

17   Trim_Count     Mean (sd) : 14.1 (23.9)         81 distinct values       :                     440582     0        
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    1 < 8 < 224                                              :                                         
                    IQR (CV) : 7 (1.7)                                       :                                         
                                                                             :                                         

18   Error %        Mean (sd) : 4.7 (2.9)           159 distinct values      :                     440582     0        
     [numeric]      min < med < max:                                         :                     (100.0%)   (0.0%)   
                    0 < 4.6 < 99.9                                           :                                         
                    IQR (CV) : 2.7 (0.6)                                     :                                         
                                                                             :                                         

19   %MSRP_Range    Mean (sd) : 40.7 (22.7)         695 distinct values        : .                 440582     0        
     [numeric]      min < med < max:                                           : :                 (100.0%)   (0.0%)   
                    0 < 39.1 < 199.7                                           : :                                     
                    IQR (CV) : 20.2 (0.6)                                    : : :                                     
                                                                             : : : : . .                               

20   Segment        1. Mid-Size Car                 110030 (26.3%)           IIIII                 418309     22273    
     [character]    2. Compact Car                   80614 (19.3%)           III                   (94.9%)    (5.1%)   
                    3. Compact SUV/Crossover         51377 (12.3%)           II                                        
                    4. Mid-Size SUV/Crossover        45038 (10.8%)           II                                        
                    5. Full-Size Car                 26881 ( 6.4%)           I                                         
                    6. LD Full-Size Pickup           24419 ( 5.8%)           I                                         
                    7. Subcompact Car                20785 ( 5.0%)                                                     
                    8. Minivan                       16623 ( 4.0%)                                                     
                    9. Full-Size SUV/Crossover       13921 ( 3.3%)                                                     
                    10. Sports Car                   12985 ( 3.1%)                                                     
                    [ 3 others ]                     15636 ( 3.7%)                                                     

21   Age_months     Mean (sd) : 46.6 (25.8)         71 distinct values             :               440582     0        
     [numeric]      min < med < max:                                           . . :               (100.0%)   (0.0%)   
                    0 < 42 < 108                                               : : : :                                 
                    IQR (CV) : 29 (0.6)                                        : : : :       . .                       
                                                                             . : : : : : : : : :                       

22   Age_years      Mean (sd) : 3.9 (2.2)           0 :   3385 ( 0.8%)                             440582     0        
     [numeric]      min < med < max:                1 :  35432 ( 8.0%)       I                     (100.0%)   (0.0%)   
                    0 < 4 < 9                       2 : 112822 (25.6%)       IIIII                                     
                    IQR (CV) : 3 (0.6)              3 :  63738 (14.5%)       II                                        
                                                    4 : 103786 (23.6%)       IIII                                      
                                                    5 :  23061 ( 5.2%)       I                                         
                                                    6 :  30003 ( 6.8%)       I                                         
                                                    7 :  12962 ( 2.9%)                                                 
                                                    8 :  36563 ( 8.3%)       I                                         
                                                    9 :  18830 ( 4.3%)                                                 

23   RV_percent     Mean (sd) : 45.6 (17.7)         1230 distinct values           :               440582     0        
     [numeric]      min < med < max:                                             . : :             (100.0%)   (0.0%)   
                    0.7 < 45.1 < 124.7                                           : : :                                 
                    IQR (CV) : 23 (0.4)                                        : : : : :                               
                                                                             . : : : : : .                             

24   Annual_miles   Mean (sd) : Inf (NaN)           36291 distinct values                          440569     13       
     [numeric]      min < med < max:                                                               (100.0%)   (0.0%)   
                    0 < 12375 < Inf                                                                                    
                    IQR (CV) : 8504 (NaN)                                                                              
-----------------------------------------------------------------------------------------------------------------------

More about the NHTSA’s vindecodr package:

Below is an example of the API call from the NHTSA’s website: https://vpic.nhtsa.dot.gov/api/Home/Index/LanguageExamples

require(RJSONIO)
# User-defined functions ----------------------------------
VehicleAPIrConnect <- function(VINin){
    # Lookup VIN information from https://vpic.nhtsa.dot.gov/api
    #
    # Args:
    #   VINin: VIN
    #
    # Returns:
    #   Data frame with vehicle information.
    #
    # For Testing:
    # VINin <- "5UXWX7C5*BA"
    tempCall <- paste0("https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/", VINin, "?format=json")
    tempExtract <- fromJSON(tempCall)
    dfOut <- data.frame(t(unlist(tempExtract$Results)),stringsAsFactors=FALSE)
    dfOut
}
VehicleAPIrConnect("5UXWX7C5*BA")
Source Code
---
title: "Final Project"
author: "Owen Tibby"
desription: "Final Project"
date: "`r Sys.Date()`" 
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
    
output: distill::distill_article

categories:
  - Final Project (Second Submission)
  - Used Vehicle Data
  - Owen Tibby
---

```{r}
#| label: setup
#| warning: false
library(distill)
library(tidyverse)
library(tidyr)
library(summarytools)
library(dplyr)
library(readr)
library(readxl)
library(stringr)
library(lubridate)
library(vindecodr)
library(plotrix)
library(sparklyr)
library(purrr)
library(rmarkdown)
library(knitr)
library(psych) 


knitr::opts_chunk$set(echo = TRUE, warning = FALSE, paged.print= TRUE)


```

# **Introduction**

Introduction & Purpose of the Study:

The automotive sector is a major part of the US economy and is among the hardest hit by the ripple effects of the COVID-19 pandemic. Factory closures and lockdowns led to a supply shortage of critical vehicle components necessary for new car production. Consequently, this has significantly increased the demand and value of used cars. However, some experts believe the market will return to 'normal' by the end of 2023. But what did the market look like before the pandemic?

The purpose of this study is to explore the realm of the used car market, specifically, the wholesale sector. The primary data was gathered from a public domain and represents Manheim auction sales between wholesalers/ remarketers and wholesale buyers/ dealers of used vehicles. I am interested in exploring the retained value of vehicles based on their segment, body type, and age. Correlations between vehicle condition and mileage will also be analyzed.

Research Question: How do age, mileage, body style, and condition affect the retained value of a used vehicle?

# **Data & Methodology:**

#### Reading in the Data

```{r Read In, message=FALSE, warning=FALSE}
# # #RStudio Desktop
carprices <-read_excel("_data/car_prices2.xlsx", col_types = c("text", "text", "text", "text", "text", "text", "text", "text", "numeric", "numeric", "text", "text", 
        "text", "numeric", "numeric", "text",  "skip"))
df <- carprices
exchange_rate <- read_excel("_data/Historical_Prices.xlsx", col_types = c("numeric", "text", "skip", "skip", "skip", "numeric"))

#Second Read-In of ca_catalog (to make data fields private)
ca_catalog <- read_csv("_data/ca_catalog2.csv")

#Read-In from code chunk
decoded_keys <- read_csv("_data/decoded.keys1.csv")
```

```{r Cloud Read In, message=FALSE, warning=FALSE, include=FALSE}
# # # #RStudio Cloud
#  carprices <-read_csv("/cloud/project/Final_project/car_prices.csv",
#      col_types = cols(mmr = col_number(), year = col_character(), sellingprice = col_number(), ...17 = col_skip()))
#   ca_catalog <- read_csv("/cloud/project/Final_project/ca_catalog.csv")
#  exchange_rate <- read_excel("/cloud/project/Final_project/HistoricalPrices.xlsx",  col_types = c("numeric", "text", "skip", "skip", "skip", "numeric"))
```

**Description of Data:**

The primary data used in the analysis was the 'carprices.csv' data set from Kaggle (see appendix A). This data set contains historical used vehicle auction transactions that took place between 2014 and 2015. The data was collected from Manheim auction houses across the 34 states in the USA. Each observation details a wholesale auction transaction that includes: the vehicle information, state/province, selling price, sale date, and the market average price i.e., the Manheim Market Report price (MMR) and other specifics about the sale.

#### 

#### **Primary Objectives**

*Data Cleansing:*

1.  Have consistency in the names of vehicle makes and models by using appropriate packages and joins.

2.  Obtain MSRP from by joining with a catalog

*Data Analysis:*

3.  Calculate vehicle age as a variable in both years and months.

4.  Compute retained value as a percentage of each vehicle's MSRP

5.  Compute standard error as a percentage of the mean MSRP for each model and filter transactions with a low error rating for deeper analysis.

6.  Analyze the impact of age, mileage, condition, and segment on retained value

7.  Determine the segments (eg. compact cars, midsize cars, etc.) and vehicles with the highest retained value percentage.\

#### **Methods:**

Retained Value (RV) is calculated by dividing the selling price (auction value) by the vehicle's MSRP.

Retained Value% = Wholesale Price/ MSRP 

I was able to source MSRPs from a Canadian vehicle catalog *(Appendix B)* . To protect the data confidentiality, the MSRPs in the Canadian dollar catalog were averaged for each year, make, and model by trim level. In other words, I aggregated and derived the mean of the MSRPs of all trims belonging to a specific year, make and model, or YMM to reduce the level of detail. Therefore, MSRPs for trims belonging to a particular year, make and model shows the mean MSRP for the entire YMM group. 

![](images/process_RV.jpg){fig-align="center"}

Using the YMM as a key common to the 'carprices' data as well as the ca_catalog, I can combine both and fetch the MSRP. Unsurprisingly, this left one more challenge, the MSRPs are listed in Canadian Dollars. In order to convert it to US Dollars, I will have to find and divide each vehicle MSRP by the USD: CAD exchange rate at the time of release. I was able to source the FX rate from 2006 and onwards using 'exchange_rate.csv' data obtained from WSJ Markets. Considering the below assumptions, I joined the exchange_rate.csv to carprices.csv using launch year as the common key between both data frames. Each MSRP was then divided by the daily close rate of the USD-CAD. 

Important assumptions to get MSRP in USD: 

1.  Canadian vehicles & trims are priced similarly with US trims. 

2.  All vehicles were released in September of the previous year. i.e., a 2013 vehicle was available for sale in September 2012. 

3.  The exchange rate at the time of release was the closing FX rate at Aug-31st 

#### **Data Cleansing:**

##### **Observing the data**

```{r results='hide'}
summary(df)
```

Table 1.0: Auction Data (Preview)

```{r Intro, message=FALSE, warning=FALSE, echo= TRUE}
print(kable(head(carprices)))
```

```{r message=FALSE, warning=FALSE, echo= TRUE, results='hide'}
#Observing and removing all records for model year 2006 and older & all erroneous prices
carprices<- carprices %>% filter(`year`> 2006, sellingprice>100)

#Checking the consistency in the names of states and body types
carprices %>%  select(state) %>% distinct()
carprices %>% select(body) %>% distinct() 

carprices$state <- toupper(carprices$state)
carprices$body <- toupper(carprices$body)
```

##### Cleaning the Data: Mapping and Joining

```{r message=FALSE, warning=FALSE, echo=TRUE, results='hide'}

#Checking the consistency in the names of Body types}
carprices %>% select(body) %>% unique()

#Creating a Body Type column
carprices <- mutate(carprices, Body_Type = case_when(str_detect(body, "EDAN")~ "Sedan", str_detect(body, "SUV") ~ "SUV", str_detect(body, "AB") ~ "Pickup", str_detect(body, "OUP")~ "Coupe", str_detect(`body`, "VAN")~ "Minivan", str_detect(`body`,"ONV")~ "Convertible", str_detect(`body`,"AGO") ~ "Wagon", str_detect(`body`,"ATCH")~ "Hatchback", str_detect(`body`,"CREW")~ "Pickup" ))

```

Joining multiple files to get the MSRP from a historical catalog. The code below shows what I did to write a new CSV file (that does not include the original MSRP but rather the mean MSRP and the other useful fields like trim count and Error.

```{r Creating a Join key, message=FALSE, warning=FALSE, eval=FALSE}

#Creating a Key in the CDN catalog
ca_catalog <-  ca_catalog %>% mutate(YearOfLaunch= `Model Year`-1)

#Creating Catalog with MSRP in CAD
ca_catalog <- ca_catalog %>% group_by(`Combined`, `YearOfLaunch`, `Segment` ) %>% 
summarise( Trim_Count=n(), AVG_MSRP_CAD= mean(`MSRP`, na.rm= TRUE), MSRP_Range_CAD= (max(`MSRP`)- min(`MSRP`)), StdDev_MSRP_CAD= sd(`MSRP`, na.rm=FALSE), Error.MSRP_CAD= std.error(`MSRP`, na.rm=FALSE))%>% 
ungroup()

#Writing CSV to exclude confidential MSRPs and only include the AVG_MSRP field
write.csv(ca_catalog, "C:\\Users\\Owen-Matthew\\Documents\\Github\\601_Fall_2022\\posts\\ca_catalog.csv", row.names=FALSE)
```

Join #1

```{r First Join, , paged.print=TRUE , echo=TRUE}
#Joining the data to convert CAD MSRP to USD MSRP later
ca_catalog <- left_join(ca_catalog, exchange_rate, c("YearOfLaunch"= "Year"))

#Creating US$ Catalog and Converting to MSRP from CAD to USD
us_catalog <- ca_catalog %>% 
  mutate(AVG_MSRP= round((`AVG_MSRP_CAD` / `Close`), digits = 0), MSRP_Range= `MSRP_Range_CAD` / `Close`, StdDev_MSRP= `StdDev_MSRP_CAD`/ `Close`, Error.MSRP= `Error.MSRP_CAD`/ `Close` )

#Finalizing US Catalog for Joining
us_catalog <- us_catalog %>% 
select(Combined, Segment, AVG_MSRP, MSRP_Range, Trim_Count, StdDev_MSRP, Error.MSRP) %>% 
  mutate(`Error %`= round((`Error.MSRP`/ `AVG_MSRP` * 100), digits = 1), `%MSRP_Range` = round((`MSRP_Range`/ `AVG_MSRP` *100), digits=1))

#Creating a common key to join carprices and us_catalog
carprices<- carprices %>% mutate(year2=year, make2=make, model2=model)
carprices <- carprices %>%  unite(`YMM`, `year2`, `make2`, `model2`, sep=" ")
```

Join #2

```{r Second Join, message=FALSE, warning=FALSE, paged.print=TRUE}
#Joining Data with us_catalog
carprices <- left_join(carprices, us_catalog, c("YMM"= "Combined" )) 

#saving our raw records before filtering
raw_carprices <- carprices

#Filtering NAs out the keys that were not mapped to an MSRP using YMM
#(Will run these false keys through a vin decoder to get exact matches)
false_keys <- carprices %>% filter(is.na(AVG_MSRP))

#Removing the false_keys from main dataset
carprices <- carprices %>% filter(!is.na(AVG_MSRP)) 

us_catalog$Combined <- toupper(us_catalog$Combined)
false_keys$vin <- toupper(false_keys$vin)
false_keys$make <- toupper(false_keys$make)
false_keys$model <- toupper(false_keys$model)
```

##### Cleaning the Data: Decoding & More Joining

Vin deocder function used to decode all VINs that were not matched. This was available from the vindecodr package in R contributed by the NHTSA.

The code below shows how I created the 'decoded_keys' csv file that I read in earlier. I have set eval=FALSE because of the lengthy run time.

```{r Vin Decoder: cleaning unmtached records, eval=FALSE, echo=TRUE}
# To clean Year, make, Model fields in false_keys (code run time = 3hrs + 13mins)

VinIn <-false_keys %>% select(vin) 
VinIn <- VinIn$vin

decode <- function(x) {
 VIN<- decode_vin(x)}

decoded_keys <- map_dfr(VinIn,  decode)

#Saving the above output as a CSV
write.csv(decoded_keys, "C:\\Users\\Owen-Matthew\\Documents\\Github\\601_Fall_2022\\posts\\decoded_keys.csv", row.names=FALSE)
```

```{r Read-In decoded keys, eval=FALSE, message=FALSE, warning=FALSE, echo=TRUE}
# # #R Studio Desktop Read In
##Cleaned False keys (Imported at read in so I set eval= FALSE)
decoded_keys <- read_csv("decoded.keys.csv")

#R Studio Cloud Read In
# decoded_keys <- read_csv("/cloud/project/Final_project/decoded.keys1.csv")
```

```{r 1. Inner Join_Matching cleaned records with catalog, message=FALSE, warning=FALSE, echo=FALSE}
action_keys <- decoded_keys
action_keys$make <- toupper(action_keys$make)
action_keys$model <- toupper(action_keys$model)
action_keys$VIN <- toupper(action_keys$VIN)

action_keys <-  action_keys %>% filter(!is.na(year)) %>% filter(!is.na(make)) %>% filter(!is.na(model)) %>% filter(!is.na(VIN)) 
a_keys <- action_keys

#Join with Catalog
a_keys <-  a_keys %>% 
  select(model_year, make,model, VIN) %>%  mutate(year= model_year, make2=make, model2=model) %>% unite (YMM, model_year, make2, model2, sep = " ")
a_keys <- inner_join(a_keys, us_catalog, c("YMM"= "Combined" ))
```

```{r 2. Inner Join_Matching cleaned records with catalog, message=FALSE, warning=FALSE, echo=FALSE, results='hide', include=FALSE, eval=FALSE}

#Checking to see if a vehicle was auctioned multiple times, to see if we need to include sale date as a matching key
unique_check <- a_keys %>%  select(`VIN`)
print(str_c(count(unique_check)-count(unique(unique_check)), " ", "are multiple/ duplicated YMMs that failed to have a match with the catalog. These vehicles may have been auctioned multiple times. Possibly when the title was clean and again when salvaged. We can test this hypothesis by observing the vehicle condition. However, that is outside of the scope of this analysis. "))
```

Join #3

```{r 1. Tidying & more Joining,echo=TRUE,message=FALSE, warning=FALSE, paged.print=TRUE}
#Selecting useful columns in false_keys
f_keys <- false_keys
false_keys <- false_keys %>%
select(trim, body, Body_Type, transmission, vin, state , condition, odometer, color, interior,seller, mmr, sellingprice, saledate)

#Join with false keys to get sale records
#cleaned_keys= inner_join(a_keys, false_keys, c("VIN" = "vin")) %>% select(1:26) %>% rename("vin" = "VIN")
cleaned_keys <-  inner_join(false_keys, a_keys, c("vin" = "VIN"))

#saving unmatched NAs for future wrangling
action_keys <- action_keys %>%   select(model_year, make,model, VIN) %>%  mutate(year= model_year, make2=make, model2=model) %>% unite (YMM, model_year, make2, model2, sep = " ")

#saving unmatched keys
unmatched_keys <- anti_join(f_keys, cleaned_keys, c("vin"= "vin"))

```

Join #4-5

```{r 1. Manually cleaning unmatched records, tidying & more Joining,echo=TRUE,message=FALSE, warning=FALSE, paged.print=TRUE}
#Attempting to clean the most common unmatched keys manually
unmatched_keys_clean <- mutate(unmatched_keys, year2=year, year3=year, make2=make, make3=make, model2 = case_when(str_detect(model,"6 SERIES")~ "6 SERIES", str_detect(model, "3 SERIES")~ "3 SERIES", str_detect(model, "4 SERIES")~ "4 SERIES",str_detect(model, "5 SERIES")~ "5 SERIES", str_detect(model, "CAPTIVA")~ "CAPTIVA",str_detect(model, "OUTLANDER")~ "OUTLANDER", str_detect(model, "RAM PICKUP 1500")~ "RAM 1500", str_detect(model, "TOWN AND COUNTRY")~ "TOWN & COUNTRY",  str_detect(model, c("750I", "7 SERIES",  "750LI"))~ "7 SERIES" ), model3=model2)

unmatched_keys_clean <-  unmatched_keys_clean %>%  unite(YMM2, year2, make2, model2, sep= " ")
 
unmatched_keys_clean <- unmatched_keys_clean %>%  
  select(year3, make3, model3, trim, body, Body_Type, transmission, vin, state , condition, odometer, color, interior,seller, mmr, sellingprice, saledate, YMM2) %>%
  filter(!is.na(make3), !is.na(model3)) 
  
#An additional 6860 records were cleaned
cleaned_keys2 <- inner_join(unmatched_keys_clean, us_catalog, c("YMM2"= "Combined" )) %>% rename(year= year3, make=make3, model=model3, YMM=YMM2)

#-------
unmatched_keys <- anti_join(unmatched_keys, cleaned_keys2, c("vin"= "vin"))
unmatched_unique <- unmatched_keys %>% select(vin) %>% distinct()
a <- (str_c(count(unmatched_unique), " ", "unique vehicle VINs; "))

unmatched_makes <- unmatched_keys %>% select(make) %>% distinct()
unmatched_models <- unmatched_keys %>% select(model) %>% distinct()
unmatched_YMM <- unmatched_keys %>% select(YMM) %>% distinct()
b <- (str_c(count(unmatched_makes), " ", "unmatched makes; "))
c <- (str_c(count(unmatched_models), " ", "unmatched models: "))
d <- (str_c(count(unmatched_YMM), " ", "unmatched YMM keys (Year-Make-Model)."))

print(str_c(a,b,c,d))
```

Binding cleaned data sets

```{r  Tidying & row binding,echo=TRUE,message=FALSE, warning=FALSE, paged.print=TRUE}
#---------
#get column names & order from carprices
check_cols <-  colnames(carprices) %>% as.factor()


#checking to see if the column order is the same
cleaned_keys <- cleaned_keys[, check_cols]
cleaned_keys2 <- cleaned_keys2[, check_cols]

#Merging both sets of cleaned data (Auto and manually cleaned)
cleaned_keys <-  rbind(cleaned_keys, cleaned_keys2)


print(kable(head(cleaned_keys), caption = "Table 1.1: Displaying cleaned keys"))
print(kable(head(cleaned_keys2), caption = "Table 1.2: Displaying cleaned keys2"))
```

1st iteration of cleaning through decoder: 67% of the previous NA records were automatically cleaned (38208 of 57328 records) 2nd iteration of cleaning (Manually): 6860 records were manually cleaned. In total, 79% of the NAs were cleaned in the Year, Make, and Model fields

```{r  Final bind ,echo=TRUE,message=FALSE, warning=FALSE, paged.print=TRUE}

#reorganize cleaned keys to rbind with carprices data
carprices <-  rbind(carprices, cleaned_keys)

```

```{r echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
# Displaying the Canadian records
carprices %>% filter(state== c("AB", "ON", "QC", "NS"))
# Excluding Canadian records
carprices<-carprices %>% filter(state!= c("AB", "ON", "QC", "NS"))
```

#### **Considerations:**

This approximation of the MSRP using the YMM formed the basis for a lot of the analysis:

1.  **Computation was narrowed down to model level.** By using the mean MSRP for all trim levels, I can only analyze the retained value on the Year, Make, Model level i.e., not specific to the trim. For example, I can only analyze how a 3-year-old Toyota Camry retains its value but not a Toyota Camry XSE. 

2.  **Normal distribution of mean price.** The mean of the MSRP assumes that the purchase of trims/packages above the base price mimics a normal distribution. That means that most customers will buy at or around the mean priced trim. Additionally, because trims often follow interval pricing, the mean MSRP of most passenger vehicles tends to be close to the median. Therefore, for the standard vehicle, there is generally not much variation between the median trim price and the mean price of all trims. 

3.  **Useful statistics:** it was valuable to assess the number of trims for each model, the standard deviation, and the standard error in my approximation of MSRP. To further the analysis, I also included the Range in MSRP (price difference between top and base trim) as a percentage of mean MSRP, along with Error % in MSRP- which is the standard error divided by the mean MSRP. Since MSRP is the denominator in our RV equation, these metrics provide an idea of how accurate our retained value (RV) percentage is. They also serve as appropriate means of filtering out low-quality data and as a sanity check to statistical tests that were conducted on correlations and regressions.

The next critical variable is age. For this analysis, I will treat age as continuous in Age_months and Age_years. Like the above assumptions, I will assume that all vehicles of model year x are released in the Fall of the previous year (x-1). This will be done by creating the launch date variable and finding the difference from the sale date in months and years, to obtain the age of the vehicle at the time of sale. A deeper analysis will be placed on 2-year- to 6-year-old vehicle transactions. See the Appendix for more details about the variables.

#### Calculations:

```{r Cleaning Error, message=FALSE, warning=FALSE, paged.print=TRUE, results='hide'}
dim(carprices)

#Checking for NA values in AVG_MSRP and YMM
# carprices %>% filter(is.na(AVG_MSRP)) 
# carprices %>% filter(is.na(YMM)) 

#Replacing na records in Error%
carprices$`Error %` <- carprices$`Error %` %>% replace_na(0)
carprices <- carprices# %>% filter(`Error %` <7)

#Observing new NA records for Bodytype
na.body <- carprices %>%  filter(is.na(Body_Type))

```

```{r Age Calculation   , message=FALSE, warning=FALSE, echo=TRUE}
#Creating Launch Date Step 1
carprices$year <- as.numeric(as.character(carprices$year))

carprices<- carprices %>%  mutate(launch_month= "9", launch_day= "1", launch_year= (`year`-1))

#Creating Launch Date Step 2
carprices <- carprices %>% mutate(launch_date= make_date(year= `launch_year`, month = `launch_month`, day=`launch_day`))

#Converting saledate to a date variable
carprices <- carprices %>% separate(saledate, into= c("ddd", "sale_month", "sale_day", "sale_year", "sale_time", "GMT", "TMZ"), sep=" ")   

#Recoding the format of the sale month
carprices <- mutate(carprices, sale_month2= recode(sale_month, `Jan` ="1", `Feb`= "2", `Mar`="3", `Apr`="4", `May`="5", `Jun`="6", `Jul`="7", `Aug`="8", `Sep`="9", `Oct`="10", `Nov`="11", `Dec`="12" ))

#Creating Date_Sold
#carprices <- mutate(carprices, sale_year2=sale_year)
carprices <- mutate(carprices, sale_year2=sale_year, Date_Sold = make_date(year=sale_year2, month = sale_month2, day = sale_day))

# Tidyiing up: selecting important columns
carprices <- carprices %>% 
select(year, make, model, trim, transmission, Body_Type, state, condition, odometer, color, mmr, sellingprice, sale_year, launch_date, Date_Sold, AVG_MSRP, Trim_Count, `Error %`, `%MSRP_Range`, Segment ) 

#Age of vehicle
carprices <- mutate(carprices, Age_months= round(as.numeric(difftime(Date_Sold, launch_date, units= "days"))/30))
```

```{r Age Calculation & Tidying   , message=FALSE, warning=FALSE, echo=TRUE, results='hide'}
#Checking for negatives in Vehicle Age
carprices %>%  select(Age_months) %>%  filter(Age_months<0)

#Recoding Vehicle Age
carprices<-carprices %>% mutate(Age_months=recode(Age_months, `-8`=0))

#Creating Age in years
carprices<-carprices %>% mutate(Age_years=round(Age_months/12, 0))


```

Below is the calculation of Retained Value:

```{r Calculating Retained Value% }
#Retained value and removing irrational RV%
carprices <- carprices %>% 
mutate(RV_percent= round(`sellingprice` / `AVG_MSRP` *100, digits = 1))# %>% filter(RV_percent<125)
print(kable(head(carprices)))
#How does the distribution of RV% look?
#summary(carprices$RV_percent)


#Annual KMs
carprices <- carprices %>% 
  mutate(Annual_miles= round(odometer/Age_months * 12, digits=0))

#Removing Salvage title and damaged vehicles
carprices <- carprices %>% filter(condition>1.4999999)
```

# **Visualizations**

Before analyzing the impact of the independent variables on the retained value, I will first look at the relationship that each of the IVs have on each other using multiple dimensions.

```{r Vehicle groups}
#Establishing vehicle groups
#Error in Nissan records
mainstream <- c("Toyota", "Honda", "Hyundai", "Nissan", "Hyundai", "Kia", "Mazda", "Subaru", "Volkswagen")
domestic <- c("Buick", "Cadillac", "Chevrolet", "Chrysler", "Dodge", "Ford", "GMC", "Jeep", "Lincoln", "Ram")
luxury <- c("BMW", "Audi", "Mercedes-Benz", "Porsche", "Lexus", "INFINITI", "Acura", "Genesis", "Volvo", "Cadillac", "Lincoln")
passenger <-  c("Sedan", "SUV", "Coupe", "Convertible", "Wagon", "Hatchback", "Minivan")
pickup <- c("Pickup")
Japanese <-  c("Toyota", "Honda", "Nissan", "Mazda", "Subaru", "Scion", "Mitsubishi", "Suzuki")
Korean <- c("Kia", "Hyundai", "Genesis")
Pop.European <- c("Audi", "BMW", "Mercedes-Benz", "Jaguar", "Land Rover" )

comp.seg <- c(mainstream, domestic, luxury)
comp.makes <- c(Japanese, Korean, Pop.European)
```

#### Correlation: Mileage, Age, Body, and Condition

```{r Correlations, echo=TRUE}
#Correlation between mileage and age (in months and years)
MLG.AGE1 <- round(cor(carprices$odometer, carprices$Age_months, use= "pairwise.complete.obs")  , digits=3)
MLG.AGE2 <- round(cor(carprices$odometer, carprices$Age_years, use= "pairwise.complete.obs") , digits=3)

#Correlation between mileage and condition
MLG.CON1. <- round(cor(carprices$odometer, carprices$condition, use= "pairwise.complete.obs"), digits=3)

#Correlation between Age and Condition
AGE.CON1 <- round(cor(carprices$Age_months, carprices$condition, use= "pairwise.complete.obs"), digits=3)
AGE.CON2 <- round(cor(carprices$Age_years, carprices$condition, use= "pairwise.complete.obs"), digits=3)

#Correlation between RV_percent and mileage
RV.MLG <- round(var(carprices$RV_percent, carprices$odometer, use= "pairwise.complete.obs"), digits= 3)
#Correlation between RV_percent and condition
RV.CON <- round(cor(carprices$RV_percent, carprices$condition, use= "pairwise.complete.obs"), digits= 3) 
```

```{r  Graph 1.1-1.2 Correlations, echo=TRUE, message=FALSE, warning=FALSE}
plot1.1 <- carprices %>% 
  filter(make== c(`mainstream`, `luxury`, `domestic`), Body_Type== c(passenger, pickup), condition>0)
  
ggplot(data=plot1.1) + 
aes(y =condition , x = (odometer), color= Age_years) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  xlim(0,210000)+
  ylim(2,5)+
  #facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Mileage", color= "Age (Years)")+
  ggtitle("Graph 1.1: Relationship between Mileage, Condition and Age", subtitle = str_c("Correlation between Mileage and Condition ", " = ", MLG.AGE1))

plot1.2 <- plot1.1
ggplot(data=plot1.2) + 
aes(y =condition , x = (Age_years), color= odometer) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  ylim(2,5)+
  #facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Age", color= "Mileage")+
  ggtitle("Graph 1.2: Relationship between Mileage, Condition and Age", subtitle = str_c("Correlation between Age and Condition ", " = ", AGE.CON1))


```

Graphs 1.1 and 1.2 above show the inverse relationships across the condition, mileage, and age. While this relationship is very intuitive and almost trivial, a closer look will highlight that a vehicle's condition is more sensitive to mileage at lower ages. In other words, As vehicles age, the impact of mileage on condition decreases. This was illustrated in Graph 1.1 where the slope of the curve steadily decreased over time.

This may also be caused by the fact that older high-mileage vehicles tend not to be driven as much as younger low-mileage vehicles, due to cost and several other factors like reliability issues. An exception to this would be the rare case where older vehicles (or antiques) already in very good condition are intentionally driven less to preserve their condition. Therefore, aging but maintaining a higher condition rating. Fortunately, I am only analyzing models from 2006 and newer, so this "antique effect" is of no significance but is still worthy of mention for future research purposes.

```{r Graph 1.3-1.4 More Correlations, echo=TRUE, warning=FALSE}
plot1.3 <- carprices %>% 
  filter(make== c(`mainstream`, `luxury`, `domestic`), Body_Type== c(passenger, pickup))
  
ggplot(data=plot1.3) + 
aes(y =condition , x = (odometer), color= Age_years) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  xlim(0, 180000)+
  ylim(1,5)+
  facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Mileage", color= "Age (Years)")+
  ggtitle("Graph 1.3: Relationship between Mileage, Condition, Age and Body Type")  
          # ,subtitle = str_c("Correlation between Mileage and Condition ", " = ", MLG.AGE1))

plot1.4 <- plot1.3
ggplot(data=plot1.4) + 
aes(y =condition , x = Age_years, color=odometer) +
 geom_point(position = "jitter")+
  geom_smooth(color="red")+
  #xlim(0, 200000)+
  ylim(1,5)+
  facet_wrap(~Body_Type)+
  labs(y=" Condition", x= "Age", color= "Mileage")+
  ggtitle("Graph 1.4: Relationship between Mileage, Age, Body Type and Condition")  
          # ,subtitle = str_c("Correlation between Mileage and Condition ", " = ", MLG.AGE1))

```

Keeping in mind the previous takeaways, graphs 1.3 and 1.4 illustrate the correlations by body type.

Unsurprisingly, among all body types, pickups and SUVs are least sensitive to mileage and age. This may be because most pickups and many SUVs are used for a wide range of purposes. They are engineered and designed to be driven more than other body types.

Insufficient data points to make conclusions about wagons and convertibles.

```{r,  include=FALSE}
cp <- carprices
carprices <- carprices %>% filter(RV_percent < 125)
```

#### Correlation: Mileage, Age, Body, Condition and Retained Value

```{r RV_Mileage, echo=TRUE, warning=FALSE}
#Correlation between RV_percent and mileage
plot2.1 <- carprices %>% filter(make== c(`mainstream`, `luxury`), Body_Type==c(passenger, pickup), RV_percent >15) %>% arrange(desc(RV_percent)) 


#Correlation between RV_percent and mileage
RV.MLG <- round(var(plot2.1$RV_percent, plot2.1$odometer, use= "pairwise.complete.obs"), digits= 3)


ggplot(data = plot2.1) + aes(y = RV_percent, x = (odometer) , color= Age_years) +
geom_point(position = "jitter")+
geom_smooth(color="orange")+
labs(x=" Mileage", y= "Retained Value %", color= "Age (Years)")+
  ggtitle("Graph 2.1: Correlation between Retained Value and Mileage", subtitle = str_c("Correlation", " = ", RV.MLG))
```

```{r Graph 2.2 RV_condition, warning=FALSE, message=FALSE}
#Correlation between RV_percent and condition
plot2.2 <- carprices %>% 
  filter(Body_Type== c(passenger, pickup)) %>% 
group_by(Age_months)

RV.CON <- round(cor(plot2.2$RV_percent, plot2.2$condition, use= "pairwise.complete.obs"), digits= 3)

ggplot(data = plot2.2) + aes(x =condition, y = (RV_percent)) +
 geom_smooth()+
  labs(x=" Condition", y= "Retained Value %")+
  ggtitle("Graph 2.2: Correlation between Retained Value and Condition", subtitle = str_c("Correlation", " = ", RV.CON)) 
```

Graph 2.1 shows the inverse relationship and diminishing effect that mileage and age have on retained value. However, as seen in graph 2.2, on aggregate, retained value is more strongly correlated to the condition of the car than its mileage.

#### Auction Transactions by High-Volume States

This section aims to take a more comprehensive look at our data. I'll analyze the states with the most transactions to see if there are any unique patterns to observe.

```{r message=FALSE, warning=FALSE}
#Analyzing the number of transactions by state by the age 
vol.states<- carprices %>% 
  filter(Age_years < 7, Age_years >0) %>% 
  group_by(Age_years, state) %>% tally() %>% 
  arrange(desc(n)) %>% slice(1:5)

vol.states %>% 
  ggplot()+ aes(x=reorder(state, -n), y= n) +
  geom_bar(stat = "identity", fill= "dark blue") +
labs(x= "State", y="Number of Transactions") +
facet_wrap(vars(Age_years))+
ggtitle("Graph 3.1: Number of Transactions by State (Faceted by Age in Years)")
```

From Graph 3.1 above, we see that most vehicles are auctioned between ages 2 and 4 years. This corresponds to the popular lease terms of 24 to 48 months.

```{r Auction by state, echo=TRUE, warning=FALSE}

top.states <- vol.states %>% ungroup() %>%   group_by(state, Age_years) %>%  arrange(desc(n)) %>% slice(1:3)

plot3.2 <- carprices %>%  
filter(state==vol.states$state, Body_Type== c(passenger, pickup), RV_percent> 15, Annual_miles<25000 ) 

#Correlation between RV_percent and age
RV.AGE <- round(cor(plot3.2$RV_percent, plot3.2$Age_months, use= "pairwise.complete.obs"), digits= 3)

ggplot(data = plot3.2) + aes(y = RV_percent, x = (Age_months),  color= Annual_miles) +
 geom_point(position = "jitter")+
geom_smooth(color="green")+
  labs(x=" Age (Months)", y= "Retained Value %", color= "Mileage")+
  facet_wrap(~state)+
  xlim(0,90)+
  ggtitle("Graph 3.2: Retained Value vs Age in High Volume States", subtitle = str_c("Correlation", " = ", RV.AGE))

```

Graph 3.2 shows that generally, across the states with the most auction transactions, vehicles experience similar depreciation curves and correlations between age and mileage.

I filtered for the annual miles driven *AnnualMiles* to be less than 25,000/ year to filter out vehicles used for commercial purposes. This was done because commercial vehicles (eg. rental cars, taxis, fleet vehicles, etc. ) add another layer of complexity that is beyond the scope of this paper.

#### Comparative Analysis

This section will delve into more specific analyses with the aim of providing additional insights.

```{r by_segemnts, eval= TRUE, include=TRUE, message=FALSE, warning=FALSE}
#Analysis by segment
by_segment <- carprices %>% 
  filter(Age_years < 8, !is.na(Segment)) %>% #, make==c(mainstream, luxury)) %>% 
  group_by(Age_years, Segment, sale_year, Age_months) %>% 
  summarise(Segment_RV= round(mean(RV_percent), digits=1)) %>% 
  arrange(desc(Segment_RV)) %>%arrange((Age_years)) %>% arrange((sale_year)) %>% slice(1:5) 

ggplot(data = by_segment) + 
geom_smooth(mapping = aes(x = Age_months, y = Segment_RV, color = `Segment`, position= "jitter"), se= F)+
  xlim(10,75)+
  labs(x= "Age (Months)", y= "Retained Value %", fill= "Segment")+
  ggtitle("Graph 4.1 Retained Value % by Segment")
```

```{r Best Segments, message=FALSE, warning=FALSE, paged.print=TRUE}
#Finding the Segments with the highest RV
best_value_segments <- carprices %>%
  filter(Age_years> 1, Age_years <6, Body_Type== c(passenger, "Pickup"), `Error %` < 10, !is.na(Segment), Segment !="HD Full-Size Pickup")%>% 
  group_by(Segment,Age_years,  RV_percent, odometer, condition, model) %>% tally() %>% 
  summarise(RV_percentage = mean(RV_percent), odometer=mean(odometer),condition=mean(condition), .groups = "keep") %>% tally() %>% 
  summarise(Segment=Segment, Age_years= Age_years, RV_percent= mean(RV_percent), odometer=mean(odometer),condition=mean(condition),  n = sum(n), .groups = "keep") %>%
ungroup() %>% 
   group_by(Segment, Age_years) %>% 
  summarise(Segment=Segment, Age_years= Age_years, RV_percent= mean(RV_percent),Mileage=mean(odometer), Condition=mean(condition),  Sample_Size = sum(n), .groups = "keep") %>%
  slice(1:1) %>% 
  arrange(desc(RV_percent)) %>% arrange((Age_years))
     
print(kable(best_value_segments))

#Removing a few missing fields in Segments that were missing in Catalog
BVS <- best_value_segments %>%  filter(!is.na(Segment)) 


BVS %>% 
ggplot() + aes(x = Age_years, y =(RV_percent)) +
  geom_bar(stat= "identity", fill="darkred") +
  facet_wrap(~reorder(Segment, -RV_percent))+
    labs(x= "Age (Years)", y= "Retained Value %")+
  ggtitle("Graph 4.2: Retained Value % vs. Age Ranked by Segment")

```

The graphical representation of data in Graphs 4.1 and 4.2 suggests that there is a hierarchy that exists among body types. Pickups have generally retained the best value across all years followed by SUVs and cars (sedans, coupes, convertibles, etc).

## Bonus Section

#### How does RV% compare among volume models?

```{r Volume models,message=FALSE, warning=FALSE}
#Volume models

vol.models<- carprices %>% 
  filter( Age_years >0) %>% 
  group_by(model) %>% 
  tally() %>% 
  arrange(desc(n)) %>% 
  ungroup() %>% 
  slice(1:10)

vol.models.passenger <- carprices %>% 
  filter(Age_years >0, Body_Type== passenger) %>% 
  group_by(model) %>% 
  tally() %>% 
  arrange(desc(n)) %>% 
  ungroup() %>% 
  slice(1:10)

popular.models <-  vol.models.passenger$model
```

```{r 2. Volume models,message=FALSE, warning=FALSE}
vol.models.passenger %>% 
  ggplot()+ aes(x=reorder(model, -n), y= n) +
  geom_bar(stat = "identity", fill= "blue") +
  labs(x= "Model", y="Number of Transactions")+
  ggtitle("Graph 5.1: Most Auctioned Models")

```

From Graph 5.1, it is easy to assume that the Nissan Altima is a highly sought-after vehicle and may therefore hold its value well. However, that may not be entirely true. Graph 5.2 will explore this further.

**Do the most auctioned vehicles retain the best value ?**

```{r 1. Models with best RV, message=FALSE, warning=FALSE, paged.print=TRUE}
#Finding the models with the highest RV

BVM <- carprices %>% filter(model== vol.models.passenger$model) %>% 
  filter(Age_years> 1, Age_years <6, Body_Type== passenger) %>% 
  group_by(make, model,Age_years,  RV_percent, `Error %`, odometer, condition, Segment) %>% tally() %>% 
  summarise(RV_percentage = mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE),  odometer=mean(odometer), condition=mean(condition), .groups = "keep") %>% tally() %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE), odometer=mean(odometer),condition=mean(condition),  n = sum(n), .groups = "keep") %>% ungroup() %>% 
   group_by(make, model, Age_years) %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent),`Error %`=mean(`Error %`, na.rm=TRUE), mileage=mean(odometer), condition=mean(condition),  n = sum(n), .groups = "keep") %>%
  slice(1:1) %>% arrange(desc(RV_percent)) %>%
   ungroup() %>% 
  #Filtering a sample size 20 or more
  filter(n>19) %>%
arrange((Age_years))
(kable(BVM))
```

Graph 5.2 shows that the most auctioned vehicles do not necessarily retain the best value. This may be due to an oversupply of these vehicles at auction, and with a higher supply, a lower price often follows.

Another reason could be low consumer sentiment toward these vehicles. After looking at the nature of the sellers, one can conclude that most of these vehicles are lease returns. Vehicles with high lease returns can imply a low consumer sentiment, therefore decreasing the long-term resale value.

```{r 2. Models with best RV, message=FALSE, warning=FALSE, paged.print=TRUE}
vol.p.models <- vol.models.passenger$model %>% as.factor()
vol.vs.best.passenger <- BVM %>% filter(model== vol.p.models)

BVM %>%
  ggplot()+ aes(x=reorder(model, -RV_percent), y= RV_percent) +
  geom_bar(stat = "identity",position = "dodge", fill= "dark orange" ) +
  labs(x= "Model", y="Mean Retained Value %")+
  ggtitle("Graph 5.2: Retained values of the Most Auctioned Models" , subtitle = "Mean Retained Value Percentage from Year 2 to 5")
```

#### Best mainstream cars to buy post-2015

```{r Table Models with  best RV, message=FALSE, warning=FALSE, paged.print=TRUE}

#Finding the models with the highest RV
best_value_models <- carprices %>%
  filter(Age_years> 1, Age_years <6, Body_Type== passenger, `Error %` < 5) %>% 
  group_by(make, model,Age_years,  RV_percent, `Error %`, odometer, condition, Segment) %>% tally() %>% 
  summarise(RV_percentage = mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE),  odometer=mean(odometer), condition=mean(condition), .groups = "keep") %>% tally() %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent), `Error %`=mean(`Error %`, na.rm=TRUE), odometer=mean(odometer),condition=mean(condition),  n = sum(n), .groups = "keep") %>% ungroup() %>% 
   group_by(make, model, Age_years) %>% 
  summarise(make=make, model=model, Age_years= Age_years, RV_percent= mean(RV_percent),`Error %`=mean(`Error %`, na.rm=TRUE), mileage=mean(odometer), condition=mean(condition),  n = sum(n), .groups = "keep") %>%
  slice(1:1) %>% arrange(desc(RV_percent)) %>%
   ungroup() %>% 
  #Filtering a sample size 20 or more
  filter(n>19) %>%
arrange((Age_years))
print(kable(best_value_models, caption= "Passengers Cars with Best RV"))
```

While I won't be making any formal recommendations in this paper, the above table provides a guide on which vehicles can be expected to retain good value in the future. Please pay close attention to the error rating and the sample size.

# **Reflection**

This project, though very time-consuming, has truly been worth every moment spent working on it. Possessing a solid foundation in MS Excel (my daily tool for analysis at work), it took some getting used to, and at times felt counterproductive trying to accomplish the same things in R. Although I'm new to programming, I intentionally chose to do as challenging a project as I possibly could with the knowledge I've learned in this course. The data that I used is almost identical to the type of data I analyze at work and is an area of great interest for me- the used car market. Despite the intrigue, I was a bit overwhelmed with the amount of tidying that was needed to complete the analysis. In truth, more time was spent on cleaning the data than anything else.

An issue I had to contend with was that Canadian vehicle trims do not always have the same naming conventions as US trims. For example, a 2021 VW Jetta Highline in Canada may be listed as a 2021 VW Jetta SE/ SEL in the US. This, however, simplified the joining of the datasets because I was better able to match the YMM in both the auction data (carprices.csv) and the catalog now in USD (us_catalog).

However, after joining the catalog I had 50,342 unmatched records that would be later passed through an iteration map function using the vindecoder() package. Subsequently, I still had to contend with 20,826 (out of 50,342 records) missing unmatched keys/MSRP values. Observing the distribution of vehicles in unmatched_keys (see appendix E), I was able to identify which models had a low map rate in the VIN decoder. After about 12 lines of code, I was able to clean an additional 6,860 records, leaving the total unmatched at 13,965.

Instead of deleting them entirely, I saved them in a CSV file named 'unmatched_keys.csv'. As I progress through the DACSS program, I aim to practice more advanced ways to handle these situations. If time permits, my strategy to handle these untidy fields and unmatched_keys would be to use functions str_detect() and advanced mapping functions to clean the data as best as possible.

Despite 13,965 out of 457,588 records (3%) of records not being matched, I do not believe that it will have a significant impact on the results. The unmatched records were due to the unavailability in the Canadian market of some US models as well as naming inconsistencies across YMM fields. Due to the fact I am mainly conducting my analysis on high-volume vehicle segments and models, this will only have a marginal to negligible effect on the quality of results.

There are far more insights that can be drawn from this data. For example, comparing the average yearly mileage by state, comparing resale values for vehicles in colder states vs warmer states, as well as the popularity and distribution of various makes across the US. Those are just a few of the burning questions that I have not been able to address in this project. What is certain is that I will be continuing to work on this data with the hope to manipulate similar vehicle data for the Canadian wholesale marketplace for more work-related reasons.

# **Conclusion**

The data analyzed was able to substantially answer the research question. Indeed factors like body type, vehicle condition, mileage, and age affect the retained value of vehicles. Excluding age for trivial reasons, overall, the condition had the largest impact (positive) followed by mileage (negative) of all the variables on RV. However, some segments were more sensitive to this effect than others. For example, convertibles were slightly more sensitive to mileage than any other body type; a stark contrast when compared to pickups and SUVs.

Bearing in mind the key assumptions made when converting the MSRP from CAD to USD, I was quite surprised at how accurate the results were compared to actual public residual value data. It seems as though the pricing for Canadian trims coincides greatly with their US counterparts. Another observation was that the most auctioned/sold vehicles do not necessarily retain the best value. Despite, the higher error rating among various segments, I believe the estimation of RV% (based on dividing the auction price by mean MSRP), provides a solid understanding, once analyzed on aggregate and in context.

The data suggests that except for pickups, SUVs- subcompact, compact and midsize, hold better value than standard passenger car body types. There seems to also be a resistance to price as vehicles with higher MSRPs tend to retain less value over time. This was evident when looking at graphs 4.1 and 4.2, where a clear hierarchy among body types and segments is observed.

# **Bibliography**

#### *Data Sources:*

Kaggle: [Used Car Auction Prices \| Kaggle](https://www.kaggle.com/datasets/tunguz/used-car-auction-prices)

WSJ Markets: [USD to CAD \| Canadian Dollar Historical Prices - WSJ](https://www.wsj.com/market-data/quotes/fx/USDCAD/historical-prices)

#### *Methodology:*

*DACSS Faculty*, University of Massachusetts, Amherst

Grolemund, G., & Wickham, H. (2017). *R for Data Science*. O'Reilly Media.

The National Highway Traffic Safety Administration: [VIN Decoder \| NHTSA](https://www.nhtsa.gov/vin-decoder)

[Yan Holtz](https://www.yan-holtz.com/) [The R Graph Gallery](https://r-graph-gallery.com/)

#### *Supporting Tools & Publications:*

Auto Trader [Used Cars for Sale - Autotrader](https://www.autotrader.com/cars-for-sale)

Edmonds [Top 10 Best-Selling Vehicles for 2013 on Edmunds.com](https://www.edmunds.com/car-reviews/top-10/top-10-best-selling-vehicles-for-2013.html)

J.D. Power: [2018 Resale Awards and Ratings \| J.D. Power](https://www.jdpower.com/Cars/Ratings/Depreciation/2018)

Kelley Blue Book [Kelley Blue Book](https://www.kbb.com/)

VIN Decoder: [Decode This](https://www.decodethis.com/)

# **Appendix**

## Appendix A:

Original data set: car_prices.csv

```{r}
dfSummary(df)
```

## Appendix B:

Canadian vehicle catalog

```{r}
dfSummary(ca_catalog)
```

## Appendix C:

First iteration of unmatched records labelled as *false_keys*

```{r}
dfSummary(false_keys)
```

This dataset contains all the YMM keys from *carprices* that failed to map to the *us_catalog.*

## Appendix D:

All records cleaned through NHTSA's *vindecodr* package lablled as *cleaned_keys*

```{r}
dfSummary(cleaned_keys)
```

## Appendix E:

Remaining records from false_keys that failed to map after decoding VINs labelled as unmatched_keys.

```{r}
dfSummary(unmatched_keys)

```

## Appendix F:

Finalized and cleaned auction data *carprices.*

```{r}
dfSummary(carprices)
```

More about the NHTSA's vindecodr package:

Below is an example of the API call from the NHTSA's website: <https://vpic.nhtsa.dot.gov/api/Home/Index/LanguageExamples>

    require(RJSONIO)
    # User-defined functions ----------------------------------
    VehicleAPIrConnect <- function(VINin){
        # Lookup VIN information from https://vpic.nhtsa.dot.gov/api
        #
        # Args:
        #   VINin: VIN
        #
        # Returns:
        #   Data frame with vehicle information.
        #
        # For Testing:
        # VINin <- "5UXWX7C5*BA"
        tempCall <- paste0("https://vpic.nhtsa.dot.gov/api/vehicles/DecodeVinValues/", VINin, "?format=json")
        tempExtract <- fromJSON(tempCall)
        dfOut <- data.frame(t(unlist(tempExtract$Results)),stringsAsFactors=FALSE)
        dfOut
    }
    VehicleAPIrConnect("5UXWX7C5*BA")