Ayushe’s HW5 Submission for DACSS 601
Load the loan data that was cleaned in the previous homework assignment.
library(tidyverse)  # provides read_csv (readr), the dplyr verbs and ggplot2 used below
data_clean <- read_csv("/Users/ayushe/RStudio stuff/RData/clean_loan_data.csv")
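Every plot below depends on a fixed set of columns, so it can help to confirm that they exist right after loading. The following is a minimal base-R sketch of that check. It uses a small inline CSV as a hypothetical stand-in for clean_loan_data.csv: the column names come from the plots in this post, but the values are invented, and application_type is deliberately omitted so the check has something to report.

```r
# Hypothetical two-row stand-in for clean_loan_data.csv (values invented)
csv_text <- "loan_amount,interest_rate,grade,sub_grade,term,state,loan_purpose
10000,7.5,A,A2,36 months,CA,car
25000,18.2,E,E1,60 months,TX,house"
data_clean <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Columns referenced by the plots in this post
required_cols <- c("interest_rate", "grade", "sub_grade", "term",
                   "loan_amount", "state", "loan_purpose", "application_type")

# List any required columns that are absent (application_type was left out on purpose)
missing_cols <- setdiff(required_cols, names(data_clean))
print(missing_cols)
```

On the real data, an empty result means every plot below has the columns it needs.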
Interest Rate vs. Grade by Loan Term
ggplot(data_clean, aes(x = interest_rate, y = grade, fill = term)) +
  geom_boxplot() +
  labs(x = 'Interest Rate', y = 'Grade') +
  facet_wrap(~ term)
As the grade moves from A to G, the interest rate rises steadily, and the increase looks roughly linear for both the 36-month and the 60-month terms. This is consistent with how grades are assigned. Higher-grade loans (A, B, C) indicate better credit and lower risk, while lower-grade loans (E, F, G) indicate the opposite. As a result, higher-grade loans carry lower interest rates and lower-grade loans carry higher ones.
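The trend can also be checked numerically as well as visually. The sketch below takes the median interest rate for each grade within each term and tests whether those medians are non-decreasing from A onward. It then computes a Spearman rank correlation, which treats grade as an ordinal variable (A = 1, B = 2, ...) instead of assuming equal spacing. The toy data frame is a hypothetical stand-in for data_clean with invented values.

```r
# Hypothetical toy data standing in for data_clean (values invented)
loans <- data.frame(
  grade         = c("A", "A", "B", "B", "C", "C", "D", "D"),
  term          = rep(c("36 months", "60 months"), times = 4),
  interest_rate = c(6.5, 7.2, 10.1, 11.0, 14.3, 15.1, 18.2, 19.4)
)

# Median interest rate for each grade within each term, sorted by term then grade
med <- aggregate(interest_rate ~ term + grade, data = loans, FUN = median)
med <- med[order(med$term, med$grade), ]
print(med)

# TRUE for a term if its median rate never decreases from one grade to the next
monotonic <- sapply(split(med$interest_rate, med$term),
                    function(r) !is.unsorted(r))
print(monotonic)

# Spearman rank correlation between grade (as 1, 2, 3, ...) and interest rate
rho <- cor(as.integer(factor(loans$grade)), loans$interest_rate,
           method = "spearman")
print(rho)
```

A rank correlation fits grade better than a Pearson correlation would, because the letter grades have an order but no meaningful numeric spacing.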
Loan Amount vs. Grade by Sub-Grade and Loan Term
ggplot(data_clean, aes(x = grade, y = loan_amount, fill = sub_grade)) +
  geom_boxplot() +
  labs(y = 'Loan Amount', x = 'Grade') +
  facet_wrap(~ term)
Here we do not observe a linear relationship between grade and loan amount. Instead, loan amounts are higher for the 60-month term than for the 36-month term across all grades (A to G). Grade A in the 60-month term has the highest loan amounts, while Grade G shows only small loan amounts in the 36-month term.
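The statement that 60-month loans are larger within every grade can be checked with a grade-by-term table of median loan amounts. The following is a minimal base-R sketch on a hypothetical toy data frame (grades, terms and amounts are invented).

```r
# Hypothetical toy data standing in for data_clean (values invented)
loans <- data.frame(
  grade       = rep(c("A", "B", "C"), each = 4),
  term        = rep(c("36 months", "60 months"), times = 6),
  loan_amount = c(10000, 25000, 12000, 28000,
                  9000, 20000, 11000, 22000,
                  8000, 18000, 9500, 21000)
)

# Median loan amount as a grade x term matrix (rows = grade, columns = term)
med <- tapply(loans$loan_amount, list(loans$grade, loans$term), median)
print(med)

# TRUE for every grade whose 60-month median exceeds its 36-month median
higher_for_60 <- med[, "60 months"] > med[, "36 months"]
print(higher_for_60)
```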
State vs. Percentage of Loans
data_clean %>%
  count(state, name = "n_loans") %>%                 # number of loans per state
  mutate(percentage = n_loans / sum(n_loans) * 100,  # share of all loans, in %
         state = reorder(state, percentage)) %>%      # order bars by that share
  filter(percentage > 1) %>%                          # keep states above 1%
  ggplot(aes(x = state, y = percentage, fill = state)) +
  geom_col(colour = "white", show.legend = FALSE) +   # legend would repeat the axis
  geom_text(aes(y = 1, label = round(percentage, 2)),
            hjust = 0, vjust = 0.5, size = 4, colour = "black",
            fontface = "bold") +
  labs(x = 'State Name', y = 'Percentage of Loans', title = 'States and Loans') +
  coord_flip()
The chart shows, in decreasing order, each state's share of all loans, limited to states with more than 1% of the loans. Borrowers in California account for the largest share, at 13.39%.
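Each percentage is the state's loan count divided by the total count across all states, multiplied by 100. The total is taken before the 1% filter is applied. The following base-R sketch reproduces that arithmetic on a hypothetical vector of 100 loans; the state codes and counts are invented. It also shows that the filter is strict, so a state at exactly 1% is dropped.

```r
# Hypothetical state of residence for 100 loans (counts invented)
states <- rep(c("CA", "TX", "NY", "VT"), times = c(50, 30, 19, 1))

# Count loans per state, then convert to percentages of the overall total
counts <- table(states)
pct <- 100 * as.numeric(counts) / sum(counts)
names(pct) <- names(counts)

# Keep states strictly above 1% and order from largest to smallest share
pct <- sort(pct[pct > 1], decreasing = TRUE)
print(round(pct, 2))
```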
Purpose vs. Loan Amount by Application Type
ggplot(data_clean, aes(y = loan_purpose, x = loan_amount, fill = application_type)) +
  geom_boxplot() +
  labs(x = 'Loan Amount', y = 'Purpose')
This plot shows how loan amount varies with loan purpose, split by application type. Joint applications correspond to higher loan amounts, and the difference is most pronounced for purposes such as small business and house.
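One way to quantify "most pronounced" is to divide the median joint loan amount by the median individual loan amount for each purpose. Ratios well above 1 mark the purposes where joint applications stand out. This is a minimal base-R sketch. The application_type labels "individual" and "joint", the purpose labels and all amounts are hypothetical.

```r
# Hypothetical toy data standing in for data_clean (labels and amounts invented)
loans <- data.frame(
  loan_purpose     = rep(c("house", "small_business", "car"), each = 4),
  application_type = rep(c("individual", "joint"), times = 6),
  loan_amount      = c(15000, 30000, 17000, 34000,
                       12000, 26000, 14000, 26000,
                       9000, 10000, 11000, 12000)
)

# Median loan amount as a purpose x application-type matrix
med <- tapply(loans$loan_amount,
              list(loans$loan_purpose, loans$application_type), median)

# Joint-to-individual ratio per purpose; larger means a bigger joint premium
ratio <- sort(med[, "joint"] / med[, "individual"], decreasing = TRUE)
print(round(ratio, 2))
```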
At this point, we have a clear picture of the relationship between interest rate and loan grade for the two terms of 36 and 60 months, and of how loan amounts vary across grades and sub-grades. The state-wise plot is informative because it shows which states' residents account for the largest shares of loans. Finally, the plot of purpose against loan amount by application type shows which purposes people take out the largest and smallest loans for, and which application type is used for each purpose.
A plot of total annual income against loan amount would add to the analysis, because it would show how much borrowers take out relative to their annual income, which indicates their borrowing capacity. A plot of purpose against loan amount, grouped by type of income source, would also add useful insight.
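Before plotting income against loan amount, a simple way to express borrowing capacity is the loan-to-income ratio, alongside the correlation between the two variables. The following is a minimal base-R sketch. The column name annual_income is an assumption, since this post does not show which income column the dataset contains, and the values are invented. On the real data, the plot itself could be drawn with something like ggplot(data_clean, aes(x = annual_income, y = loan_amount)) + geom_point().

```r
# Hypothetical borrowers; annual_income is an assumed column name (values invented)
loans <- data.frame(
  annual_income = c(40000, 60000, 80000, 120000),
  loan_amount   = c(10000, 12000, 20000, 24000)
)

# Loan-to-income ratio: loan size as a fraction of yearly income
loans$loan_to_income <- loans$loan_amount / loans$annual_income
print(loans)

# Pearson correlation: direction and strength of the linear association
r <- cor(loans$annual_income, loans$loan_amount)
print(r)
```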
I believe these plots are easy to understand and clearly delineate the relationships between the variables, so they should be accessible even to a non-expert reader.
The dataset cannot be used to answer questions about the borrowers' monthly payments or about whether those payments were made late.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gangal (2022, April 27). Data Analytics and Computational Social Science: HW 5 DACSS 601. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomayushe17893423/
BibTeX citation
@misc{gangal2022hw, author = {Gangal, Ayushe}, title = {Data Analytics and Computational Social Science: HW 5 DACSS 601}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomayushe17893423/}, year = {2022} }