Ayushe’s HW6 Submission for DACSS 601
The dataset is loaded using read_csv(). It has 10,000 rows and 55 columns, that is, 55 features. A glimpse of the dataset is given below. The columns are of two types: character and double.
library(tidyverse) # provides read_csv(), glimpse(), the dplyr/tidyr verbs and ggplot2
data <- read_csv("/Users/ayushe/RStudio stuff/RData/loans_full_schema.csv")
A more detailed description of the dataset is given below:
glimpse(data)
Rows: 10,000
Columns: 55
$ emp_title <chr> "global config engineer", "…
$ emp_length <dbl> 3, 10, 3, 1, 10, NA, 10, 10…
$ state <chr> "NJ", "HI", "WI", "PA", "CA…
$ homeownership <chr> "MORTGAGE", "RENT", "RENT",…
$ annual_income <dbl> 90000, 40000, 40000, 30000,…
$ verified_income <chr> "Verified", "Not Verified",…
$ debt_to_income <dbl> 18.01, 5.04, 21.15, 10.16, …
$ annual_income_joint <dbl> NA, NA, NA, NA, 57000, NA, …
$ verification_income_joint <chr> NA, NA, NA, NA, "Verified",…
$ debt_to_income_joint <dbl> NA, NA, NA, NA, 37.66, NA, …
$ delinq_2y <dbl> 0, 0, 0, 0, 0, 1, 0, 1, 1, …
$ months_since_last_delinq <dbl> 38, NA, 28, NA, NA, 3, NA, …
$ earliest_credit_line <dbl> 2001, 1996, 2006, 2007, 200…
$ inquiries_last_12m <dbl> 6, 1, 4, 0, 7, 6, 1, 1, 3, …
$ total_credit_lines <dbl> 28, 30, 31, 4, 22, 32, 12, …
$ open_credit_lines <dbl> 10, 14, 10, 4, 16, 12, 10, …
$ total_credit_limit <dbl> 70795, 28800, 24193, 25400,…
$ total_credit_utilized <dbl> 38767, 4321, 16000, 4997, 5…
$ num_collections_last_12m <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ num_historical_failed_to_pay <dbl> 0, 1, 0, 1, 0, 0, 0, 0, 0, …
$ months_since_90d_late <dbl> 38, NA, 28, NA, NA, 60, NA,…
$ current_accounts_delinq <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_collection_amount_ever <dbl> 1250, 0, 432, 0, 0, 0, 0, 0…
$ current_installment_accounts <dbl> 2, 0, 1, 1, 1, 0, 2, 2, 6, …
$ accounts_opened_24m <dbl> 5, 11, 13, 1, 6, 2, 1, 4, 1…
$ months_since_last_credit_inquiry <dbl> 5, 8, 7, 15, 4, 5, 9, 7, 4,…
$ num_satisfactory_accounts <dbl> 10, 14, 10, 4, 16, 12, 10, …
$ num_accounts_120d_past_due <dbl> 0, 0, 0, 0, 0, 0, 0, NA, 0,…
$ num_accounts_30d_past_due <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ num_active_debit_accounts <dbl> 2, 3, 3, 2, 10, 1, 3, 5, 11…
$ total_debit_limit <dbl> 11100, 16500, 4300, 19400, …
$ num_total_cc_accounts <dbl> 14, 24, 14, 3, 20, 27, 8, 1…
$ num_open_cc_accounts <dbl> 8, 14, 8, 3, 15, 12, 7, 12,…
$ num_cc_carrying_balance <dbl> 6, 4, 6, 2, 13, 5, 6, 10, 1…
$ num_mort_accounts <dbl> 1, 0, 0, 0, 0, 3, 2, 7, 2, …
$ account_never_delinq_percent <dbl> 92.9, 100.0, 93.5, 100.0, 1…
$ tax_liens <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, …
$ public_record_bankrupt <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, …
$ loan_purpose <chr> "moving", "debt_consolidati…
$ application_type <chr> "individual", "individual",…
$ loan_amount <dbl> 28000, 5000, 2000, 21600, 2…
$ term <dbl> 60, 36, 36, 36, 36, 36, 60,…
$ interest_rate <dbl> 14.07, 12.61, 17.09, 6.72, …
$ installment <dbl> 652.53, 167.54, 71.40, 664.…
$ grade <chr> "C", "C", "D", "A", "C", "A…
$ sub_grade <chr> "C3", "C1", "D1", "A3", "C3…
$ issue_month <chr> "Mar-2018", "Feb-2018", "Fe…
$ loan_status <chr> "Current", "Current", "Curr…
$ initial_listing_status <chr> "whole", "whole", "fraction…
$ disbursement_method <chr> "Cash", "Cash", "Cash", "Ca…
$ balance <dbl> 27015.86, 4651.37, 1824.63,…
$ paid_total <dbl> 1999.330, 499.120, 281.800,…
$ paid_principal <dbl> 984.14, 348.63, 175.37, 274…
$ paid_interest <dbl> 1015.19, 150.49, 106.43, 56…
$ paid_late_fees <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, …
Checking for Missing Values
emp_title emp_length
833 817
state homeownership
0 0
annual_income verified_income
0 0
debt_to_income annual_income_joint
24 8505
verification_income_joint debt_to_income_joint
8545 8505
delinq_2y months_since_last_delinq
0 5658
earliest_credit_line inquiries_last_12m
0 0
total_credit_lines open_credit_lines
0 0
total_credit_limit total_credit_utilized
0 0
num_collections_last_12m num_historical_failed_to_pay
0 0
months_since_90d_late current_accounts_delinq
7715 0
total_collection_amount_ever current_installment_accounts
0 0
accounts_opened_24m months_since_last_credit_inquiry
0 1271
num_satisfactory_accounts num_accounts_120d_past_due
0 318
num_accounts_30d_past_due num_active_debit_accounts
0 0
total_debit_limit num_total_cc_accounts
0 0
num_open_cc_accounts num_cc_carrying_balance
0 0
num_mort_accounts account_never_delinq_percent
0 0
tax_liens public_record_bankrupt
0 0
loan_purpose application_type
0 0
loan_amount term
0 0
interest_rate installment
0 0
grade sub_grade
0 0
issue_month loan_status
0 0
initial_listing_status disbursement_method
0 0
balance paid_total
0 0
paid_principal paid_interest
0 0
paid_late_fees
0
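The code that produced these counts is not shown. A minimal sketch of one common way to get per-column missing-value counts is given here, using a small made-up table (`toy` and its values are assumptions for illustration, not the loans data):

```r
library(tibble)

# Toy data standing in for the loans table (values are invented).
toy <- tibble(
  emp_title  = c("hr", NA, "assembly", NA),
  emp_length = c(10, NA, 3, 1),
  state      = c("MI", "NJ", "WI", "PA")
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(toy))
na_counts
```

Applied to the full dataset, the same call would be `colSums(is.na(data))`.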
We find that emp_title has 833 missing values, emp_length has 817, and debt_to_income, annual_income_joint and verification_income_joint have 24, 8505 and 8545 respectively, and so on. We first drop all rows with a missing value in emp_title.
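The dropping step itself is not shown. One way it could be done is with tidyr's drop_na(), sketched here on made-up data (the `toy` table is an assumption for illustration):

```r
library(tibble)
library(tidyr)

# Toy data: one row has a missing emp_title, two have a missing emp_length.
toy <- tibble(
  emp_title  = c("hr", NA, "assembly"),
  emp_length = c(10, NA, NA)
)

# Drop only the rows where emp_title is missing; NAs in other columns are kept.
toy_clean <- drop_na(toy, emp_title)
toy_clean
```

Naming the column restricts the check to emp_title; calling drop_na() with no column would remove every row that has a missing value anywhere.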
emp_title emp_length
0 0
state homeownership
0 0
annual_income verified_income
0 0
debt_to_income annual_income_joint
1 7875
verification_income_joint debt_to_income_joint
7909 7875
delinq_2y months_since_last_delinq
0 5183
earliest_credit_line inquiries_last_12m
0 0
total_credit_lines open_credit_lines
0 0
total_credit_limit total_credit_utilized
0 0
num_collections_last_12m num_historical_failed_to_pay
0 0
months_since_90d_late current_accounts_delinq
7090 0
total_collection_amount_ever current_installment_accounts
0 0
accounts_opened_24m months_since_last_credit_inquiry
0 1122
num_satisfactory_accounts num_accounts_120d_past_due
0 295
num_accounts_30d_past_due num_active_debit_accounts
0 0
total_debit_limit num_total_cc_accounts
0 0
num_open_cc_accounts num_cc_carrying_balance
0 0
num_mort_accounts account_never_delinq_percent
0 0
tax_liens public_record_bankrupt
0 0
loan_purpose application_type
0 0
loan_amount term
0 0
interest_rate installment
0 0
grade sub_grade
0 0
issue_month loan_status
0 0
initial_listing_status disbursement_method
0 0
balance paid_total
0 0
paid_principal paid_interest
0 0
paid_late_fees
0
We also rename the column “emp_title” to “job” and “emp_length” to “employment_len”.
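The renaming code is not shown. A minimal sketch with dplyr's rename(), which takes new_name = old_name pairs, on a made-up table (`toy` is an assumption for illustration):

```r
library(tibble)
library(dplyr)

toy <- tibble(emp_title = c("hr", "assembly"), emp_length = c(10, 3))

# rename() uses the form new_name = old_name and leaves the values untouched.
toy_renamed <- rename(toy, job = emp_title, employment_len = emp_length)
names(toy_renamed)
```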
We replace the remaining missing values with “Not Applicable” in character columns and with -1 in double columns.
# Replace all remaining NAs in one call: -1 for double columns, "Not Applicable" for character columns
data <- replace_na(data, list(
debt_to_income_joint = -1,
annual_income_joint = -1,
verification_income_joint = "Not Applicable",
months_since_last_delinq = -1,
months_since_90d_late = -1,
months_since_last_credit_inquiry = -1,
num_accounts_120d_past_due = -1
))
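The same rule (a placeholder per column type) can also be written without naming each column. The following is only a sketch of that alternative, on a made-up table (`toy` is an assumption for illustration), using across() with where():

```r
library(tibble)
library(dplyr)
library(tidyr)

toy <- tibble(
  annual_income_joint       = c(NA, 57000, NA),
  verification_income_joint = c(NA, "Verified", NA),
  months_since_90d_late     = c(38, NA, 28)
)

toy_filled <- toy %>%
  mutate(
    # every numeric column: NA becomes -1
    across(where(is.numeric), ~ replace_na(.x, -1)),
    # every character column: NA becomes "Not Applicable"
    across(where(is.character), ~ replace_na(.x, "Not Applicable"))
  )
toy_filled
```

This applies the placeholder to every column of the matching type, so it is only equivalent to the column-by-column version when all remaining missing values should be treated the same way.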
We now take another look at the dataset, which has zero missing values:
job employment_len
0 0
state homeownership
0 0
annual_income verified_income
0 0
debt_to_income annual_income_joint
0 0
verification_income_joint debt_to_income_joint
0 0
delinq_2y months_since_last_delinq
0 0
earliest_credit_line inquiries_last_12m
0 0
total_credit_lines open_credit_lines
0 0
total_credit_limit total_credit_utilized
0 0
num_collections_last_12m num_historical_failed_to_pay
0 0
months_since_90d_late current_accounts_delinq
0 0
total_collection_amount_ever current_installment_accounts
0 0
accounts_opened_24m months_since_last_credit_inquiry
0 0
num_satisfactory_accounts num_accounts_120d_past_due
0 0
num_accounts_30d_past_due num_active_debit_accounts
0 0
total_debit_limit num_total_cc_accounts
0 0
num_open_cc_accounts num_cc_carrying_balance
0 0
num_mort_accounts account_never_delinq_percent
0 0
tax_liens public_record_bankrupt
0 0
loan_purpose application_type
0 0
loan_amount term
0 0
interest_rate installment
0 0
grade sub_grade
0 0
issue_month loan_status
0 0
initial_listing_status disbursement_method
0 0
balance paid_total
0 0
paid_principal paid_interest
0 0
paid_late_fees
0
head(data)
# A tibble: 6 × 55
job employment_len state homeownership annual_income
<chr> <dbl> <chr> <chr> <dbl>
1 global config engi… 3 NJ MORTGAGE 90000
2 warehouse office c… 10 HI RENT 40000
3 assembly 3 WI RENT 40000
4 customer service 1 PA RENT 30000
5 security supervisor 10 CA RENT 35000
6 hr 10 MI MORTGAGE 35000
# … with 50 more variables: verified_income <chr>,
# debt_to_income <dbl>, annual_income_joint <dbl>,
# verification_income_joint <chr>, debt_to_income_joint <dbl>,
# delinq_2y <dbl>, months_since_last_delinq <dbl>,
# earliest_credit_line <dbl>, inquiries_last_12m <dbl>,
# total_credit_lines <dbl>, open_credit_lines <dbl>,
# total_credit_limit <dbl>, total_credit_utilized <dbl>, …
Data has been cleaned successfully and can now be used for analysis and visualization.
Creating a dataframe with only numeric data type values to perform numerical analysis
data_num <- select_if(data, is.numeric)
data_num
# A tibble: 9,166 × 42
employment_len annual_income debt_to_income annual_income_joint
<dbl> <dbl> <dbl> <dbl>
1 3 90000 18.0 -1
2 10 40000 5.04 -1
3 3 40000 21.2 -1
4 1 30000 10.2 -1
5 10 35000 58.0 57000
6 10 35000 23.7 155000
7 10 110000 16.2 -1
8 10 65000 36.5 -1
9 3 30000 18.9 -1
10 10 75000 10.4 -1
# … with 9,156 more rows, and 38 more variables:
# debt_to_income_joint <dbl>, delinq_2y <dbl>,
# months_since_last_delinq <dbl>, earliest_credit_line <dbl>,
# inquiries_last_12m <dbl>, total_credit_lines <dbl>,
# open_credit_lines <dbl>, total_credit_limit <dbl>,
# total_credit_utilized <dbl>, num_collections_last_12m <dbl>,
# num_historical_failed_to_pay <dbl>, …
Mean of all the numeric (42) columns
summarize_all(data_num, list(mean=mean), na.rm=TRUE)
# A tibble: 1 × 42
employment_len_m… annual_income_m… debt_to_income_… annual_income_j…
<dbl> <dbl> <dbl> <dbl>
1 5.93 82193. 19.0 18634.
# … with 38 more variables: debt_to_income_joint_mean <dbl>,
# delinq_2y_mean <dbl>, months_since_last_delinq_mean <dbl>,
# earliest_credit_line_mean <dbl>, inquiries_last_12m_mean <dbl>,
# total_credit_lines_mean <dbl>, open_credit_lines_mean <dbl>,
# total_credit_limit_mean <dbl>, total_credit_utilized_mean <dbl>,
# num_collections_last_12m_mean <dbl>,
# num_historical_failed_to_pay_mean <dbl>, …
Median of all the numeric (42) columns
summarize_all(data_num, list(median=median), na.rm=TRUE)
# A tibble: 1 × 42
employment_len_m… annual_income_m… debt_to_income_… annual_income_j…
<dbl> <dbl> <dbl> <dbl>
1 6 68000 17.4 -1
# … with 38 more variables: debt_to_income_joint_median <dbl>,
# delinq_2y_median <dbl>, months_since_last_delinq_median <dbl>,
# earliest_credit_line_median <dbl>,
# inquiries_last_12m_median <dbl>, total_credit_lines_median <dbl>,
# open_credit_lines_median <dbl>, total_credit_limit_median <dbl>,
# total_credit_utilized_median <dbl>,
# num_collections_last_12m_median <dbl>, …
Standard deviation of all the numeric (42) columns
summarize_all(data_num, list(sd=sd), na.rm=TRUE)
# A tibble: 1 × 42
employment_len_sd annual_income_sd debt_to_income_… annual_income_j…
<dbl> <dbl> <dbl> <dbl>
1 3.70 65934. 14.2 53252.
# … with 38 more variables: debt_to_income_joint_sd <dbl>,
# delinq_2y_sd <dbl>, months_since_last_delinq_sd <dbl>,
# earliest_credit_line_sd <dbl>, inquiries_last_12m_sd <dbl>,
# total_credit_lines_sd <dbl>, open_credit_lines_sd <dbl>,
# total_credit_limit_sd <dbl>, total_credit_utilized_sd <dbl>,
# num_collections_last_12m_sd <dbl>,
# num_historical_failed_to_pay_sd <dbl>, …
Interquartile range of all the numeric (42) columns
summarize_all(data_num, list(IQR=IQR), na.rm=TRUE)
# A tibble: 1 × 42
employment_len_I… annual_income_I… debt_to_income_… annual_income_j…
<dbl> <dbl> <dbl> <dbl>
1 8 50219. 13.7 0
# … with 38 more variables: debt_to_income_joint_IQR <dbl>,
# delinq_2y_IQR <dbl>, months_since_last_delinq_IQR <dbl>,
# earliest_credit_line_IQR <dbl>, inquiries_last_12m_IQR <dbl>,
# total_credit_lines_IQR <dbl>, open_credit_lines_IQR <dbl>,
# total_credit_limit_IQR <dbl>, total_credit_utilized_IQR <dbl>,
# num_collections_last_12m_IQR <dbl>,
# num_historical_failed_to_pay_IQR <dbl>, …
Minimum of all the numeric (42) columns
summarize_all(data_num, list(min=min), na.rm=TRUE)
# A tibble: 1 × 42
employment_len_m… annual_income_m… debt_to_income_… annual_income_j…
<dbl> <dbl> <dbl> <dbl>
1 0 3000 0 -1
# … with 38 more variables: debt_to_income_joint_min <dbl>,
# delinq_2y_min <dbl>, months_since_last_delinq_min <dbl>,
# earliest_credit_line_min <dbl>, inquiries_last_12m_min <dbl>,
# total_credit_lines_min <dbl>, open_credit_lines_min <dbl>,
# total_credit_limit_min <dbl>, total_credit_utilized_min <dbl>,
# num_collections_last_12m_min <dbl>,
# num_historical_failed_to_pay_min <dbl>, …
Maximum of all the numeric (42) columns
summarize_all(data_num, list(max=max), na.rm=TRUE)
# A tibble: 1 × 42
employment_len_m… annual_income_m… debt_to_income_… annual_income_j…
<dbl> <dbl> <dbl> <dbl>
1 10 2300000 469. 1100000
# … with 38 more variables: debt_to_income_joint_max <dbl>,
# delinq_2y_max <dbl>, months_since_last_delinq_max <dbl>,
# earliest_credit_line_max <dbl>, inquiries_last_12m_max <dbl>,
# total_credit_lines_max <dbl>, open_credit_lines_max <dbl>,
# total_credit_limit_max <dbl>, total_credit_utilized_max <dbl>,
# num_collections_last_12m_max <dbl>,
# num_historical_failed_to_pay_max <dbl>, …
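The six summaries above can also be computed in a single call. This is a sketch of that alternative using summarise() with across(), shown on a made-up table (`toy_num` and its values are assumptions for illustration):

```r
library(tibble)
library(dplyr)

toy_num <- tibble(
  annual_income = c(90000, 40000, 40000, 30000),
  loan_amount   = c(28000, 5000, 2000, 21600)
)

# One output column per (input column, statistic) pair, named "<column>_<statistic>".
stats <- toy_num %>%
  summarise(across(
    everything(),
    list(mean = mean, median = median, sd = sd, IQR = IQR, min = min, max = max),
    .names = "{.col}_{.fn}"
  ))
stats
```

Note that in the outputs above the -1 placeholders are counted as ordinary values, which is why the median and minimum of annual_income_joint are both -1.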
Mean of annual income grouped by term and also counting the number of values which fall in each term
# A tibble: 2 × 3
term mean_annual_income n
<dbl> <dbl> <int>
1 36 80463. 6339
2 60 86070. 2827
Mean of loan amount grouped by term and also counting the number of values which fall in each term
# A tibble: 2 × 3
term mean_aloan_amount n
<dbl> <dbl> <int>
1 36 14037. 6339
2 60 22470. 2827
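The code for the two term-wise tables above is not shown. A minimal sketch of how such a grouped summary could be written, on a made-up table (`toy` and its values are assumptions for illustration), computing both means in one pass:

```r
library(tibble)
library(dplyr)

toy <- tibble(
  term          = c(36, 36, 60, 60),
  annual_income = c(40000, 60000, 80000, 100000),
  loan_amount   = c(5000, 15000, 20000, 30000)
)

# One row per term, with the group means and the number of loans in each group.
by_term <- toy %>%
  group_by(term) %>%
  summarise(
    mean_annual_income = mean(annual_income),
    mean_loan_amount   = mean(loan_amount),
    n = n()
  )
by_term
```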
Minimum, maximum and mean of annual income grouped by the type of job of the borrower and also counting the number of values which fall in each job
data %>%
group_by(job) %>%
summarise(min_annual_income = min(annual_income),
max_annual_income = max(annual_income),
mean_annual_income = mean(annual_income),
n = n())
# A tibble: 4,411 × 5
job min_annual_inco… max_annual_inco… mean_annual_inc… n
<chr> <dbl> <dbl> <dbl> <int>
1 1sg 78000 100000 89000 2
2 1st grade… 45000 45000 45000 1
3 21 dealer 44000 44000 44000 1
4 2nd shift… 55000 55000 55000 1
5 360app sp… 37000 37000 37000 1
6 3rd shift… 88000 88000 88000 1
7 3rd shift… 54000 54000 54000 1
8 4th person 30000 30000 30000 1
9 5th year … 88400 88400 88400 1
10 6th grade… 55000 55000 55000 1
# … with 4,401 more rows
Minimum, maximum and mean of annual income grouped by the type of home ownership of the borrower and also counting the number of values which fall in each home ownership
data %>%
group_by(homeownership) %>%
summarise(min_annual_income = min(annual_income),
max_annual_income = max(annual_income),
mean_annual_income = mean(annual_income),
n = n())
# A tibble: 3 × 5
homeownership min_annual_income max_annual_income mean_annual_income
<chr> <dbl> <dbl> <dbl>
1 MORTGAGE 3300 2300000 93753.
2 OWN 9996 740000 76683.
3 RENT 3000 1200000 69540.
# … with 1 more variable: n <int>
Minimum, maximum and mean of loan amount grouped by the type of home ownership of the borrower and also counting the number of values which fall in each home ownership
data %>%
group_by(homeownership) %>%
summarise(min_loan_amount = min(loan_amount),
max_loan_amount = max(loan_amount),
mean_loan_amount = mean(loan_amount),
n = n())
# A tibble: 3 × 5
homeownership min_loan_amount max_loan_amount mean_loan_amount n
<chr> <dbl> <dbl> <dbl> <int>
1 MORTGAGE 1000 40000 18367. 4458
2 OWN 1000 40000 16290. 1125
3 RENT 1000 40000 14596. 3583
Mean interest rate, mean loan amount and mean annual income grouped by the grade of the loan, along with the number of loans in each grade
data %>%
group_by(grade) %>%
summarise( interest_rate = mean(interest_rate),
loan_amount = mean(loan_amount),
mean_annual_income = mean(annual_income),
n = n())
# A tibble: 7 × 5
grade interest_rate loan_amount mean_annual_income n
<chr> <dbl> <dbl> <dbl> <int>
1 A 6.73 15617. 92063. 2263
2 B 10.5 16510. 83528. 2788
3 C 14.2 17116. 76904. 2434
4 D 19.1 16962. 75087. 1310
5 E 25.1 18844. 70736. 304
6 F 29.4 22153. 77298. 56
7 G 30.8 25923. 71066. 11
Visualization #1: Interest Rate VS Grade on term of loan
ggplot(data, aes(x = interest_rate, y = grade , fill = term)) +
geom_boxplot() +
labs(x = 'Interest Rate' , y = 'Grade') +
facet_wrap(~ term)
We find that the interest rate rises steadily, and roughly linearly, as the grade moves from A to G, for both the 36 month and the 60 month terms. This agrees with the fact that higher-grade loans (i.e. A, B, C) indicate better credit and lower risk, while lower-grade loans (i.e. E, F, G) indicate the opposite; higher-grade loans therefore carry lower interest rates and lower-grade loans carry higher ones.
Visualization #2: Purpose of Loan VS Loan Amount highlighting the status of Loan on term of loan
ggplot(data, aes(x = loan_amount , y = loan_purpose , fill = loan_status)) +
geom_boxplot() +
labs(y = ' Purpose of Loan' , x = 'Loan_amount') +
facet_wrap(~ term)
We find that for the 36 month term, the highest loan amounts correspond to purposes such as renewable energy, moving, major purchases, debt consolidation, home improvement, car and credit cards, all with a loan status of Current. We also find that people take larger loans for all purposes on the 60 month term, and that there are fewer late loan statuses for the 60 month term, which suggests that borrowers fall behind on repayment less often over the longer loan period.
Visualization #3 - Loan Amount VS Grade highlighting Sub Grade on term of loan
ggplot(data, aes(x = grade , y = loan_amount , fill = sub_grade)) +
geom_boxplot() +
labs(y = 'Loan Amount' , x = 'Grade') +
facet_wrap(~ term)
Here we do not observe a linear relationship between Grade and Loan Amount; rather, higher loan amounts are observed for the 60 month term than for the 36 month term across all the Grades (A-G). We also observe that Grade A in the 60 month term has the highest Loan Amount, and that Grade G does not have a significant Loan Amount for the 36 month term.
Visualization #4- State VS Percentage of Loans
data %>%
group_by(state) %>%
summarise(CountLoanPurpose = n() ) %>%
mutate(percentage = (CountLoanPurpose/sum(CountLoanPurpose) ) *100 ) %>%
mutate(state = reorder(state, percentage)) %>%
arrange(desc(percentage)) %>%
filter(percentage > 1) %>%
ggplot(aes(x = state, y = percentage, fill=state)) +
geom_bar(stat='identity', colour="white") +
geom_text(aes(x = state, y = 1, label = paste0(round(percentage,2),sep="")),
hjust=0, vjust=.5, size = 4, colour = 'black',
fontface = 'bold') +
labs(x = 'State Name', y = 'Percentage of Loans', title = 'States and Loans') +
coord_flip()
Here we observe, in decreasing order, each state's share of loans, for states with a share greater than 1%. We find that California accounts for the largest share of loans, at 13.39%.
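The percentage calculation in the pipeline above can be checked on a small example. This is a sketch on made-up data (`toy` is an assumption for illustration), using count() as a shorthand for group_by() followed by summarise(n()):

```r
library(tibble)
library(dplyr)

toy <- tibble(state = c("CA", "CA", "CA", "TX", "TX", "NY"))

state_share <- toy %>%
  count(state, name = "loans") %>%                  # loans per state
  mutate(percentage = loans / sum(loans) * 100) %>% # share of all loans
  arrange(desc(percentage))
state_share
```

Because the percentage is computed before any filtering, a later filter(percentage > 1) only hides small states; the remaining shares are still relative to all loans.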
Visualization #5- Purpose VS Loan Amount on Type of Application
ggplot(data, aes(y = loan_purpose, x = loan_amount, fill = application_type)) +
geom_boxplot() +
labs(x = 'Loan Amount' , y = 'Purpose')
The relationship between the purpose and the loan amount is observed here, along with the type of application for the loan. A joint application corresponds to a higher loan amount, and this relationship is especially visible for purposes like Small Businesses and House.
Visualization #6 - Loan Amount VS Annual Income highlighting Rate of Interest on Term of Loan
ggplot(data, aes(x = annual_income , y = loan_amount , color = interest_rate)) +
geom_point(alpha = 0.5 , size = 1.5) +
geom_smooth(se = FALSE , color = 'darkred' , method = 'loess') +
xlim(c(0 , 300000)) +
labs(x = 'Annual Income' , y = 'Loan Amount' , color = 'Interest Rate') +
facet_wrap(~ term)
Here we observe the trends between Loan Amount, Annual Income and Rate of Interest over the 36 month and 60 month terms. It can be seen that people tend to take loans of higher amounts on the 60 month term at lower rates of interest.
Visualization #7 - Purpose VS Loan Amount on Type of Income
ggplot(data, aes(x = loan_amount , y = loan_purpose, fill = verified_income)) +
geom_boxplot() +
labs(y = ' Purpose of Loan' , x = 'Loan_amount')
The relationship between the purpose and the loan amount is observed here, along with the income verification status of the borrower. A verified income corresponds to a higher loan amount, and this relationship is especially visible for purposes like House, Major Purchases and Small Businesses.
I aim to present a strong write-up for the final project, with all the sections properly formatted and complete.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gangal (2022, April 27). Data Analytics and Computational Social Science: HW 6 DACSS 601. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomayushe17893462/
BibTeX citation
@misc{gangal2022hw, author = {Gangal, Ayushe}, title = {Data Analytics and Computational Social Science: HW 6 DACSS 601}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomayushe17893462/}, year = {2022} }