Mani Shanker Kamarapu
October 11, 2022
Rows: 10000 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Surname, Geography, Gender
dbl (11): RowNumber, CustomerId, CreditScore, Age, Tenure, Balance, NumOfPro...
spc_tbl_ [10,000 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ RowNumber : num [1:10000] 1 2 3 4 5 6 7 8 9 10 ...
$ CustomerId : num [1:10000] 15634602 15647311 15619304 15701354 15737888 ...
$ Surname : chr [1:10000] "Hargrave" "Hill" "Onio" "Boni" ...
$ CreditScore : num [1:10000] 619 608 502 699 850 645 822 376 501 684 ...
$ Geography : chr [1:10000] "France" "Spain" "France" "France" ...
$ Gender : chr [1:10000] "Female" "Female" "Female" "Female" ...
$ Age : num [1:10000] 42 41 42 39 43 44 50 29 44 27 ...
$ Tenure : num [1:10000] 2 1 8 1 2 8 7 4 4 2 ...
$ Balance : num [1:10000] 0 83808 159661 0 125511 ...
$ NumOfProducts : num [1:10000] 1 1 3 2 1 2 2 4 2 1 ...
$ HasCrCard : num [1:10000] 1 0 1 0 1 1 1 1 0 1 ...
$ IsActiveMember : num [1:10000] 1 1 0 0 1 0 1 0 1 1 ...
$ EstimatedSalary: num [1:10000] 101349 112543 113932 93827 79084 ...
$ Exited : num [1:10000] 1 0 1 0 0 1 0 1 0 0 ...
- attr(*, "spec")=
.. cols(
.. RowNumber = col_double(),
.. CustomerId = col_double(),
.. Surname = col_character(),
.. CreditScore = col_double(),
.. Geography = col_character(),
.. Gender = col_character(),
.. Age = col_double(),
.. Tenure = col_double(),
.. Balance = col_double(),
.. NumOfProducts = col_double(),
.. HasCrCard = col_double(),
.. IsActiveMember = col_double(),
.. EstimatedSalary = col_double(),
.. Exited = col_double()
.. )
- attr(*, "problems")=<externalptr>

| Variable        | Min.     | 1st Qu.  | Median    | Mean      | 3rd Qu.   | Max.      |
|:----------------|---------:|---------:|----------:|----------:|----------:|----------:|
| RowNumber       | 1        | 2501     | 5000      | 5000      | 7500      | 10000     |
| CustomerId      | 15565701 | 15628528 | 15690738  | 15690941  | 15753234  | 15815690  |
| CreditScore     | 350.0    | 584.0    | 652.0     | 650.5     | 718.0     | 850.0     |
| Age             | 18.00    | 32.00    | 37.00     | 38.92     | 44.00     | 92.00     |
| Tenure          | 0.000    | 3.000    | 5.000     | 5.013     | 7.000     | 10.000    |
| Balance         | 0        | 0        | 97199     | 76486     | 127644    | 250898    |
| NumOfProducts   | 1.00     | 1.00     | 1.00      | 1.53      | 2.00      | 4.00      |
| HasCrCard       | 0.0000   | 0.0000   | 1.0000    | 0.7055    | 1.0000    | 1.0000    |
| IsActiveMember  | 0.0000   | 0.0000   | 1.0000    | 0.5151    | 1.0000    | 1.0000    |
| EstimatedSalary | 11.58    | 51002.11 | 100193.91 | 100090.24 | 149388.25 | 199992.48 |
| Exited          | 0.0000   | 0.0000   | 0.0000    | 0.2037    | 0.0000    | 1.0000    |

Surname, Geography and Gender are character columns (Length: 10000, Class: character, Mode: character).

Rows: 10,000
Columns: 14
$ RowNumber <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ CustomerId <dbl> 15634602, 15647311, 15619304, 15701354, 15737888, 1557…
$ Surname <chr> "Hargrave", "Hill", "Onio", "Boni", "Mitchell", "Chu",…
$ CreditScore <dbl> 619, 608, 502, 699, 850, 645, 822, 376, 501, 684, 528,…
$ Geography <chr> "France", "Spain", "France", "France", "Spain", "Spain…
$ Gender <chr> "Female", "Female", "Female", "Female", "Female", "Mal…
$ Age <dbl> 42, 41, 42, 39, 43, 44, 50, 29, 44, 27, 31, 24, 34, 25…
$ Tenure <dbl> 2, 1, 8, 1, 2, 8, 7, 4, 4, 2, 6, 3, 10, 5, 7, 3, 1, 9,…
$ Balance <dbl> 0.00, 83807.86, 159660.80, 0.00, 125510.82, 113755.78,…
$ NumOfProducts <dbl> 1, 1, 3, 2, 1, 2, 2, 4, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, …
$ HasCrCard <dbl> 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, …
$ IsActiveMember <dbl> 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, …
$ EstimatedSalary <dbl> 101348.88, 112542.58, 113931.57, 93826.63, 79084.10, 1…
$ Exited <dbl> 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
---
title: "Final project part 1"
author: "Mani Shanker Kamarapu"
description: "The first part of the final project"
date: "10/11/2022"
format:
html:
df-print: paged
css: styles.css
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- finalpart1
- Bank Customer Churn Prediction
- Mani Shanker Kamarapu
---
## Introduction
Churn refers to customers leaving one company for another. Customer
churn causes not only a loss of income but also other negative effects
on a company's operations. Churn management is the practice of
identifying customers who intend to move their business to a competing
service provider.
Risselada et al. (2010) stated that churn management is becoming part of
customer relationship management. Companies need to take churn into
account as they try to establish long-term relationships with customers
and maximize the value of their customer base.
::: callout-important
## Research Questions
A. Does the churn rate depend on the customer's geography?
B. Are inactive members more likely to churn?
:::
This project should help us better understand the difficulties customers
face, the factors that lead them to exit, and the dormant (inactive)
state of some customers.
## Hypothesis
Customer churn analysis has become a major concern in almost every
industry that offers products and services. A model developed from this
data would help banks identify clients who are likely to churn and plan
appropriate marketing actions to retain their valuable clients. The
model can also provide information about groups of similar customers,
which helps decide which marketing responses to offer each group. By
retaining existing customers, banks can increase their profits and
revenues.
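The paragraph above does not say what kind of model is meant. As one possible starting point, and purely as an assumption, a logistic regression of `Exited` on customer attributes can score each customer's probability of churning. The helper names `fit_churn_model()` and `rank_churn_risk()` are hypothetical, and the chunk uses only base R (`stats::reformulate()`, `stats::glm()`, `predict()`).

```{r}
#| label: churn-model-sketch
# Hypothetical helper (not part of the original analysis): fit a logistic
# regression of the 0/1 churn flag on the chosen predictor columns
fit_churn_model <- function(data, predictors, outcome = "Exited") {
  # Builds e.g. Exited ~ Geography + IsActiveMember
  f <- reformulate(predictors, response = outcome)
  # Binomial GLM with the default logit link; character columns such as
  # Geography are converted to factors automatically
  glm(f, data = data, family = binomial)
}

# Hypothetical helper: attach each customer's predicted churn probability
# and sort from highest to lowest risk
rank_churn_risk <- function(model, data) {
  p <- predict(model, newdata = data, type = "response")
  data$churn_prob <- p
  data[order(p, decreasing = TRUE), ]
}
```

Once `Churn` has been read in, a call such as `rank_churn_risk(fit_churn_model(Churn, c("Geography", "IsActiveMember")), Churn)` would list customers from highest to lowest predicted risk. Logistic regression is used here only because it is a common baseline for a binary outcome; other classifiers could be substituted.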
Given the above, we can frame our hypotheses as follows:
::: callout-tip
## H~0A~
Geographical factors [will not]{.underline} statistically predict the
churn rate.
:::
::: callout-tip
## H~1A~
Geographical factors [will]{.underline} statistically predict the
churn rate.
:::
::: callout-tip
## H~0B~
Inactive members [will not]{.underline} be more likely to churn than
active members.
:::
::: callout-tip
## H~1B~
Inactive members [will]{.underline} be more likely to churn than active
members.
:::
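The hypotheses above do not name a statistical test. One common choice, assumed here rather than taken from the project, is Pearson's chi-squared test of independence between a categorical column and `Exited`: H~0A~ corresponds to `Geography` being independent of churn, and the activity-status question behind H~0B~ to `IsActiveMember` being independent of churn. The helper `test_churn_association()` is a hypothetical name; it relies only on base R's `table()` and `stats::chisq.test()`.

```{r}
#| label: hypothesis-test-sketch
# Hypothetical helper: chi-squared test of independence between a
# categorical predictor column and the churn flag
test_churn_association <- function(data, predictor, outcome = "Exited") {
  # Contingency table: predictor levels (rows) x churn flag values (columns)
  tab <- table(data[[predictor]], data[[outcome]])
  # A small p-value is evidence against the null of no association
  chisq.test(tab)
}
```

After the data are loaded, `test_churn_association(Churn, "Geography")` and `test_churn_association(Churn, "IsActiveMember")` would return the test statistic and p-value for each question. For a 2 × 2 table such as `IsActiveMember` against `Exited`, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`). The test is two-sided, so the direction of any association has to be read from the observed counts in the result's `$observed` element.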
## Loading libraries
```{r}
#| label: setup
#| warning: false
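# tidyverse attaches readr (read_csv), dplyr, ggplot2 and the other core
# packages, and stats is attached by default in R, so separate
# library(ggplot2) and library(stats) calls are not needed
library(tidyverse)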
knitr::opts_chunk$set(echo = TRUE)
```
## Reading the data set
```{r}
Churn <- read_csv("_data/Churn_Modelling.csv")
Churn
```
The data set contains 10,000 bank customer records with 14 attributes,
covering socio-demographic, account-level and behavioural information.
*Attribute Description*

1. `RowNumber`: row index (1 to 10,000)
2. `CustomerId`: ID of the customer
3. `Surname`: customer's surname
4. `CreditScore`: customer's credit score
5. `Geography`: customer's location (country)
6. `Gender`: customer's gender
7. `Age`: customer's age
8. `Tenure`: how long the customer has held the account; values range from 0 to 10, which suggests years rather than months
9. `Balance`: customer's account balance
10. `NumOfProducts`: number of bank products the customer uses
11. `HasCrCard`: whether the customer has a credit card (1 = yes, 0 = no)
12. `IsActiveMember`: whether the customer's account is active (1 = yes, 0 = no)
13. `EstimatedSalary`: customer's estimated salary
14. `Exited`: whether the customer churned (1 = yes, 0 = no)

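Several of the attributes listed above are 0/1 indicators, and `RowNumber`, `CustomerId` and `Surname` identify a record rather than describe the customer. A minimal sketch, assuming the indicators are meant to be treated as categorical and the identifiers are not needed for modelling, is a hypothetical `prepare_churn()` helper built on base R's `factor()`. It returns a new data frame and leaves `Churn` unchanged.

```{r}
#| label: recode-sketch
# Hypothetical helper (not part of the original analysis): relabel the 0/1
# indicator columns as factors and drop identifier columns
prepare_churn <- function(data) {
  # Map 0 -> `no` label and 1 -> `yes` label, keeping 0 as the first level
  flag <- function(x, yes, no) factor(x, levels = c(0, 1), labels = c(no, yes))
  data$HasCrCard      <- flag(data$HasCrCard, "Has card", "No card")
  data$IsActiveMember <- flag(data$IsActiveMember, "Active", "Inactive")
  data$Exited         <- flag(data$Exited, "Churned", "Retained")
  # Keep every column except the record identifiers, in the original order
  data[, setdiff(names(data), c("RowNumber", "CustomerId", "Surname"))]
}
```

For example, `Churn_clean <- prepare_churn(Churn)` followed by `str(Churn_clean)` would show the relabelled columns. Keeping `Exited` numeric in the original `Churn` object is still convenient when churn rates are computed as means.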
```{r}
str(Churn)
```
## Descriptive statistics
```{r}
summary(Churn)
```
```{r}
glimpse(Churn)
```
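Because `Exited` is coded 0/1, its mean is the overall churn rate; the `summary()` output reports a mean of 0.2037, i.e. roughly 20% of customers left. A hedged sketch of the same calculation within groups, relevant to research questions A and B, is a hypothetical `churn_rate_by()` helper built on base R's `tapply()`. The final calls are skipped when `Churn` has not been loaded, so the chunk can also be run on its own.

```{r}
#| label: churn-rate-sketch
# Hypothetical helper: churn rate (share of Exited == 1) and customer count
# within each level of a grouping column
churn_rate_by <- function(data, group, outcome = "Exited") {
  rates  <- tapply(data[[outcome]], data[[group]], mean)
  counts <- table(data[[group]])
  data.frame(
    group      = names(rates),
    n          = as.integer(counts[names(rates)]),
    churn_rate = as.numeric(rates)
  )
}

# Apply to the bank data only when it exists in this session
if (exists("Churn")) {
  print(mean(Churn$Exited))                      # overall churn rate
  print(churn_rate_by(Churn, "Geography"))       # research question A
  print(churn_rate_by(Churn, "IsActiveMember"))  # research question B
}
```

Comparing the group rates against the overall rate gives a quick descriptive view before any formal test of the hypotheses.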