Final project part 1

Bank Customer Churn Prediction

The first part of final project

Author: Mani Shanker Kamarapu

Published: October 11, 2022

Introduction

Churn occurs when a customer leaves one company for another. Customer churn causes not only a loss of income but also other negative effects on a company's operations. Churn management is the practice of identifying customers who intend to move their custom to a competing service provider.

Risselada et al. (2010) stated that churn management is becoming part of customer relationship management. It is important for companies to consider it as they try to establish long-term relationships with customers and maximize the value of their customer base.

Research Questions

A. Does churn-rate depend on the geographical factors of the customer?

B. Are non-active members more likely to churn?

This project should help us better understand customer difficulties, the factors that lead customers to exit, and the role that a dormant (inactive) account plays in churn.

Hypothesis

Customer churn analysis has become a major concern in almost every industry that offers products and services. The model developed here is intended to help banks identify clients who are likely to churn and develop appropriate marketing actions to retain their valuable clients. The model also provides information about similar customer groups, which helps in deciding which marketing responses to offer. Because existing customers are retained, banks can expect increased profits and revenues.

Given the above, we can frame our hypotheses as follows:

H0A

Geographical factors do not statistically predict the churn rate.

H1A

Geographical factors statistically predict the churn rate.

H0B

Active members will not churn.

H1B

Active members will churn.
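
One common way to put hypotheses like these into practice is a chi-squared test of independence between a categorical predictor and the churn indicator. The sketch below is only an illustration of that idea, not necessarily the test used later in the project. It uses base R and a small table of hypothetical counts (`toy_counts`, invented for this example and not taken from the data set) to show how H0A could be checked at the 5% level. The same call would apply to a table of `IsActiveMember` against `Exited` for H0B.

```r
# Hypothetical toy counts, NOT taken from Churn_Modelling.csv.
# Rows: customer location; columns: stayed (0) or exited (1).
toy_counts <- matrix(
  c(80, 20,
    60, 40,
    85, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Geography = c("France", "Germany", "Spain"),
    Exited    = c("0", "1")
  )
)

# Pearson chi-squared test of independence from the stats package.
geo_test <- chisq.test(toy_counts)

# Under this decision rule, H0A is rejected when the p-value is below 0.05.
reject_h0a <- geo_test$p.value < 0.05

print(geo_test)
```

With the real data, the table of counts would come from something like `table(Churn$Geography, Churn$Exited)` instead of being typed by hand.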

Loading libraries

Code
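library(tidyverse)  # attaches ggplot2, dplyr, readr and the other core packages
library(ggplot2)    # already attached by tidyverse; kept for explicitness
library(stats)      # part of base R and attached by default

# Echo the source code of every chunk in the rendered output
knitr::opts_chunk$set(echo = TRUE)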

Reading the data set

Code
Churn <- read_csv("_data/Churn_Modelling.csv")
Rows: 10000 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): Surname, Geography, Gender
dbl (11): RowNumber, CustomerId, CreditScore, Age, Tenure, Balance, NumOfPro...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
Churn

This data set contains 10,000 bank customer records with 14 attributes, covering socio-demographic, account-level and behavioural information.

Attribute Description

1. Row Number: Number of customers
2. Customer ID: ID of customer
3. Surname: Customer name
4. Credit Score: Score of credit card usage
5. Geography: Location of customer
6. Gender: Customer gender
7. Age: Age of customer
8. Tenure: The period of having the account in months
9. Balance: Customer main balance
10. NumOfProducts: No of products used by customer
11. HasCrCard: If the customer has a credit card or not
12. IsActiveMember: Customer account is active or not
13. Estimated Salary: Estimated salary of the customer
14. Exited: Indicate churned or not

Code
str(Churn)
spc_tbl_ [10,000 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ RowNumber      : num [1:10000] 1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerId     : num [1:10000] 15634602 15647311 15619304 15701354 15737888 ...
 $ Surname        : chr [1:10000] "Hargrave" "Hill" "Onio" "Boni" ...
 $ CreditScore    : num [1:10000] 619 608 502 699 850 645 822 376 501 684 ...
 $ Geography      : chr [1:10000] "France" "Spain" "France" "France" ...
 $ Gender         : chr [1:10000] "Female" "Female" "Female" "Female" ...
 $ Age            : num [1:10000] 42 41 42 39 43 44 50 29 44 27 ...
 $ Tenure         : num [1:10000] 2 1 8 1 2 8 7 4 4 2 ...
 $ Balance        : num [1:10000] 0 83808 159661 0 125511 ...
 $ NumOfProducts  : num [1:10000] 1 1 3 2 1 2 2 4 2 1 ...
 $ HasCrCard      : num [1:10000] 1 0 1 0 1 1 1 1 0 1 ...
 $ IsActiveMember : num [1:10000] 1 1 0 0 1 0 1 0 1 1 ...
 $ EstimatedSalary: num [1:10000] 101349 112543 113932 93827 79084 ...
 $ Exited         : num [1:10000] 1 0 1 0 0 1 0 1 0 0 ...
 - attr(*, "spec")=
  .. cols(
  ..   RowNumber = col_double(),
  ..   CustomerId = col_double(),
  ..   Surname = col_character(),
  ..   CreditScore = col_double(),
  ..   Geography = col_character(),
  ..   Gender = col_character(),
  ..   Age = col_double(),
  ..   Tenure = col_double(),
  ..   Balance = col_double(),
  ..   NumOfProducts = col_double(),
  ..   HasCrCard = col_double(),
  ..   IsActiveMember = col_double(),
  ..   EstimatedSalary = col_double(),
  ..   Exited = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
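
The `str()` output shows that `Geography` and `Gender` are stored as character vectors and that the 0/1 indicators (`HasCrCard`, `IsActiveMember`, `Exited`) are stored as plain numbers. For tabulation and plotting it is often convenient to treat such columns as factors. The following is a minimal sketch of that conversion in base R, run on a hypothetical five-row stand-in (`toy`) rather than on `Churn` itself. The factor labels ("Inactive", "Active", "Stayed", "Exited") are my own choice and do not come from the data set.

```r
# Hypothetical five-row stand-in that mimics the types of a few Churn
# columns; the values are illustrative only.
toy <- data.frame(
  Geography      = c("France", "Spain", "France", "Germany", "Spain"),
  Gender         = c("Female", "Female", "Male", "Male", "Female"),
  IsActiveMember = c(1, 1, 0, 0, 1),
  Exited         = c(1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Character columns become factors with alphabetically ordered levels.
toy$Geography <- factor(toy$Geography)
toy$Gender    <- factor(toy$Gender)

# 0/1 indicators become labelled factors (labels are assumptions).
toy$IsActiveMember <- factor(toy$IsActiveMember, levels = c(0, 1),
                             labels = c("Inactive", "Active"))
toy$Exited <- factor(toy$Exited, levels = c(0, 1),
                     labels = c("Stayed", "Exited"))

str(toy)
```

Keeping a numeric copy of `Exited` alongside the factor can still be useful, because averaging a 0/1 column gives the churn rate directly.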

Descriptive statistics

Code
summary(Churn)
   RowNumber       CustomerId         Surname           CreditScore   
 Min.   :    1   Min.   :15565701   Length:10000       Min.   :350.0  
 1st Qu.: 2501   1st Qu.:15628528   Class :character   1st Qu.:584.0  
 Median : 5000   Median :15690738   Mode  :character   Median :652.0  
 Mean   : 5000   Mean   :15690941                      Mean   :650.5  
 3rd Qu.: 7500   3rd Qu.:15753234                      3rd Qu.:718.0  
 Max.   :10000   Max.   :15815690                      Max.   :850.0  
  Geography            Gender               Age            Tenure      
 Length:10000       Length:10000       Min.   :18.00   Min.   : 0.000  
 Class :character   Class :character   1st Qu.:32.00   1st Qu.: 3.000  
 Mode  :character   Mode  :character   Median :37.00   Median : 5.000  
                                       Mean   :38.92   Mean   : 5.013  
                                       3rd Qu.:44.00   3rd Qu.: 7.000  
                                       Max.   :92.00   Max.   :10.000  
    Balance       NumOfProducts    HasCrCard      IsActiveMember  
 Min.   :     0   Min.   :1.00   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:     0   1st Qu.:1.00   1st Qu.:0.0000   1st Qu.:0.0000  
 Median : 97199   Median :1.00   Median :1.0000   Median :1.0000  
 Mean   : 76486   Mean   :1.53   Mean   :0.7055   Mean   :0.5151  
 3rd Qu.:127644   3rd Qu.:2.00   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :250898   Max.   :4.00   Max.   :1.0000   Max.   :1.0000  
 EstimatedSalary         Exited      
 Min.   :    11.58   Min.   :0.0000  
 1st Qu.: 51002.11   1st Qu.:0.0000  
 Median :100193.91   Median :0.0000  
 Mean   :100090.24   Mean   :0.2037  
 3rd Qu.:149388.25   3rd Qu.:0.0000  
 Max.   :199992.48   Max.   :1.0000  
Code
glimpse(Churn)
Rows: 10,000
Columns: 14
$ RowNumber       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ CustomerId      <dbl> 15634602, 15647311, 15619304, 15701354, 15737888, 1557…
$ Surname         <chr> "Hargrave", "Hill", "Onio", "Boni", "Mitchell", "Chu",…
$ CreditScore     <dbl> 619, 608, 502, 699, 850, 645, 822, 376, 501, 684, 528,…
$ Geography       <chr> "France", "Spain", "France", "France", "Spain", "Spain…
$ Gender          <chr> "Female", "Female", "Female", "Female", "Female", "Mal…
$ Age             <dbl> 42, 41, 42, 39, 43, 44, 50, 29, 44, 27, 31, 24, 34, 25…
$ Tenure          <dbl> 2, 1, 8, 1, 2, 8, 7, 4, 4, 2, 6, 3, 10, 5, 7, 3, 1, 9,…
$ Balance         <dbl> 0.00, 83807.86, 159660.80, 0.00, 125510.82, 113755.78,…
$ NumOfProducts   <dbl> 1, 1, 3, 2, 1, 2, 2, 4, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, …
$ HasCrCard       <dbl> 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, …
$ IsActiveMember  <dbl> 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, …
$ EstimatedSalary <dbl> 101348.88, 112542.58, 113931.57, 93826.63, 79084.10, 1…
$ Exited          <dbl> 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
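
Because `Exited` is coded 0/1, its mean in the summary above (0.2037) is the overall churn rate, roughly 20.4% of customers. The same idea extends to the two research questions: averaging `Exited` within each group gives a churn rate per country and per activity status. The sketch below shows this with base R on a hypothetical six-row data frame (`toy`, invented for illustration). It is one possible first look at the questions, not a result from the real data.

```r
# Hypothetical toy rows, NOT the real data.
toy <- data.frame(
  Geography      = c("France", "France", "Germany", "Germany", "Spain", "Spain"),
  IsActiveMember = c(1, 0, 1, 0, 1, 0),
  Exited         = c(0, 1, 1, 1, 0, 0)
)

# Overall churn rate: the mean of a 0/1 column is the share of ones.
overall_rate <- mean(toy$Exited)

# Churn rate within each country and within each activity status.
rate_by_geo    <- tapply(toy$Exited, toy$Geography, mean)
rate_by_active <- tapply(toy$Exited, toy$IsActiveMember, mean)

print(overall_rate)
print(rate_by_geo)
print(rate_by_active)
```

On the full data set the equivalent calls would be `tapply(Churn$Exited, Churn$Geography, mean)` and `tapply(Churn$Exited, Churn$IsActiveMember, mean)`.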