603_Final_Project_Check_In_2_Saisrinivas_Ambatipudi

Author

Saisrinivas Ambatipudi

Published

April 20, 2023

Data Description:

The blood transfusion dataset contains 748 samples and 5 columns: four input features and one binary target variable.
Input features:
• Recency (number of months since the last donation)
• Frequency (total number of donations)
• Monetary (total blood donated in c.c.)
• Time (number of months since the first donation)
Target variable:
• Whether he/she donated blood in March 2007 (1 = yes, 0 = no)

Source: https://www.kaggle.com/datasets/ninalabiba/blood-transfusion-dataset

The summary() output below presents a statistical description of the blood donation data, which has five variables: Recency, Frequency, Monetary, Time, and whether the individual donated blood in March 2007. Here is a breakdown of each variable:

Recency (months): This variable represents the number of months since the donor's most recent blood donation. The minimum value is 0 months, indicating that some individuals donated blood very recently. The mean is 9.507 months (median 7), suggesting that, on average, donors last gave blood about 9.5 months ago. The maximum value is 74 months, meaning that at least one donor had not donated for more than six years.

Frequency (times): This variable shows the total number of times an individual has donated blood. The minimum value is 1, meaning that at least one person has only donated blood once. The mean is 5.515 times, indicating that people, on average, have donated blood about 5.5 times. The maximum value is 50 times, showing that some individuals have donated blood quite frequently.

Monetary (c.c. blood): This variable represents the total volume of blood donated by an individual, measured in cubic centimeters (c.c.). The minimum value is 250 c.c., which corresponds to the minimum single donation volume. The mean is 1379 c.c., suggesting that, on average, individuals have donated around 1.379 liters of blood. The maximum value is 12,500 c.c., indicating that the highest total volume donated by a person is 12.5 liters.
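In the ten rows of BD printed in this post, every Monetary value is exactly 250 times Frequency (for example, 50 donations and 12,500 c.c.). That pattern suggests a fixed 250 c.c. per donation, which would make Monetary and Frequency perfectly collinear. A minimal R sketch to check this, using only the printed rows; the helper name monetary_is_fixed_multiple and the 250 c.c. constant are assumptions of this sketch, not part of the dataset documentation:

```r
# Returns TRUE if every Monetary value equals cc_per_donation times Frequency.
# cc_per_donation = 250 is an assumption read off the printed rows.
monetary_is_fixed_multiple <- function(frequency, monetary, cc_per_donation = 250) {
  all(monetary == cc_per_donation * frequency)
}

# Frequency and Monetary values from the ten rows of BD printed in this post
freq <- c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46)
mon  <- c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 3000, 2250, 11500)

monetary_is_fixed_multiple(freq, mon)  # TRUE for these rows
```

On the full data the same check would be monetary_is_fixed_multiple(BD[["Frequency (times)"]], BD[["Monetary (c.c. blood)"]]). If it returns TRUE, Monetary adds no information beyond Frequency, and the two should not enter the same regression model.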

Time (months): This variable measures the length of time an individual has been donating blood. The minimum value is 2 months, suggesting that some individuals are relatively new to blood donation. The mean is 34.28 months, indicating that, on average, people have been donating blood for about 34.3 months. The maximum value is 98 months, showing that some individuals have been donating blood for a long time.

Whether he/she donated blood in March 2007: This is a binary variable that indicates if an individual donated blood in March 2007. The mean is 0.238, which means that about 23.8% of the individuals in the dataset donated blood in that specific month.
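Because the March 2007 variable is coded 0/1, its mean is simply the share of 1s. A small R illustration using the outcome values from the ten printed rows of BD (these ten rows are only an example and are not representative of all 748 donors):

```r
# Outcome values from the ten rows of BD printed in this post (1 = donated in March 2007)
donated <- c(1, 1, 1, 1, 0, 0, 1, 0, 1, 1)

# For a 0/1 indicator, the mean and the proportion of 1s are the same number
share_from_mean  <- mean(donated)
share_from_count <- sum(donated == 1) / length(donated)

share_from_mean  # 0.7 for these ten rows
```

On the full data, mean(BD[[5]]) should reproduce the 0.238 reported by summary().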

The summary function provides an overview of the dataset’s key statistics, such as minimum, 1st quartile, median, mean, 3rd quartile, and maximum values for each variable. This information helps to understand the distribution, central tendency, and spread of the data.

library(readr)  # provides read_csv(); also attached by library(tidyverse)
BD <- read_csv("C:/UMass/DACSS_603/603_Spring_2023/posts/_data/transfusion_saisrinivas.csv")
Rows: 748 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): Recency (months), Frequency (times), Monetary (c.c. blood), Time (m...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
BD
# A tibble: 748 × 5
   `Recency (months)` `Frequency (times)` Monetary (c.c. blood…¹ Time …² wheth…³
                <dbl>               <dbl>                  <dbl>   <dbl>   <dbl>
 1                  2                  50                  12500      98       1
 2                  0                  13                   3250      28       1
 3                  1                  16                   4000      35       1
 4                  2                  20                   5000      45       1
 5                  1                  24                   6000      77       0
 6                  4                   4                   1000       4       0
 7                  2                   7                   1750      14       1
 8                  1                  12                   3000      35       0
 9                  2                   9                   2250      22       1
10                  5                  46                  11500      98       1
# … with 738 more rows, and abbreviated variable names
#   ¹​`Monetary (c.c. blood)`, ²​`Time (months)`,
#   ³​`whether he/she donated blood in March 2007`
summary(BD)
 Recency (months) Frequency (times) Monetary (c.c. blood) Time (months)  
 Min.   : 0.000   Min.   : 1.000    Min.   :  250         Min.   : 2.00  
 1st Qu.: 2.750   1st Qu.: 2.000    1st Qu.:  500         1st Qu.:16.00  
 Median : 7.000   Median : 4.000    Median : 1000         Median :28.00  
 Mean   : 9.507   Mean   : 5.515    Mean   : 1379         Mean   :34.28  
 3rd Qu.:14.000   3rd Qu.: 7.000    3rd Qu.: 1750         3rd Qu.:50.00  
 Max.   :74.000   Max.   :50.000    Max.   :12500         Max.   :98.00  
 whether he/she donated blood in March 2007
 Min.   :0.000                             
 1st Qu.:0.000                             
 Median :0.000                             
 Mean   :0.238                             
 3rd Qu.:0.000                             
 Max.   :1.000                             
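The long CSV column names (with spaces, parentheses and a slash) have to be wrapped in backticks every time they appear in a formula. A minimal sketch that renames them by position; the short names below are hypothetical choices for illustration, and the sketch assumes the columns stay in the order shown by read_csv():

```r
# Hypothetical short names; their order must match the CSV column order
short_names <- c("Recency", "Frequency", "Monetary", "Time", "Donated")

# Rename the five columns of the blood transfusion data by position
shorten_names <- function(df) {
  stopifnot(ncol(df) == length(short_names))
  names(df) <- short_names
  df
}

# Tiny stand-in with the original column names (first two printed rows of BD)
demo <- data.frame(
  `Recency (months)` = c(2, 0),
  `Frequency (times)` = c(50, 13),
  `Monetary (c.c. blood)` = c(12500, 3250),
  `Time (months)` = c(98, 28),
  `whether he/she donated blood in March 2007` = c(1, 1),
  check.names = FALSE  # keep the original names exactly as written
)
demo <- shorten_names(demo)
names(demo)
```

On the real data this would be BD <- shorten_names(BD). Renaming by position is simple, but it silently mislabels columns if the CSV order ever changes, so the column-count check is only a partial guard.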

EDA:

Exploratory data analysis, with accompanying graphs, will be performed on all variables in the Blood Transfusion Dataset: the independent variables Recency (months), Frequency (times), Monetary (c.c. blood) and Time (months), and the dependent variable, whether he/she donated blood in March 2007.
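A natural first EDA step for this outcome is to compare each predictor's mean between donors who did and did not donate in March 2007. A minimal R sketch, assuming the columns have been renamed to the hypothetical short names Recency, Frequency, Monetary, Time and Donated; the ten-row bd stand-in copies the rows printed from BD, so its numbers will differ from the full data:

```r
# Ten rows printed from BD, with hypothetical short column names
bd <- data.frame(
  Recency   = c(2, 0, 1, 2, 1, 4, 2, 1, 2, 5),
  Frequency = c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46),
  Monetary  = c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 3000, 2250, 11500),
  Time      = c(98, 28, 35, 45, 77, 4, 14, 35, 22, 98),
  Donated   = c(1, 1, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Mean of each predictor within each March 2007 group (0 = no, 1 = yes)
group_means <- aggregate(cbind(Recency, Frequency, Monetary, Time) ~ Donated,
                         data = bd, FUN = mean)
group_means
```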

Detailed Research Statement:

  1. Hypothesis 1: There is a significant difference in the past frequency of blood donations between donors who donated in March 2007 and those who did not.

Confounding variables: Recency of blood donations, Monetary value of blood donations, Time (months) of blood donations

  2. Hypothesis 2: There is a significant difference in the Recency of blood donations between donors who donated in March 2007 and those who did not.

Confounding variables: Past frequency of blood donations, Monetary value of blood donations, Time (months) of blood donations

  3. Hypothesis 3: There is a significant difference in the Monetary value of blood donations between donors who donated in March 2007 and those who did not.

Confounding variables: Past frequency of blood donations, Recency of blood donations, Time (months) of blood donations

  4. Hypothesis 4: There is a significant difference in Time (months since the first donation) between donors who donated in March 2007 and those who did not.

Confounding variables: Past frequency of blood donations, Recency of blood donations, Monetary value of blood donations
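Hypotheses 1 to 4 compare one variable between the two March 2007 groups, first without and then with adjustment for the listed confounders. A minimal R sketch for Hypothesis 1, assuming the hypothetical short column names Recency, Frequency, Monetary, Time and Donated, a Welch t-test for the unadjusted comparison, and a linear model with the group indicator plus confounders for the adjusted one (one reasonable choice, not the only one). The ten-row bd stand-in copies the printed rows of BD:

```r
# Ten rows printed from BD, with hypothetical short column names
bd <- data.frame(
  Recency   = c(2, 0, 1, 2, 1, 4, 2, 1, 2, 5),
  Frequency = c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46),
  Monetary  = c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 3000, 2250, 11500),
  Time      = c(98, 28, 35, 45, 77, 4, 14, 35, 22, 98),
  Donated   = c(1, 1, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Hypothesis 1, unadjusted: Welch two-sample t-test of Frequency by March 2007 status
h1_test <- t.test(Frequency ~ Donated, data = bd)
h1_test$estimate  # group means: group 0 (did not donate) first, then group 1

# Hypothesis 1, adjusted: group difference in Frequency holding Recency and Time fixed.
# Monetary is left out because it equals 250 * Frequency in these rows.
h1_adj <- lm(Frequency ~ Donated + Recency + Time, data = bd)
summary(h1_adj)$coefficients["Donated", ]
```

Hypotheses 2 to 4 follow the same pattern with Recency, Monetary or Time as the response and the remaining variables as covariates, again keeping Frequency and Monetary out of the same model if they turn out to be collinear.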

  5. Hypothesis 5: Donors who have donated blood more frequently in the past (i.e., a higher average number of donations per month) are more likely to donate again in March 2007.

  6. Hypothesis 6: Donors who have donated a higher average volume of blood per month (Monetary (c.c. blood)) are more likely to donate again in March 2007.

  7. Hypothesis 7: Donors who have donated blood more recently (Recency (months)) and more frequently (Frequency (times)) are more likely to donate again in March 2007.

  8. Hypothesis 8: Donors who have donated blood more recently (Recency (months)) and have donated a larger total volume in the past (Monetary (c.c. blood)) are more likely to donate again in March 2007.

  9. Hypothesis 9: Donors who have donated blood more recently (Recency (months)) and have been donating for longer (Time (months)) are more likely to donate again in March 2007.
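Hypotheses 5 to 9 predict a binary outcome, so logistic regression is a natural fit. A minimal R sketch, assuming the hypothetical short column names used earlier and defining the donation rate in Hypothesis 5 as Frequency / Time (our assumption; the post does not give a formula). The ten-row bd stand-in copies the printed rows of BD and is far too small for real inference:

```r
# Ten rows printed from BD, with hypothetical short column names
bd <- data.frame(
  Recency   = c(2, 0, 1, 2, 1, 4, 2, 1, 2, 5),
  Frequency = c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46),
  Monetary  = c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 3000, 2250, 11500),
  Time      = c(98, 28, 35, 45, 77, 4, 14, 35, 22, 98),
  Donated   = c(1, 1, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Hypothesis 5: donations per month (assumed definition: Frequency / Time)
bd$donations_per_month <- bd$Frequency / bd$Time
h5_fit <- glm(Donated ~ donations_per_month, data = bd, family = binomial)

# Hypothesis 7: Recency and Frequency together. A negative Recency coefficient
# and a positive Frequency coefficient would be consistent with the hypothesis.
h7_fit <- glm(Donated ~ Recency + Frequency, data = bd, family = binomial)
exp(coef(h7_fit))  # odds ratios per one-unit change in each predictor
```

Hypotheses 8 and 9 would swap Frequency for Monetary or Time. If Monetary is exactly 250 times Frequency, the Hypothesis 6 rate (Monetary / Time) is just 250 times the Hypothesis 5 rate and will give the same fit, and Hypothesis 8 will mirror Hypothesis 7. Adding a Recency:Frequency interaction is an option if "recently and frequently" is meant as a joint effect.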

Demographic Data:

We have removed demographic data from the scope because we could not find any relevant data sources online. Confounding variables are therefore limited to the variables available in the blood transfusion dataset.

Graphs:

Where appropriate, bar graphs, scatter plots, correlation plots, and regression and logistic regression plots will be included for the hypotheses above.
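A correlation matrix of the four predictors is a quick way to spot redundant variables before building the regression models. A minimal R sketch, assuming the hypothetical short column names used earlier; the ten-row bd stand-in copies the printed rows of BD:

```r
# Ten rows printed from BD, with hypothetical short column names
bd <- data.frame(
  Recency   = c(2, 0, 1, 2, 1, 4, 2, 1, 2, 5),
  Frequency = c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46),
  Monetary  = c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 3000, 2250, 11500),
  Time      = c(98, 28, 35, 45, 77, 4, 14, 35, 22, 98),
  Donated   = c(1, 1, 1, 1, 0, 0, 1, 0, 1, 1)
)

# Pearson correlations among the four predictors
cm <- cor(bd[, c("Recency", "Frequency", "Monetary", "Time")])
round(cm, 2)
```

A Frequency/Monetary correlation of exactly 1, as in these rows, means only one of the two should enter a given model.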

Building a Donor Retention Strategy:

We will use the insights gained to develop a donor retention strategy for the blood donation center: identify the factors associated with donor churn (i.e., donors who stop donating blood) and develop a plan to mitigate them. Such a strategy can help healthcare and blood donation organizations retain donors over the long term.

Expected Contribution:

Sai Srinivas: Exploratory Analysis, Research Questions 1-4

Akhilesh Kumar: Research Questions 6-9, Donor Retention Strategy

References:

Kaggle: https://www.kaggle.com/datasets/ninalabiba/blood-transfusion-dataset

Original owner and donor: Prof. I-Cheng Yeh, Department of Information Management, Chung-Hua University, Hsin Chu, Taiwan 30067, R.O.C. E-mail: icyeh ‘@’ chu.edu.tw; Tel: 886-3-5186511. Date donated: October 3, 2008.