The blood transfusion dataset contains 748 samples with 5 columns: 4 input features and 1 binary target.
• Recency (number of months since the last donation)
• Frequency (total number of donations)
• Monetary (total blood donated in c.c.)
• Time (number of months since the first donation)
• Target (whether the donor donated blood in March 2007)
Rows: 748 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): Recency (months), Frequency (times), Monetary (c.c. blood), Time (m...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The aim of this study is to develop machine learning models, specifically linear regression and logistic regression, that predict whether a blood donor is likely to donate in the future. The results could be useful in developing targeted strategies for donor recruitment and retention, ultimately improving the availability and accessibility of blood donations.
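As a rough illustration of the logistic regression approach, the sketch below fits a model with base R's glm(). It runs on synthetic data, not on BD: the short column names (recency, frequency, time, donated) and the simulated outcome are assumptions made only so the example is self-contained. Monetary is left out on the assumption that it is proportional to Frequency and would add no information.

```r
# Synthetic stand-in for the real data. Column names are shortened,
# hypothetical versions of the headers in the file.
set.seed(42)
n <- 200
bd <- data.frame(
  recency   = rpois(n, 9),       # months since last donation
  frequency = 1 + rpois(n, 4)    # total number of donations (at least 1)
)
bd$time <- bd$recency + rpois(n, 25)  # first donation is never after the last

# Simulated outcome (an assumption): recent, frequent donors return more often
p <- plogis(-1 + 0.15 * bd$frequency - 0.10 * bd$recency)
bd$donated <- rbinom(n, 1, p)

# Logistic regression: models the log-odds of donating
fit <- glm(donated ~ recency + frequency + time, data = bd, family = binomial)
print(coef(summary(fit)))

# Predicted probabilities and a classification at a 0.5 threshold
bd$prob <- predict(fit, type = "response")
bd$pred <- as.integer(bd$prob > 0.5)

# In-sample accuracy; a held-out test set would give a fairer estimate
accuracy <- mean(bd$pred == bd$donated)
print(accuracy)
```

With the real data, the same call would use the actual column names in backticks, since they contain spaces and parentheses.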
Identify Factors that Affect Blood Donation:
How do donation behaviors vary across different regions, and what factors may contribute to these variations? Using hypothesis testing, this research project aims to compare the donation patterns of donors from different regions based on demographic and donation-related variables, such as age, gender, donation frequency, and time since last donation. The findings can provide insights into the regional variations in blood donation behaviors and inform targeted strategies to address these differences, potentially leading to increased donation rates and more efficient allocation of resources for blood donation organizations. The study will visualize the results using plots and charts to identify any significant patterns or trends that could help healthcare and blood donation organizations to develop effective strategies to increase blood donation rates.
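One way such a regional comparison could be set up is sketched below with a Kruskal-Wallis test. The region column is hypothetical: it does not appear among the columns summarized in this report, so the data here are simulated and the region labels are placeholders.

```r
# Synthetic example. `region` is a hypothetical column and the
# frequencies are simulated, not taken from the dataset.
set.seed(7)
donors <- data.frame(
  region    = rep(c("North", "Central", "South"), each = 40),
  frequency = 1 + rpois(120, rep(c(3, 5, 4), each = 40))
)

# Kruskal-Wallis rank-sum test.
# H0: donation frequency has the same distribution in every region.
kw <- kruskal.test(frequency ~ region, data = donors)
print(kw)

# Pairwise follow-up with Holm correction for multiple comparisons.
# exact = FALSE uses the normal approximation, since counts contain ties.
pw <- pairwise.wilcox.test(donors$frequency, donors$region,
                           p.adjust.method = "holm", exact = FALSE)
print(pw)
```

A rank-based test is used here because donation counts are skewed; a one-way ANOVA would be an alternative if its assumptions are judged reasonable.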
Segmentation of Blood Donors:
Use clustering techniques to segment blood donors based on their demographic and donation history characteristics. Explore methods such as K-means clustering or hierarchical clustering to identify groups of donors with similar characteristics. Examine the differences between these groups and explore their donation patterns over time.
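A minimal sketch of both clustering methods is shown below, using base R's kmeans() and hclust(). The data are synthetic and the column names are shortened placeholders; k = 2 is chosen purely for illustration and would need to be justified on the real data (for example with an elbow plot).

```r
# Synthetic stand-in for the donor data (two loosely separated groups)
set.seed(1)
donors <- data.frame(
  recency   = c(rpois(50, 3), rpois(50, 20)),
  frequency = c(1 + rpois(50, 8), 1 + rpois(50, 2))
)
donors$time <- donors$recency + rpois(100, 20)

# Standardize so that no variable dominates the Euclidean distance
scaled <- scale(donors)

# K-means with k = 2 and several random starts to avoid poor local optima
km <- kmeans(scaled, centers = 2, nstart = 25)
donors$cluster <- km$cluster

# Per-cluster means on the original scale, to characterize each segment
print(aggregate(. ~ cluster, data = donors, FUN = mean))

# Hierarchical alternative: Ward linkage on the same standardized distances
hc <- hclust(dist(scaled), method = "ward.D2")
donors$hc_cluster <- cutree(hc, k = 2)

# Cross-tabulate the two solutions to see how far they agree
print(table(kmeans = donors$cluster, hierarchical = donors$hc_cluster))
```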
Building a Donor Retention Strategy:
Use the insights gained from the previous projects to develop a donor retention strategy for the blood donation center. Identify factors that are associated with donor churn (i.e., donors who stop donating blood), and develop a plan to mitigate these factors. This can help healthcare and blood donation organizations to develop strategies to retain donors over the long term.
Summary of the data
summary(BD)
Recency (months)  Frequency (times)  Monetary (c.c. blood)  Time (months)
Min.   : 0.000    Min.   : 1.000     Min.   :  250          Min.   : 2.00
1st Qu.: 2.750    1st Qu.: 2.000     1st Qu.:  500          1st Qu.:16.00
Median : 7.000    Median : 4.000     Median : 1000          Median :28.00
Mean   : 9.507    Mean   : 5.515     Mean   : 1379          Mean   :34.28
3rd Qu.:14.000    3rd Qu.: 7.000     3rd Qu.: 1750          3rd Qu.:50.00
Max.   :74.000    Max.   :50.000     Max.   :12500          Max.   :98.00

whether he/she donated blood in March 2007
Min.   :0.000
1st Qu.:0.000
Median :0.000
Mean   :0.238
3rd Qu.:0.000
Max.   :1.000
Description of the Variables:
This summary function presents a statistical description of a dataset related to blood donation, with five variables: Recency, Frequency, Monetary, Time, and whether the individual donated blood in March 2007. Here’s a breakdown of each variable:
Recency (months): This variable represents the number of months since the last blood donation. The minimum value is 0 months, indicating that some individuals donated blood very recently. The mean is 9.507 months, so on average the most recent donation was about 9.5 months ago. The maximum value is 74 months, meaning that the least recently active donor last gave blood 74 months ago.
Frequency (times): This variable shows the total number of times an individual has donated blood. The minimum value is 1, meaning that at least one person has donated only once. The mean is 5.515, indicating that people have donated about 5.5 times on average. The maximum value is 50, showing that some individuals have donated many times.
Monetary (c.c. blood): This variable represents the total volume of blood donated by an individual, measured in cubic centimeters (c.c.). The minimum value is 250 c.c., which matches a donor who has given once (Frequency = 1). The mean is 1379 c.c., so individuals have donated around 1.379 liters of blood on average. The maximum value is 12,500 c.c., or 12.5 liters. Each Monetary statistic is approximately 250 times the corresponding Frequency statistic (for example, 50 × 250 = 12,500), which is consistent with a fixed volume of 250 c.c. per donation.
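The relationship between Monetary and Frequency can be checked directly from the summary statistics. The short sketch below divides each Monetary statistic by the matching Frequency statistic; the values are copied from the summary(BD) output, and the commented row-level check at the end is an assumption about how it could be verified on the full data.

```r
# Summary statistics copied from the summary(BD) output
stat_names <- c("min", "q1", "median", "mean", "q3", "max")
freq  <- setNames(c(1, 2, 4, 5.515, 7, 50), stat_names)
money <- setNames(c(250, 500, 1000, 1379, 1750, 12500), stat_names)

# Ratio of total volume to number of donations for each statistic
ratio <- money / freq
print(round(ratio, 1))

# Row-level check on the full data (not run here, requires BD):
# all(BD[["Monetary (c.c. blood)"]] == 250 * BD[["Frequency (times)"]])
```

If the row-level check holds, Monetary carries no information beyond Frequency, and only one of the two should be used as a predictor in a regression model.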
Time (months): This variable measures the length of time an individual has been donating blood. The minimum value is 2 months, suggesting that some individuals are relatively new to blood donation. The mean is 34.28 months, indicating that, on average, people have been donating blood for about 34.3 months. The maximum value is 98 months, showing that some individuals have been donating blood for a long time.
Whether he/she donated blood in March 2007: This is a binary variable that indicates if an individual donated blood in March 2007. The mean is 0.238, which means that about 23.8% of the individuals in the dataset donated blood in that specific month.
The summary function provides an overview of the dataset’s key statistics, such as minimum, 1st quartile, median, mean, 3rd quartile, and maximum values for each variable. This information helps to understand the distribution, central tendency, and spread of the data.
Expected Contribution:
Akhilesh Kumar: Will work on the Segmentation of Blood Donors and Building a Donor Retention Strategy
Sai Srinivas: Will work on the Blood Donation Prediction Frequency and Identify Factors that Affect Blood Donation
Original Owner and Donor: Prof. I-Cheng Yeh, Department of Information Management, Chung-Hua University, Hsin Chu, Taiwan 30067, R.O.C., e-mail: icyeh ‘@’ chu.edu.tw, TEL: 886-3-5186511. Date Donated: October 3, 2008