The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Code
library(readxl)
library(ggplot2)
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
Code
library(reshape2)
Attaching package: 'reshape2'
The following object is masked from 'package:tidyr':
smiths
Introduction and background
The implementation of the one-child policy by the Chinese government in 1979 led to an increase in the number of families with only one child and a unique family structure known as the “four-two-one” model, consisting of four grandparents, two parents, and one child. While being part of such a family structure provides certain advantages in terms of family and social resources, children without siblings, commonly referred to as “only children,” may experience various physical and socio-psychological challenges during their development. One notable concern is the increased risk of overweight and obesity among only children. These children are more likely to struggle with weight-related issues compared to their counterparts who have one or more siblings. Additionally, the psychosocial consequences associated with being an only child are also worth investigating. In this context, it is important to explore not only the relationship between overweight/obesity and mental health in young adolescents but also how the presence or absence of siblings and other factors play into this relationship. Overall, investigating the link between overweight/obesity, mental health, and sib-size in young adolescents within the context of the one-child policy can shed light on the potential challenges faced by only children and contribute to a better understanding of their overall well-being.
research questions
Is obesity positively related to depression score?
What factors affect obesity?
Is the number of siblings or obesity directly related to depression?
key predictors
depression rate
sibling number
obesity rate
Family location, finance and education
hypothesis
Higher obesity rate increases the risk of depression
# A tibble: 6 × 29
T0depres…¹ T0anx…² T1dep…³ T1anx…⁴ Height Weight WC HC SBP DBP FBG
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 31 35 41 35 153. 34.6 58 67 98 60 4.4
2 35 24 35 25 172. 46.1 63 78 110 70 3.9
3 31 34 37 26 146. 38.9 72 77.7 102 62 4.6
4 27 31 42 35 162. 46.8 62 80 116 80 4.5
5 31 26 49 33 154. 36.4 56 72 90 60 4.2
6 30 28 47 32 164. 40.6 55 73 102 70 3.7
# … with 18 more variables: TC <dbl>, TG <dbl>, `HDL-C` <dbl>, `LDL-C` <dbl>,
# BMI <dbl>, WHR <dbl>, WtHR <dbl>, `Family location` <dbl>,
# `Number of siblings` <dbl>,
# `How much time do you spend with your father in elementary school?` <dbl>,
# `How much time do you spend with your mother in elementary school?` <dbl>,
# `Father’s education level` <dbl>, `Mother’s education level` <dbl>,
# `Family financial situation` <dbl>, `Sleeping hours` <dbl>, …
Code
sum(is.na(data))
[1] 728
Code
plot(data$T0depression~data$BMI)
This dataset contains 1348 observations and 29 variables (columns), with 728 missing values (NA) in total. All variables are stored as numeric data; ordinal measures such as education level and family financial situation are coded as numeric levels, and depression is recorded as a numeric score. A preliminary plot of depression score against BMI shows some outliers that may need to be handled and no clear pattern. More data processing is needed before modelling.
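The single total of missing values does not show which columns they come from. A minimal sketch of a per-column count follows; it uses a small made-up data frame (`toy`, with hypothetical values) rather than the study data, so the numbers are illustrative only.

```r
# Toy data frame standing in for the study data (all values are made up).
toy <- data.frame(
  BMI = c(18.2, NA, 22.5, 27.1),
  T1depression = c(41, 35, NA, NA),
  NS = c(1, 2, 2, 1)
)

# Total number of missing cells, the same quantity as sum(is.na(data)).
total_na <- sum(is.na(toy))

# Missing cells per column, sorted so the most affected columns come first.
na_by_column <- sort(colSums(is.na(toy)), decreasing = TRUE)

# Number of rows without any missing value.
complete_rows <- sum(complete.cases(toy))

print(total_na)
print(na_by_column)
print(complete_rows)
```

Applied to the real data, the same calls would show whether the missing values are concentrated in a few survey items or spread across all columns.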
Modified column name
Code
variables <- c("Family location", "Number of siblings", " time spend with father in elementary school?", " time spend with mother in elementary school?", "Father’s education level", "Mother’s education level", "Family financial situation", "Sleeping hours", "Skipping breakfast", "Vigorous", "Moderate")
abreviations <- c("FL", "NS", "TFE", "TME", "FEL", "MEL", "FS", "SL", "SB", "VG", "MD")
cat("varible table\n")
variables abreviations
1 Family location FL
2 Number of siblings NS
3 time spend with father in elementary school? TFE
4 time spend with mother in elementary school? TME
5 Father’s education level FEL
6 Mother’s education level MEL
7 Family financial situation FS
8 Sleeping hours SL
9 Skipping breakfast SB
10 Vigorous VG
11 Moderate MD
Some key predictors were plotted. The distribution of family location was plotted in the first chart and the distribution of family financial situation was plotted in the second chart.
A scatter plot was used to visualize the relationship between skipping breakfast and BMI. From the scatter plot alone it is difficult to see a relationship between the two variables, so further analysis is needed.
Code
ggplot(data, aes(x = SB, y = BMI)) +
  geom_jitter(width = 0.2, height = 0, color = "indianred", alpha = 0.5) +
  xlab("skipping breakfast")
1. Higher obesity rate increases the risk of depression
H0=no relationship between obesity rate and the risk of depression
Ha=higher obesity rate increases the risk of depression
To test this hypothesis, a linear model was used to estimate the relationship between depression score and BMI.
Code
# linear regression of depression on BMI
lm0 <- lm(T1depression ~ BMI, data = data)
summary(lm0)
Call:
lm(formula = T1depression ~ BMI, data = data)
Residuals:
Min 1Q Median 3Q Max
-19.2371 -6.2000 0.1719 6.2964 21.9845
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.75036 1.50580 27.062 <2e-16 ***
BMI -0.09528 0.07803 -1.221 0.222
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.093 on 1308 degrees of freedom
(38 observations deleted due to missingness)
Multiple R-squared: 0.001139, Adjusted R-squared: 0.0003749
F-statistic: 1.491 on 1 and 1308 DF, p-value: 0.2223
The BMI coefficient (-0.09528) is the estimated change in the T1 depression score for a one-unit increase in BMI: each additional BMI unit is associated with a depression score about 0.095 points lower. With a p-value of 0.222, the coefficient is not statistically significant, and its negative sign runs opposite to the direction stated in Ha, so there is no evidence of a positive linear relationship between BMI and depression. The residuals range from -19.24 to 21.98, and the residual standard error of 8.093 is large relative to the size of the estimated effect, which indicates that BMI accounts for almost none of the variation. With a single predictor, the F-statistic (1.491, p = 0.2223) tests the same hypothesis as the t-test on the BMI slope, so the model as a whole is also not statistically significant. The multiple R-squared (0.001139) is the proportion of variance in the depression score explained by the model: only about 0.11% of the variability in depression can be attributed to a linear relationship with BMI. The adjusted R-squared (0.0003749) corrects the multiple R-squared for the number of predictors in the model, penalizing unnecessary ones. A value this close to zero confirms that the model provides a poor fit to the data.
In summary, based on the provided output, there is no strong evidence to support a linear relationship between BMI and depression. The coefficient for BMI is not statistically significant, and the model’s overall fit is weak (low R-squared values and non-significant F-statistic).
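A confidence interval for the slope makes the same point as the p-value in a more readable form. The following is a minimal sketch on simulated data (the data frame `sim` and its columns `bmi` and `score` are hypothetical, and the outcome is generated without any dependence on the predictor), showing how the slope interval, R-squared, and slope p-value can be pulled out of a fitted `lm` object.

```r
set.seed(1)
n <- 200

# Simulated stand-in data: score does not depend on bmi by construction.
sim <- data.frame(bmi = rnorm(n, mean = 19, sd = 3))
sim$score <- 40 + rnorm(n, sd = 8)

fit <- lm(score ~ bmi, data = sim)
s <- summary(fit)

# 95% confidence interval for the slope; an interval containing 0 matches
# a non-significant t-test at the 5% level.
slope_ci <- confint(fit, "bmi", level = 0.95)

# Proportion of variance explained, raw and adjusted for model size.
r_squared <- s$r.squared
adj_r_squared <- s$adj.r.squared

# p-value of the slope, read from the coefficient table.
slope_p <- coef(s)["bmi", "Pr(>|t|)"]

print(slope_ci)
print(c(r_squared = r_squared, adj_r_squared = adj_r_squared, slope_p = slope_p))
```

On the study data the equivalent call would be `confint(lm0, "BMI")`; the interval is not reported in the output above, so it is not quoted here.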
Code
# diagnostic
par(mfrow = c(2, 2))
plot(lm0)
Linear regression diagnostic plots were used to check the model assumptions. In the residuals vs fitted plot, the red line tracks the average residual across the fitted values; it stays close to horizontal and the residuals are spread evenly above and below it, which suggests that a linear form is adequate. In the normal Q-Q plot, the points follow a straight line, which supports the assumption of normally distributed residuals. The scale-location plot shows whether the spread (variance) of the residuals changes across fitted values; a roughly flat red line indicates homoscedasticity, and here the plot suggests that this assumption is met. These diagnostics indicate that the model assumptions are reasonably satisfied. They do not indicate a strong relationship: the model is valid to interpret, but BMI still explains almost none of the variation in depression score.
Code
# visualization
ggplot(data, aes(x = BMI, y = T1depression)) +
  geom_point(color = "indianred") +
  geom_smooth(method = "lm", se = FALSE, color = "darkred")
In addition to BMI, several other variables were included to examine their relationship with depression score. Two of them, time spent with father in elementary school (TFE) and frequency of skipping breakfast (SB), were statistically significant predictors.
The TFE coefficient is negative (-0.571, p = 0.019), so less time spent with the father in elementary school was associated with a higher depression score. This points to the importance of father-child interaction during this period of development, although the data cannot establish causation. The SB coefficient is positive (1.474, p < 0.001), so skipping breakfast more frequently was also associated with a higher depression score. The diagnostic plots showed no substantial deviation from the model assumptions, which supports interpreting these coefficients, but the model explains only about 3.8% of the variance in depression score.
Code
# linear regression model 2
lm1 <- lm(T1depression ~ BMI + NS + TFE + TME + FEL + MEL + FL + SL + SB, data = data)
summary(lm1)
Call:
lm(formula = T1depression ~ BMI + NS + TFE + TME + FEL + MEL +
FL + SL + SB, data = data)
Residuals:
Min 1Q Median 3Q Max
-22.3852 -5.9985 0.1987 6.4197 21.1941
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.55778 2.88686 14.742 < 2e-16 ***
BMI -0.10358 0.07798 -1.328 0.1843
NS 0.72821 0.51703 1.408 0.1592
TFE -0.57069 0.24271 -2.351 0.0189 *
TME -0.18783 0.30401 -0.618 0.5368
FEL -0.28747 0.24332 -1.181 0.2376
MEL -0.22372 0.26894 -0.832 0.4056
FL 0.09777 0.18801 0.520 0.6031
SL 0.10706 0.38873 0.275 0.7831
SB 1.47402 0.31288 4.711 2.73e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.967 on 1300 degrees of freedom
(38 observations deleted due to missingness)
Multiple R-squared: 0.0379, Adjusted R-squared: 0.03124
F-statistic: 5.691 on 9 and 1300 DF, p-value: 9.18e-08
Code
# diagnostic
par(mfrow = c(2, 2))
plot(lm1)
To check the model, backward elimination was applied by removing variables with large p-values. “Time spent with mother in elementary school” (TME), “Mother’s education level” (MEL), “Family location” (FL), and “Sleeping hours” (SL) were removed from the previous model. In the reduced model, “Time spent with father in elementary school” and “Skipping breakfast” remain statistically significant. The adjusted R-squared values of the two models (0.03124 and 0.03312) show no large difference between them.
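Comparing adjusted R-squared by eye can be supplemented with a partial F-test on the nested models. The sketch below uses simulated data (the data frame `sim` and the predictors `x1`, `x2`, `x3` are hypothetical, and only `x1` affects the outcome by construction); it is an illustration of the technique, not a rerun of the analysis above.

```r
set.seed(2)
n <- 300

# Simulated stand-in data: only x1 matters by construction.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = sim)
reduced <- lm(y ~ x1, data = sim)

# Partial F-test: do the dropped predictors (x2, x3) jointly add anything?
cmp <- anova(reduced, full)
f_p_value <- cmp[["Pr(>F)"]][2]

# Adjusted R-squared and AIC for the two models, side by side.
adj_r2 <- c(full = summary(full)$adj.r.squared,
            reduced = summary(reduced)$adj.r.squared)
aic <- AIC(reduced, full)

print(cmp)
print(adj_r2)
print(aic)
```

A large p-value in the `anova()` table means the dropped predictors do not jointly improve the fit. Both models must be fitted on the same rows for this comparison, which matters when the dropped variables contain missing values.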
Code
# linear regression model 3
lm2 <- lm(T1depression ~ BMI + NS + TFE + FEL + SB, data = data)
summary(lm2)
Call:
lm(formula = T1depression ~ BMI + NS + TFE + FEL + SB, data = data)
Residuals:
Min 1Q Median 3Q Max
-22.9398 -6.0354 0.2289 6.3596 20.8288
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 41.91051 2.17737 19.248 < 2e-16 ***
BMI -0.10798 0.07716 -1.399 0.16193
NS 0.73126 0.45738 1.599 0.11011
TFE -0.64654 0.20647 -3.131 0.00178 **
FEL -0.33519 0.23154 -1.448 0.14795
SB 1.51314 0.31049 4.873 1.23e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.959 on 1304 degrees of freedom
(38 observations deleted due to missingness)
Multiple R-squared: 0.03681, Adjusted R-squared: 0.03312
F-statistic: 9.967 on 5 and 1304 DF, p-value: 2.26e-09
Code
# diagnostic
par(mfrow = c(2, 3))
plot(lm2)
# compare predicted values with observed values
lm3 <- lm(T1depression ~ TFE, data = data)
# model.frame() holds only the rows used in the fit, so observed values stay aligned with predict()
plot_data <- data.frame(Predicted_value = predict(lm3), Observed_value = model.frame(lm3)$T1depression)
ggplot(plot_data, aes(x = Predicted_value, y = Observed_value)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "green")
Predicted values were plotted against observed values with a 45-degree reference line, and the fitted line in the BMI scatter plot slopes slightly downward. Given the low R-squared values, the predictions account for little of the variation in observed scores. In summary, there is no strong evidence of a linear relationship between BMI and depression: the BMI coefficient is not statistically significant in any of the three models (p = 0.222, 0.184, and 0.162), and although the multi-predictor models are significant overall, they explain less than 4% of the variance. Therefore, we fail to reject the null hypothesis.
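One detail of predicted-versus-observed plots is easy to get wrong when the data contain missing values: `lm()` drops incomplete rows by default, so `predict()` returns fewer values than the data has rows. A minimal sketch on a made-up data frame (`toy`, hypothetical values) shows why taking the first `length(predict(fit))` observed values misaligns the pairs, and how `model.frame()` avoids that.

```r
# Toy data with missing values in both columns (all values are made up).
toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1)
)

# With the default na.action (na.omit), rows 3 and 4 are dropped from the fit.
fit <- lm(y ~ x, data = toy)

pred <- predict(fit)        # one prediction per row actually used
used <- model.frame(fit)    # exactly the rows the model was fitted on

# Misaligned: the first four observed values include the dropped rows.
wrong_obs <- toy$y[seq_along(pred)]

# Aligned: observed values taken from the rows that produced the predictions.
right_obs <- used$y

plot_data <- data.frame(Predicted_value = pred, Observed_value = right_obs)
print(names(pred))
print(plot_data)
```

The names of `pred` are the original row numbers, which gives a quick way to confirm which observations were kept.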
2. Higher family income increases the rate of obesity
H0=no relationship between family income and rate of obesity
Ha=higher family income increases the rate of obesity
Based on this data, we can also explore which factors may affect obesity. Here we hypothesize that higher family income increases the rate of obesity. Because BMI category and most of the predictors are ordinal variables, ordinal logistic regression is applied in this section to test the hypothesis.
Code
# convert BMI to an ordinal variable
data$BMI_category <- cut(data$BMI, breaks = c(-Inf, 18.5, 24.9, Inf), labels = c("Underweight", "Normal weight", "Overweight"))
data$BMI_rank <- as.factor(unclass(data$BMI_category))
# Visualizing data
# Filter out rows with NA values in BMI_rank or FS
filtered_data <- data[complete.cases(data$BMI_rank, data$FS), ]
# Create the plot with filtered data
ggplot(filtered_data, aes(x = BMI_category, y = FS)) +
  geom_boxplot(size = 0.75, color = "indianred") +
  geom_jitter(alpha = 0.5, color = "red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  ylab("Financial Situation")
Code
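# Fit BMI rank and family financial situation into an ordinal logit model
model <- polr(BMI_rank ~ FS, data = data, Hess = TRUE)
summary(model)
# p value
ctable <- coef(summary(model))
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
ctable <- cbind(ctable, "p value" = p)
ctable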
Call:
polr(formula = BMI_rank ~ FS, data = data, Hess = TRUE)
Coefficients:
Value Std. Error t value
FS -0.1883 0.07865 -2.394
Intercepts:
Value Std. Error t value
1|2 -0.7067 0.2554 -2.7672
2|3 2.6195 0.2828 9.2639
Residual Deviance: 2173.226
AIC: 2179.226
(36 observations deleted due to missingness)
Value Std. Error t value p value
FS -0.1883107 0.0786533 -2.394187 1.665724e-02
1|2 -0.7067002 0.2553870 -2.767174 5.654453e-03
2|3 2.6194751 0.2827624 9.263873 1.971474e-20
Code
# Getting odds-ratio
exp(coef(model))
FS
0.8283573
The results indicate that financial situation (FS) is significantly associated with BMI rank (p = 0.017). The coefficient of -0.1883 corresponds to an odds ratio of 0.83: each one-unit increase in the FS code multiplies the odds of being in a higher BMI category by about 0.83, a reduction of roughly 17%. The values labelled 1|2 (-0.7067) and 2|3 (2.6195) are not effects of FS; they are the estimated cutpoints on the latent scale that separate underweight from normal weight and normal weight from overweight. When sleeping hours (SL) and skipping breakfast (SB) are added to the model, the AIC falls from 2179.2 to 2146.9, indicating a better fit, but FS is no longer significant at the 0.05 level (p = 0.095). The null hypothesis of no association is therefore rejected in the single-predictor model, but the negative coefficient does not support Ha: assuming higher FS codes denote a better financial situation, higher family income is associated with lower, not higher, odds of a heavier BMI category.
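The way coefficients, cutpoints, and odds ratios fit together in an ordinal logit model can be sketched on simulated data. Everything below is hypothetical (the data frame `sim`, the predictor `fs`, and the outcome `bmi_rank` are generated, with a negative slope built in); the Wald interval from `confint.default()` is one common choice, not necessarily the one the analysis above would use.

```r
library(MASS)  # provides polr()

set.seed(3)
n <- 500

# Simulated stand-in data: an ordinal predictor with a negative effect on a
# latent score, which is then cut into three ordered categories.
fs <- sample(1:4, n, replace = TRUE)
latent <- -0.3 * fs + rlogis(n)
bmi_rank <- cut(latent, breaks = c(-Inf, -1.5, 1, Inf),
                labels = c("1", "2", "3"), ordered_result = TRUE)
sim <- data.frame(fs = fs, bmi_rank = bmi_rank)

# polr() models logit P(Y <= k) = zeta_k - beta * x, so a negative beta
# means lower odds of falling in a higher category as x increases.
fit <- polr(bmi_rank ~ fs, data = sim, Hess = TRUE)

# Coefficient table with normal-approximation p-values added.
ctable <- coef(summary(fit))
p <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
ctable <- cbind(ctable, "p value" = p)

# Odds ratio for the predictor only, with a Wald 95% interval.
odds_ratio <- exp(coef(fit))
or_ci <- exp(confint.default(fit))

print(ctable)
print(odds_ratio)
print(or_ci)
```

Note that `coef(fit)` returns only the slope, while the cutpoints are stored separately in `fit$zeta`; exponentiating a cutpoint does not give an odds ratio for the predictor.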
Code
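# introduce more variables to compare
LR1 <- polr(formula = BMI_rank ~ SL + SB + FS, data = data, Hess = TRUE, method = "logistic")
SUM1 <- summary(LR1)
SUM1
# p value
ctable2 <- coef(summary(LR1))
p <- pnorm(abs(ctable2[, "t value"]), lower.tail = FALSE) * 2
ctable2 <- cbind(ctable2, "p value" = p)
ctable2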
Call:
polr(formula = BMI_rank ~ SL + SB + FS, data = data, Hess = TRUE,
method = "logistic")
Coefficients:
Value Std. Error t value
SL -0.5179 0.09863 -5.250
SB 0.2064 0.07715 2.676
FS -0.1336 0.08003 -1.669
Intercepts:
Value Std. Error t value
1|2 -1.3558 0.3381 -4.0102
2|3 2.0251 0.3551 5.7024
Residual Deviance: 2136.853
AIC: 2146.853
(36 observations deleted due to missingness)
Value Std. Error t value p value
SL -0.5178720 0.09863452 -5.250414 1.517580e-07
SB 0.2064347 0.07715030 2.675748 7.456279e-03
FS -0.1336130 0.08003340 -1.669465 9.502524e-02
1|2 -1.3557600 0.33807702 -4.010210 6.066466e-05
2|3 2.0251068 0.35513075 5.702426 1.181141e-08
Code
coef(SUM1)
Value Std. Error t value
SL -0.5178720 0.09863452 -5.250414
SB 0.2064347 0.07715030 2.675748
FS -0.1336130 0.08003340 -1.669465
1|2 -1.3557600 0.33807702 -4.010210
2|3 2.0251068 0.35513075 5.702426
Code
# odds ratios: exponentiate the predictor coefficients only, not the standard errors or t values
exp(coef(LR1))
SL SB FS
0.5957870 1.2292875 0.8749286
Code
### Predict probability
# Create a data frame with possible IV values
newdat <- data.frame(
  FS = rep(1:5, each = 272),
  SL = rep(1:4, each = 340),
  SB = rep(1:4, each = 340),
  BMI = rep(seq(from = 12.8, to = 39, length.out = 340), 4))
# Get the predicted probability
newdat <- cbind(newdat, predict(LR1, newdat, type = "probs"))
# Reshape to long format: one row per combination of predictor values and BMI level
lnewdat <- melt(newdat, id.vars = c("FS", "SL", "SB", "BMI"), variable.name = "Level", value.name = "Probability")
# Visualizing probability
ggplot(lnewdat, aes(x = BMI, y = Probability, colour = Level)) +
  geom_line() +
  facet_grid(FS ~ SL, labeller = "label_both")
Code
# plot
ggplot(data, aes(x = SL, y = BMI)) +
  geom_point(color = "red4") +
  geom_smooth(method = "lm", se = FALSE, color = "indianred")
3. More siblings reduce the risk of depression
H0=no relationship between sibling numbers and depression rate
Ha=more siblings reduce the risk of depression
The number of siblings is coded as 1: only child and 2: has siblings. Therefore, a Welch two-sample t-test and a correlation test were performed to explore the relationship.
Code
ggplot(data = subset(data, !is.na(NS)), aes(x = factor(NS), y = T1depression)) +
  geom_boxplot(color = "red4") +
  geom_jitter(color = "tomato1") +
  xlab("Number of sibling")
The analysis found a weak positive correlation between the number of siblings and depression score, with a sample correlation coefficient (rho) of 0.0716 and a p-value of 0.0085. The association is statistically significant but small. A Welch t-test likewise indicated that the group with siblings had a significantly higher mean depression score than only children. These results run opposite to Ha: having siblings is associated with slightly higher, not lower, depression scores, so the hypothesis that more siblings reduce the risk of depression is not supported. Because the data are observational, this association does not show that siblings cause higher depression, and in the multiple regression above the NS coefficient was not significant after adjusting for other predictors (p = 0.110).
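The test calls themselves are not shown in the rendered output, so the following is only a sketch of how a Welch t-test and a rank correlation could be run, using simulated data (the data frame `sim` and its columns are hypothetical). The use of Spearman's method is an assumption based on the estimate being reported as rho.

```r
set.seed(4)

# Simulated stand-in data: group 1 = only child, group 2 = has siblings.
sim <- data.frame(
  ns = rep(c(1, 2), times = c(120, 80)),
  score = c(rnorm(120, mean = 38, sd = 8), rnorm(80, mean = 39, sd = 9))
)
sim$group <- factor(sim$ns, levels = c(1, 2),
                    labels = c("only child", "has siblings"))

# Welch two-sample t-test: t.test() does not assume equal variances
# unless var.equal = TRUE is set.
welch <- t.test(score ~ group, data = sim)

# Rank correlation between sibling code and score; exact = FALSE avoids
# the warning about ties, which are unavoidable with a two-level variable.
spearman <- cor.test(sim$score, sim$ns, method = "spearman", exact = FALSE)

print(welch)
print(spearman)
```

With a two-level grouping variable the t-test and the correlation test address essentially the same question, so agreement between them is expected rather than independent confirmation.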
Conclusion
In this study, we investigated the relationship between depression, obesity, family financial situation, and sibling numbers among young adolescents.
Firstly, we examined the association between depression and obesity using several linear regression models. The results indicated no significant relationship between BMI and depression score in this population. Next, we examined the relationship between family financial situation and obesity using ordinal logistic regression. Higher FS values were associated with lower odds of being in a higher BMI category (odds ratio 0.83), which is opposite to the hypothesized direction, and the association weakened once sleeping hours and skipping breakfast were included. Furthermore, we tested the association between depression and sibling numbers using a Welch t-test and a correlation test. Contrary to our hypothesis, adolescents with siblings had slightly but significantly higher depression scores than only children. Overall, this study contributes to our understanding of the factors related to the mental and physical well-being of young adolescents. The findings suggest that family financial situation is associated with weight status, while time spent with the father and skipping breakfast were the predictors most clearly associated with depression score; the presence of siblings did not appear to protect against depression. These results emphasize the importance of considering familial and social factors in addressing the mental and physical health of young individuals. Further research and interventions can build upon these findings to develop strategies for promoting healthier outcomes in this population.
Source Code
---title: "Final Project "author: "Xiaoyan"description: "Template of course blog qmd file"date: "05/17/2023"format: html: toc: true code-fold: true code-copy: true code-tools: truecategories: - finalpart1---```{r}library(tidyr)library(dplyr)library(readxl)library(ggplot2)library(MASS)library(reshape2)```# {.tabset}## Introduction and background The implementation of the one-child policy by the Chinese government in 1979 led to an increase in the number of families with only one child and a unique family structure known as the "four-two-one" model, consisting of four grandparents, two parents, and one child. While being part of such a family structure provides certain advantages in terms of family and social resources, children without siblings, commonly referred to as "only children," may experience various physical and socio-psychological challenges during their development.One notable concern is the increased risk of overweight and obesity among only children. These children are more likely to struggle with weight-related issues compared to their counterparts who have one or more siblings. Additionally, the psychosocial consequences associated with being an only child are also worth investigating.In this context, it is important to explore not only the relationship between overweight/obesity and mental health in young adolescents but also how the presence or absence of siblings and other factors into this relationship.Overall, investigating the link between overweight/obesity, mental health, and sib-size in young adolescents within the context of the one-child policy can shed light on the potential challenges faced by only children and contribute to a better understanding of their overall well-being.## research questions1. Does obesity positively related to depression rate?2. What are factors that affects obesity?3. Does sibling or obesity directly related to depression?## key predictors1. depression rate2. sibling number3. obesity rate4. 
Family location, finance and education## hypothesis1. Higher obesity rate increase the risk of depression2. higher family income increase the rate of obesity3. More sibling reduce the risk of depression ## data description### overlook of data```{r}data<-read_excel("/Users/cassie199/Desktop/23spring/603_Spring_2023-1/posts/_data/mentalhealth_data.xlsx")head(data)sum(is.na(data))plot(data$T0depression~data$BMI)```This dataset including 1348 variables and 29 columns. there are 728 NA in this data set. all variables was presented as numberic data. descriptive data was also presented as degrees such as education level, family financial situation and depression rate. By pre-plotting depression rate vs BMI, we can see that some ouliers may need to deal with and there is no siginifcant disrtibution on graph. More data processing is needed in future process.Modified column name ```{r}variables <-c("Family location", "Number of siblings", " time spend with father in elementary school?", " time spend with mother in elementary school?", "Father’s education level", "Mother’s education level", "Family financial situation", "Sleeping hours", "Skipping breakfast", "Vigorous", "Moderate")abreviations <-c("FL", "NS", "TFE", "TME", "FEL", "MEL", "FS", "SL", "SB", "VG", "MD")cat("varible table\n")variable_table <-data.frame(variables, abreviations)variable_tablecolnames(data)<-c("T0depression","T0anxiety","T1depression","T1anxiety","Height","Weight","WC","HC","SBP","DBP","FBG","TC","TG","HDL-C","LDL-C","BMI","WHR","WtHR","FL", "NS", "TFE", "TME", "FEL", "MEL", "FS", "SL", "SB","Vigorous","Moderate")```### parameter explainationBMI (body mass index) in this study is used as indicator of obisity. 
NIH divided BMI value into three levels as table below.```{r}# Create the data frame for BMI categoriesbmi_levels <-c("Underweight", "Normal Weight", "Overweight")bmi_values <-c("<18.5", "18.5-24.9", ">=25")bmi_table <-data.frame(Category = bmi_levels, BMI = bmi_values)# Print the BMI category tablecat("\nBMI Categories\n")print(bmi_table)```### data explanatorySome key predictors were plotted. The distribution of family location was plotted in the first chart and the distribution of family financial situation were plotted in the second chart.```{r}data$proportion <- data$FL /sum(data$FL)data$category <-factor(data$FS, levels =c("1", "2", "3","4"))ggplot(data, aes(x ="", y = proportion, fill = category)) +geom_bar(stat ="identity", width =1) +coord_polar(theta ="y") +labs(fill ="Category") +theme_void()data$proportion1 <- data$FS /sum(data$FS)ggplot(data, aes(x ="", y = proportion1, fill = category)) +geom_bar(stat ="identity", width =1) +coord_polar(theta ="y") +labs(fill ="Category") +theme_void()```A scatter plot was used to visualize the relationship between skipping breakfast and BMI rate. Only by scatter plot it is difficult to observe the relationship between two varibales. Therefore, further analysis is needed ```{r}ggplot(data, aes(x = SB, y = BMI)) +geom_jitter(width =0.2, height =0, color ="indianred", alpha =0.5) +xlab("skipping breakfast")```## hypothesis test### 1. Higher obesity rate increase the risk of depressionH0=no relationship between obesity rate and the risk of depressionHa=higher obesity rate increases the risk of depressionIn order to prove this hypothesis, linear model was used to calculate relationship between depression rate and BMI.```{r}#linear regression of depresison and BMIlm0<-lm(T1depression ~ BMI, data = data)summary(lm0)```The BMI coefficient (-0.09528) represents the estimated change in the depression score for a one-unit increase in BMI. For each unit increase in BMI, the depression score decreases by 0.09528. 
With p-value of 0.222, the coefficient is not significant. Therefore, there is no strong evidence of a linear relationship between BMI and depression. the residuals range from -19.2371 to 21.9845. The residual standard error is relatively high also indicates the model is not fit to the data. F-staistic gives an overall sinificance of the model and with a high p-value, this model is also not statically significant as a whole. The multiple R-squared value (0.001139) represents the proportion of variance in the depression score explained by the model and only 0.1139% of the variability in depression can be attributed to the linear relationship with BMI.The adjusted R-squared value (0.0003749) adjusts the multiple R-squared value for the number of predictors in the model. It penalizes the inclusion of unnecessary predictors. A lower adjusted R-squared suggests that the model does not provide a good fit to the data. In summary, based on the provided output, there is no strong evidence to support a linear relationship between BMI and depression. The coefficient for BMI is not statistically significant, and the model's overall fit is weak (low R-squared values and non-significant F-statistic). ```{r}#diagnosticpar(mfrow =c(2,2))plot(lm0)```Linear regression diagnostic plot was used to evaluate the performance.In residual vs fitted plot, the a horizontal red line represent the mean or expected value of the residuals. and the residuals are evenly distributes above and below the horizontal line. This indicates the linear model was fitted to our data. In normal Q_Q plot, the straight pattern suggests that the residuals of a linear regression model follow a normal distribution,supporting the assumption of normality. In a scale-location plot , a straight red line typically indicates homoscedasticity, which means that the residuals have a constant variance across different levels of the predictor variable(s). 
The scale-location plot detects any systematic patterns in the spread (variance) of the residuals.Here, the plot suggested that the assumption of homoscedasticity is met.According to these diagnostics,the linear model is reliable and presenting the relationship properly.```{r}#visualizationggplot(data, aes(x = BMI, y = T1depression)) +geom_point(color ="indianred") +geom_smooth(method ="lm", se =FALSE, color ="darkred")plott1depression<-data$T1depression[1:length(predict(lm0))]plot_data <-data.frame(Predicted_value =predict(lm0), Observed_value = plott1depression)ggplot(plot_data, aes(x = Predicted_value, y = Observed_value)) +geom_point() +geom_abline(intercept =0, slope =1, color ="green")```In addition to BMI, several other variables were included in the analysis to examine their relationship with depression rate in children. Two variables, namely time spent with father in elementary school and frequency of skipping breakfast, emerged as significant factors influencing children's depression rate.The findings revealed that less time spent with father in elementary school was associated with a higher likelihood of experiencing an increase in depression rate among children. This suggests the importance of positive father-child interactions and involvement during this critical period of development.The analysis showed that skipping breakfast more frequently was also linked to a higher depression rate in children. With diagnostic, there was not a substantial deviation from the assumptions of the model. This suggests that the linear regression analysis provided a reasonable fit to the data and supported the interpretation of the results.```{r}#linear regression model 2lm1<-lm(T1depression ~ BMI+NS+TFE+TME+FEL+MEL+FL+SL+SB, data = data)summary(lm1)#diagnosticpar(mfrow =c(2,2))plot(lm1)```To verify if the model is correct, some of varibles with large p value are deleted for backward elimination. 
"Time spend with mother in elementary school", "Father's educaion level", "sleeping time" are deleted comparing to the model before. In this case, "Time spend with father in elementary school" and "skipping breakfast" still above the significant level. By comparing the adjusted R square of two models(0.03124 and 0.03321). There was no not two big difference in these two models. ```{r}#linear regression model 3lm2<-lm(T1depression ~ BMI+NS+TFE+FEL+SB, data = data)summary(lm2)#diagnostic par(mfrow =c(2,3))plot(lm2)#compare predicted value with observe valuelm3<-lm(T1depression ~ TFE, data = data)plot_data <-data.frame(Predicted_value =predict(lm3), Observed_value = data$T1depression[1:length(predict(lm3))])ggplot(plot_data, aes(x = Predicted_value, y = Observed_value)) +geom_point() +geom_abline(intercept =0, slope =1, color ="green")```Predicted value was plotted between first linear model and actual value.The regression line was slightly decreasing. This also implies the linear model is not statistic significant. In summary, based on the provided output, there is no strong evidence to support a linear relationship between BMI and depression. The coefficient for BMI is not statistically significant, and the model's overall fit is weak (low R-squared values and non-significant F-statistic). Therefore, we fail to reject the null hypothesis. ### 2. higher family income increase the rate of obesityH0=no relationship between family income and rate of obesity Ha=higher family income increase the rate of obesityBased on this data, we may also explore what factors may affect the obesity rate. Here we made a hypothsis as higher family income increase the rate of obesity. Due to most of the variables are ordinal variables, ordinal logistic regression is applied in this slot to verify the hypothesis. 
```{r}#convert BMI to ordinal varibledata$BMI_category <-cut(data$BMI, breaks =c(-Inf, 18.5, 24.9, Inf),labels =c("Underweight", "Normal weight", "Overweight"))data$BMI_rank <-as.factor(unclass(data$BMI_category))# Visualizing data# Filter out rows with NA values in BMI_rank or FSfiltered_data <- data[complete.cases(data$BMI_rank, data$FS), ]# Create the plot with filtered dataggplot(filtered_data, aes(x = BMI_category, y = FS)) +geom_boxplot(size =0.75, color ="indianred") +geom_jitter(alpha =0.5, color ="red") +theme(axis.text.x =element_text(angle =45, hjust =1, vjust =1))+ylab("Financial Situation")# #Fit BMI rank and family finanacial situation into ordinal logit modelmodel <-polr(BMI_rank ~ FS, data = data, Hess=TRUE)summary(model)#p valuectable <-coef(summary(model))p <-pnorm(abs(ctable[, "t value"]), lower.tail =FALSE) *2ctable <-cbind(ctable, "p value"= p)ctable``````{r}# Getting odds-ratioexp(coef(model))```The findings of the study indicate that financial situation (FS) is significantly associated with BMI ranks. The coefficient of -0.1883 suggests that higher financial situation is linked to a decreased likelihood of being in a higher BMI rank. Specifically, individuals with higher financial situation are less likely to fall into the normal weight category compared to the underweight category (1|2) with a coefficient of -0.7067. Conversely, they are more likely to be in the overweight category compared to the normal weight category (2|3) with a coefficient of 2.6195. These relationships were found to be statistically significant, indicating that the observed associations are unlikely to occur by chance. Furthermore, the inclusion of additional variables in the model improved its predictive ability, as indicated by the slightly lower AIC value. 
These findings highlight the importance of considering financial situation as a factor associated with BMI rank and provide insight into the complex relationship between socioeconomic factors and weight status. The null hypothesis of no relationship is therefore rejected, although the negative coefficient points in the opposite direction to Ha.

```{r}
#introduce more variables to compare
LR1 <- polr(formula = BMI_rank ~ SL + SB + FS, data = data, Hess = TRUE, method = "logistic")
SUM1 <- summary(LR1)
SUM1

#p value
ctable2 <- coef(summary(LR1))
p <- pnorm(abs(ctable2[, "t value"]), lower.tail = FALSE) * 2
ctable2 <- cbind(ctable2, "p value" = p)
ctable2
```

```{r}
coef(SUM1)
# Odds ratios come from the fitted coefficients, not from the summary table
# (which also holds standard errors and t values)
exp(coef(LR1))

### Predict probability
# Create a data frame with possible IV values
newdat <- data.frame(FS = rep(1:5, each = 272),
                     SL = rep(1:4, each = 340),
                     SB = rep(1:4, each = 340),
                     BMI = rep(seq(from = 12.8, to = 39, length.out = 340), 4))

# Get the predicted probability
newdat <- cbind(newdat, predict(LR1, newdat, type = "probs"))

# Reshape to long format: one row per BMI rank and its predicted probability
lnewdat <- melt(newdat, id.vars = c("FS", "SL", "SB", "BMI"),
                variable.name = "Level", value.name = "Probability")

# Visualizing probability
ggplot(lnewdat, aes(x = BMI, y = Probability, colour = Level)) +
  geom_line() +
  facet_grid(FS ~ SL, labeller = "label_both")

# Plot BMI against SL with a fitted linear trend
ggplot(data, aes(x = SL, y = BMI)) +
  geom_point(color = "red4") +
  geom_smooth(method = "lm", se = FALSE, color = "indianred")
```

### 3. More siblings reduce the risk of both depression and anxiety

H0 = no relationship between number of siblings and depression rate

Ha = more siblings reduce the risk of depression

The number of siblings is coded as 1 (only child) and 2 (has siblings). Therefore, a Welch two-sample t-test and a correlation test were performed to explore the relationship.
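
As a minimal sketch of these two tests, the chunk below uses simulated data rather than the study data. The column names `ns` and `dep` are hypothetical stand-ins for NS and T1depression, and the choice of Spearman's method is an assumption based on the correlation being reported as rho.

```{r}
# Simulated stand-in data; NOT the study data
set.seed(1)
sim <- data.frame(ns = rep(c(1, 2), each = 100))  # 1 = only child, 2 = has siblings
sim$dep <- rnorm(nrow(sim), mean = 10 + 0.5 * (sim$ns == 2), sd = 3)

# Welch two-sample t-test (t.test() does not assume equal variances by default)
welch <- t.test(dep ~ factor(ns), data = sim)
print(welch)

# Rank correlation between sibling group and depression score
# (assumption: Spearman; exact = FALSE because the grouping variable has ties)
rho_test <- cor.test(sim$ns, sim$dep, method = "spearman", exact = FALSE)
print(rho_test)
```

With a two-level grouping variable, both tests address nearly the same question, so they would usually agree on the direction of the difference.
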
```{r}
ggplot(data = subset(data, !is.na(NS)), aes(x = factor(NS), y = T1depression)) +
  geom_boxplot(color = "red4") +
  geom_jitter(color = "tomato1") +
  xlab("Number of siblings")

var1 <- as.numeric(data$T1depression)
var2 <- as.numeric(data$NS)
```

The analysis revealed a weak positive correlation between the number of siblings and depression score, with a sample estimate of the correlation coefficient (rho) of 0.0716. This suggests that as the number of siblings increases, the depression score tends to be higher. The correlation was statistically significant (p-value of 0.0085), indicating that the observed relationship is unlikely to have occurred by chance. Furthermore, a Welch t-test showed that the group with siblings had a significantly higher depression index. Based on these findings, we conclude that the number of siblings has a positive relationship with depression score, which is the opposite of the direction proposed in Ha.

## Conclusion

In this study, we investigated the relationship between depression, obesity, family financial situation, and number of siblings among young adolescents. First, we examined the association between depression and obesity using several linear regression models. The results indicated that there was no significant relationship between depression and obesity in this population.

Next, we examined the association between family financial situation and obesity using ordinal logistic regression. Family financial situation was significantly associated with BMI category, with a higher financial situation linked to lower odds of being in a higher BMI category.

Furthermore, we tested the hypothesis regarding the association between depression and number of siblings using a Welch t-test.
Contrary to the hypothesis, the results showed that adolescents with siblings had significantly higher depression scores than those without siblings.

Overall, this study contributes to our understanding of the factors associated with the mental and physical well-being of young adolescents. The findings suggest that family financial situation is associated with weight status, while the presence of siblings was associated with slightly higher depression scores rather than a protective effect against depression. These results emphasize the importance of considering familial and social factors in addressing the mental and physical health of young individuals. Further research and interventions can build upon these findings to develop strategies for promoting healthier outcomes in this population.