Matt Zambetti: HW2 Submission

hw2
Author

Matt Zambetti

Published

June 10, 2023

Homework Two: Reading in Data

Choosing the Data Set

The data set I am using is the McDonald's India menu nutrition data. The first few entries can be seen below.

I got the data from Kaggle: https://www.kaggle.com/datasets/deepcontractor/mcdonalds-india-menu-nutrition-facts

Code
library(tidyverse)  # loads readr, which provides read_csv()
mcdonalds_tidy <- read_csv("_data/mcdonaldata.csv")
New names:
• `` -> `...1`
Rows: 141 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): item, servesize, calories, menu
dbl (10): ...1, protien, totalfat, satfat, transfat, cholestrol, carbs, suga...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
head(mcdonalds_tidy)
# A tibble: 6 × 14
   ...1 item      servesize calories protien totalfat satfat transfat cholestrol
  <dbl> <chr>     <chr>     <chr>      <dbl>    <dbl>  <dbl>    <dbl>      <dbl>
1     0 "McVeggi… 168       402         10.2     13.8   5.34     0.16       2.49
2     1 "McAloo … 146       339          8.5     11.3   4.27     0.2        1.47
3     2 "McSpicy… 199       652         20.3     39.4  17.1      0.18      21.8 
4     3 "Spicy P… 250       674         21.0     39.1  19.7      0.26      40.9 
5     4 "America… 177       512         15.3     23.4  10.5      0.17      25.2 
6     5 "Veg Mah… 306       832         24.2     37.9  16.8      0.28      36.2 
# ℹ 5 more variables: carbs <dbl>, sugar <dbl>, addedsugar <dbl>, sodium <dbl>,
#   menu <chr>

Cleaning the Data Set

In my opinion, the data needs no cleaning. It is nicely organized and already in a “tidy” form, with one row per menu item and one column per variable.
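One possible exception: read_csv() guessed servesize and calories as character even though the values shown by head() look numeric, and it kept the unnamed row index as `...1`. If those columns are ever needed as numbers, a minimal base R sketch like the one below could convert them. It assumes the two columns hold plain numbers stored as text; any entry that is not a plain number (for example, one containing units) becomes NA, and the function reports how many. The helper name clean_menu() is hypothetical.

```r
# Hypothetical helper: a minimal cleaning sketch in base R.
# Assumes servesize and calories contain numbers stored as text.
clean_menu <- function(df) {
  # Drop the unnamed row-index column that read_csv() renamed to `...1`
  df <- df[, setdiff(names(df), "...1"), drop = FALSE]
  for (col in c("servesize", "calories")) {
    # as.numeric() turns any entry that is not a plain number into NA
    converted <- suppressWarnings(as.numeric(df[[col]]))
    n_bad <- sum(is.na(converted) & !is.na(df[[col]]))
    if (n_bad > 0) {
      message(col, ": ", n_bad, " value(s) could not be converted and are now NA")
    }
    df[[col]] <- converted
  }
  df
}
```

For example, `mcdonalds_clean <- clean_menu(mcdonalds_tidy)` would return a copy of the data with numeric servesize and calories columns, leaving mcdonalds_tidy itself unchanged.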

Narrative About the Data Set

This data provides detailed nutrition information for each menu item at McDonald's. Each entry contains the serving size, the calories for that serving, common macronutrients along with cholesterol and sodium, and the menu the item belongs to (for example, the regular, gourmet, or beverage menu).

Below is each variable (other than the unnamed index column) and its type.

Code
# Print the storage type of every column except the unnamed index column `...1`
for (col in colnames(mcdonalds_tidy)[-1]) {
  cat(col, ": ", typeof(mcdonalds_tidy[[col]]), "\n")
}
item :  character 
servesize :  character 
calories :  character 
protien :  double 
totalfat :  double 
satfat :  double 
transfat :  double 
cholestrol :  double 
carbs :  double 
sugar :  double 
addedsugar :  double 
sodium :  double 
menu :  character 

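Here we can see that the first three variables (item, servesize, and calories) are character, while the remaining nutrition facts, from protien through sodium, are doubles. The final variable, menu, is also a character. Note that servesize and calories hold numbers but are stored as text, so they would need converting (for example with as.numeric()) before any numeric comparison.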

Potential Research Questions

As a student athlete who travels a lot, I am always interested in healthier fast-food options. Using this data, I could answer a few potential research questions: What are some lower-calorie options at McDonald's, and which higher-calorie foods contain more protein? I will often still eat higher-calorie foods as long as their macronutrients are good, which for me means low in fat and high in protein.
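Both questions could be approached by ranking items. The base R sketch below lists the lowest-calorie items and the items with the most protein per 100 kcal. Protein per 100 kcal is only one assumed proxy for "good macronutrients", and the helper names are hypothetical. It converts calories from text on the fly and uses the data set's own spelling of the protein column, protien.

```r
# Hypothetical helpers; "protein per 100 kcal" is an assumed scoring choice.
protein_density <- function(df) {
  # calories is stored as text, so convert it first (non-numbers become NA)
  kcal <- suppressWarnings(as.numeric(df$calories))
  # Protein grams per 100 kcal; items with missing or zero calories get NA
  density <- ifelse(!is.na(kcal) & kcal > 0, 100 * df$protien / kcal, NA_real_)
  data.frame(item = df$item, calories = kcal, protein = df$protien,
             protein_per_100kcal = density, stringsAsFactors = FALSE)
}

# The n lowest-calorie items (order() sorts NA values last)
lowest_calorie <- function(df, n = 10) {
  scored <- protein_density(df)
  head(scored[order(scored$calories), ], n)
}

# The n items with the most protein per 100 kcal
best_protein <- function(df, n = 10) {
  scored <- protein_density(df)
  head(scored[order(-scored$protein_per_100kcal), ], n)
}
```

Calling `lowest_calorie(mcdonalds_tidy)` and `best_protein(mcdonalds_tidy)` would give two short lists to compare; items that appear near the top of both are likely the best fit for this goal.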

This data can be very useful for finding foods low in saturated and total fat, and also for identifying foods I should avoid because of their high fat content. To my surprise, many of the lower-carbohydrate foods tend to be very high in fat, which is not always good.
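A possible base R sketch of these checks follows. The fat cut-offs (10 g total fat and 3 g saturated fat per serving) are hypothetical placeholders, not recommendations, and the helper names are also hypothetical. The last function computes a Spearman rank correlation between carbs and totalfat; a negative value would be consistent with the observation that lower-carbohydrate items tend to have more fat, although a single correlation summarizes all items and does not say how many items follow the pattern.

```r
# Hypothetical cut-offs in grams per serving; adjust to your own targets.
low_fat_items <- function(df, max_totalfat = 10, max_satfat = 3) {
  keep <- !is.na(df$totalfat) & !is.na(df$satfat) &
    df$totalfat <= max_totalfat & df$satfat <= max_satfat
  df[keep, c("item", "totalfat", "satfat"), drop = FALSE]
}

# Items to avoid: the n highest in total fat (NA values sorted last)
high_fat_items <- function(df, n = 10) {
  ranked <- df[order(-df$totalfat), c("item", "totalfat", "satfat"), drop = FALSE]
  head(ranked, n)
}

# Rank correlation between carbohydrates and total fat, ignoring missing rows
carb_fat_correlation <- function(df) {
  cor(df$carbs, df$totalfat, method = "spearman", use = "complete.obs")
}
```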

Not only can this analysis be useful for me, but it can also be useful for many other athletes, as I am sure many of them face the same dilemmas.

Another useful analysis would be finding the healthier options within each menu category. For example, finding good options in both the “regular” and “beverage” categories could help build a better complete meal.
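One way to sketch this in base R is to split the rows by the menu column and keep the top-scoring item in each group. The score used here, protein per 100 kcal, is an assumed stand-in for "healthier"; swapping in another criterion, such as lowest calories, only changes the line that computes the score. The helper name best_per_menu() is hypothetical.

```r
# Hypothetical helper: best item per menu category by protein per 100 kcal.
best_per_menu <- function(df) {
  # calories is stored as text, so convert it first (non-numbers become NA)
  kcal <- suppressWarnings(as.numeric(df$calories))
  score <- ifelse(!is.na(kcal) & kcal > 0, 100 * df$protien / kcal, NA_real_)
  scored <- data.frame(menu = df$menu, item = df$item, calories = kcal,
                       protein_per_100kcal = score, stringsAsFactors = FALSE)
  scored <- scored[!is.na(scored$protein_per_100kcal), ]
  if (nrow(scored) == 0) return(scored)
  # split() groups rows by menu; within each group keep the top-scoring row
  best <- lapply(split(scored, scored$menu), function(g) {
    g[which.max(g$protein_per_100kcal), ]
  })
  result <- do.call(rbind, best)
  rownames(result) <- NULL
  result
}
```

`best_per_menu(mcdonalds_tidy)` would return one row per menu category, sorted alphabetically by category name.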