finalpart2
Liam Tucksmith
tidyverse
readxl
ggplot2
dplyr
tidyr
janitor
Author

Liam Tucksmith

Published

May 2, 2023

Code
library(tidyverse)
library(readxl)
library(ggplot2)
library(dplyr)
library(tidyr)
library(janitor)
knitr::opts_chunk$set(echo = TRUE)

Note: I slightly changed my final project to be looking at sports betting rather than media. The media portion was too ambitious so I pivoted to something more straightforward. Please let me know if that is an issue.

The most profitable sports league in the world, the National Football League, generated $18 billion in revenue in 2021 according to sportico.com. This total represents national media rights, league sponsorships with gambling companies, news outlets, and other companies, and shared revenue and royalties from the league’s various affiliates and subsidiaries, such as NFL Enterprises, NFL Properties, and NFL International. Sports betting, an industry linked to the NFL, generated $7.5 in revenue in 2021. As money is what keeps the NFL afloat and allows teams to competitively pay for top players and coaches, NFL teams benefit from sports betting engagement and vice versa. What I plan to analyze how the win-rate of a team relates to the ratio of correct money line predictions for that team, season by season. My hypothesis is that teams with higher win ratios will have higher correct money-line prediction ratios, both when they are favored, not favored, and in total.

The first dataset is a collection of all the matchups and scores that occurred in the NFL since 1966. The matchup information includes the names of the two teams competing, home/away team status, the match site, the weather on game day, and the final score for each team. Sports betting data for each game since 1977 is also included, and includes who was favored (money line), the point spread (point differential), and the over/under line. The dataset is from https://www.kaggle.com/datasets/tobycrabtree/nfl-scores-and-betting-data and was created from a variety of sources including games and scores from public websites such as ESPN, NFL.com, and Pro Football Reference. Weather information is from NOAA data, cross-referenced with NFLweather.com. Betting data reflects lines available at sportsline.com and aussportsbetting.com. For the analysis, I will limit the data to the data collected from the previous 5 years.

Data import and clean-up

Code
#read in data

scores <- read.csv("~/Documents/GitHub/603_Spring_2023/posts/_data/spreadspoke_scores.csv")
colnames(scores)
 [1] "schedule_date"       "schedule_season"     "schedule_week"      
 [4] "schedule_playoff"    "team_home"           "score_home"         
 [7] "score_away"          "team_away"           "team_favorite_id"   
[10] "spread_favorite"     "over_under_line"     "stadium"            
[13] "stadium_neutral"     "weather_temperature" "weather_wind_mph"   
[16] "weather_humidity"    "weather_detail"     
Code
scores <- scores[scores$schedule_season >= 2018, ]
summary(scores)
 schedule_date      schedule_season schedule_week      schedule_playoff
 Length:1372        Min.   :2018    Length:1372        Mode :logical   
 Class :character   1st Qu.:2019    Class :character   FALSE:1311      
 Mode  :character   Median :2020    Mode  :character   TRUE :61        
                    Mean   :2020                                       
                    3rd Qu.:2021                                       
                    Max.   :2022                                       
                                                                       
  team_home           score_home      score_away     team_away        
 Length:1372        Min.   : 0.00   Min.   : 0.00   Length:1372       
 Class :character   1st Qu.:17.00   1st Qu.:16.00   Class :character  
 Mode  :character   Median :24.00   Median :23.00   Mode  :character  
                    Mean   :23.82   Mean   :22.52                     
                    3rd Qu.:30.25   3rd Qu.:30.00                     
                    Max.   :56.00   Max.   :59.00                     
                                                                      
 team_favorite_id   spread_favorite   over_under_line   stadium         
 Length:1372        Min.   :-21.500   Min.   :32.00   Length:1372       
 Class :character   1st Qu.: -7.000   1st Qu.:43.00   Class :character  
 Mode  :character   Median : -4.500   Median :46.00   Mode  :character  
                    Mean   : -5.476   Mean   :46.12                     
                    3rd Qu.: -3.000   3rd Qu.:49.00                     
                    Max.   : -1.000   Max.   :63.50                     
                                                                        
 stadium_neutral weather_temperature weather_wind_mph weather_humidity
 Mode :logical   Min.   : 7.0        Min.   : 0.000   Min.   :  8.00  
 FALSE:1350      1st Qu.:59.0        1st Qu.: 0.000   1st Qu.: 54.00  
 TRUE :22        Median :72.0        Median : 0.000   Median : 67.00  
                 Mean   :64.8        Mean   : 3.219   Mean   : 66.59  
                 3rd Qu.:72.0        3rd Qu.: 6.000   3rd Qu.: 80.25  
                 Max.   :97.0        Max.   :33.000   Max.   :100.00  
                 NA's   :695         NA's   :695      NA's   :1292    
 weather_detail    
 Length:1372       
 Class :character  
 Mode  :character  
                   
                   
                   
                   
Code
glimpse(scores)
Rows: 1,372
Columns: 17
$ schedule_date       <chr> "9/6/2018", "9/9/2018", "9/9/2018", "9/9/2018", "9…
$ schedule_season     <int> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 20…
$ schedule_week       <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", …
$ schedule_playoff    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ team_home           <chr> "Philadelphia Eagles", "Arizona Cardinals", "Balti…
$ score_home          <int> 18, 6, 47, 16, 21, 27, 24, 23, 28, 27, 24, 27, 40,…
$ score_away          <int> 12, 24, 3, 8, 21, 24, 23, 34, 38, 20, 16, 20, 48, …
$ team_away           <chr> "Atlanta Falcons", "Washington Redskins", "Buffalo…
$ team_favorite_id    <chr> "PHI", "ARI", "BAL", "CAR", "PIT", "DEN", "GB", "C…
$ spread_favorite     <dbl> -1.0, -2.0, -7.5, -2.5, -3.5, -3.0, -6.5, -1.0, -3…
$ over_under_line     <dbl> 44.5, 43.5, 39.0, 42.5, 41.0, 42.5, 45.0, 47.5, 48…
$ stadium             <chr> "Lincoln Financial Field", "University of Phoenix …
$ stadium_neutral     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ weather_temperature <int> 81, 72, 61, 81, 58, 84, 62, 72, 84, 89, 72, 62, 72…
$ weather_wind_mph    <int> 8, 0, 12, 9, 11, 5, 10, 0, 8, 7, 0, 10, 0, 7, 0, 1…
$ weather_humidity    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ weather_detail      <chr> "rain", "indoor", "rain", "rain", "rain", "", "", …

Data cleanup and descriptive analytics

Code
#map team names to abbreviations
unique(scores$team_home)
 [1] "Philadelphia Eagles"      "Arizona Cardinals"       
 [3] "Baltimore Ravens"         "Carolina Panthers"       
 [5] "Cleveland Browns"         "Denver Broncos"          
 [7] "Green Bay Packers"        "Indianapolis Colts"      
 [9] "Los Angeles Chargers"     "Miami Dolphins"          
[11] "Minnesota Vikings"        "New England Patriots"    
[13] "New Orleans Saints"       "New York Giants"         
[15] "Detroit Lions"            "Oakland Raiders"         
[17] "Cincinnati Bengals"       "Atlanta Falcons"         
[19] "Buffalo Bills"            "Dallas Cowboys"          
[21] "Jacksonville Jaguars"     "Los Angeles Rams"        
[23] "New York Jets"            "Pittsburgh Steelers"     
[25] "San Francisco 49ers"      "Tampa Bay Buccaneers"    
[27] "Tennessee Titans"         "Washington Redskins"     
[29] "Chicago Bears"            "Houston Texans"          
[31] "Kansas City Chiefs"       "Seattle Seahawks"        
[33] "Washington Football Team" "Las Vegas Raiders"       
[35] "Washington Commanders"   
Code
unique(scores$team_favorite_id)
 [1] "PHI" "ARI" "BAL" "CAR" "PIT" "DEN" "GB"  "CIN" "LAC" "TEN" "MIN" "NE" 
[13] "NO"  "JAX" "DET" "LAR" "ATL" "DAL" "NYJ" "SF"  "HOU" "WAS" "CHI" "CLE"
[25] "KC"  "MIA" "SEA" "IND" "LVR" "TB"  "NYG" "BUF"
Code
#rewrite team_home and team_away to use IDs
scores$team_home[scores$team_home=="Philadelphia Eagles"] <- "PHI"
scores$team_home[scores$team_home=="Arizona Cardinals"] <- "ARI"
scores$team_home[scores$team_home=="Baltimore Ravens"] <- "BAL"
scores$team_home[scores$team_home=="Carolina Panthers"] <- "CAR"
scores$team_home[scores$team_home=="Cleveland Browns"] <- "CLE"
scores$team_home[scores$team_home=="Denver Broncos"] <- "DEN"
scores$team_home[scores$team_home=="Green Bay Packers"] <- "GB"
scores$team_home[scores$team_home=="Indianapolis Colts"] <- "IND"
scores$team_home[scores$team_home=="Los Angeles Chargers"] <- "LAC"
scores$team_home[scores$team_home=="Miami Dolphins"] <- "MIA"
scores$team_home[scores$team_home=="Minnesota Vikings"] <- "MIN"
scores$team_home[scores$team_home=="New England Patriots"] <- "NE"
scores$team_home[scores$team_home=="New Orleans Saints"] <- "NO"
scores$team_home[scores$team_home=="New York Giants"] <- "NYG"
scores$team_home[scores$team_home=="Detroit Lions"] <- "DET"
scores$team_home[scores$team_home=="Oakland Raiders" | 
                 scores$team_home=="Las Vegas Raiders"] <- "LVR"
scores$team_home[scores$team_home=="Cincinnati Bengals"] <- "CIN"
scores$team_home[scores$team_home=="Atlanta Falcons"] <- "ATL"
scores$team_home[scores$team_home=="Buffalo Bills"] <- "BUF"
scores$team_home[scores$team_home=="Dallas Cowboys"] <- "DAL"
scores$team_home[scores$team_home=="Jacksonville Jaguars"] <- "JAX"
scores$team_home[scores$team_home=="Los Angeles Rams"] <- "LAR"
scores$team_home[scores$team_home=="New York Jets"] <- "NYJ"
scores$team_home[scores$team_home=="Pittsburgh Steelers"] <- "PIT"
scores$team_home[scores$team_home=="San Francisco 49ers"] <- "SF"
scores$team_home[scores$team_home=="Tampa Bay Buccaneers"] <- "TB"
scores$team_home[scores$team_home=="Tennessee Titans"] <- "TEN"
scores$team_home[scores$team_home=="Washington Redskins" | 
                 scores$team_home=="Washington Football Team"|
                 scores$team_home=="Washington Commanders"] <- "WAS"
scores$team_home[scores$team_home=="Chicago Bears"] <- "CHI"
scores$team_home[scores$team_home=="Houston Texans"] <- "HOU"
scores$team_home[scores$team_home=="Kansas City Chiefs"] <- "KC"
scores$team_home[scores$team_home=="Seattle Seahawks"] <- "SEA"

scores$team_away[scores$team_away=="Philadelphia Eagles"] <- "PHI"
scores$team_away[scores$team_away=="Arizona Cardinals"] <- "ARI"
scores$team_away[scores$team_away=="Baltimore Ravens"] <- "BAL"
scores$team_away[scores$team_away=="Carolina Panthers"] <- "CAR"
scores$team_away[scores$team_away=="Cleveland Browns"] <- "CLE"
scores$team_away[scores$team_away=="Denver Broncos"] <- "DEN"
scores$team_away[scores$team_away=="Green Bay Packers"] <- "GB"
scores$team_away[scores$team_away=="Indianapolis Colts"] <- "IND"
scores$team_away[scores$team_away=="Los Angeles Chargers"] <- "LAC"
scores$team_away[scores$team_away=="Miami Dolphins"] <- "MIA"
scores$team_away[scores$team_away=="Minnesota Vikings"] <- "MIN"
scores$team_away[scores$team_away=="New England Patriots"] <- "NE"
scores$team_away[scores$team_away=="New Orleans Saints"] <- "NO"
scores$team_away[scores$team_away=="New York Giants"] <- "NYG"
scores$team_away[scores$team_away=="Detroit Lions"] <- "DET"
scores$team_away[scores$team_away=="Oakland Raiders" | 
                 scores$team_away=="Las Vegas Raiders"] <- "LVR"
scores$team_away[scores$team_away=="Cincinnati Bengals"] <- "CIN"
scores$team_away[scores$team_away=="Atlanta Falcons"] <- "ATL"
scores$team_away[scores$team_away=="Buffalo Bills"] <- "BUF"
scores$team_away[scores$team_away=="Dallas Cowboys"] <- "DAL"
scores$team_away[scores$team_away=="Jacksonville Jaguars"] <- "JAX"
scores$team_away[scores$team_away=="Los Angeles Rams"] <- "LAR"
scores$team_away[scores$team_away=="New York Jets"] <- "NYJ"
scores$team_away[scores$team_away=="Pittsburgh Steelers"] <- "PIT"
scores$team_away[scores$team_away=="San Francisco 49ers"] <- "SF"
scores$team_away[scores$team_away=="Tampa Bay Buccaneers"] <- "TB"
scores$team_away[scores$team_away=="Tennessee Titans"] <- "TEN"
scores$team_away[scores$team_away=="Washington Redskins" | 
                 scores$team_away=="Washington Football Team"|
                 scores$team_away=="Washington Commanders"] <- "WAS"
scores$team_away[scores$team_away=="Chicago Bears"] <- "CHI"
scores$team_away[scores$team_away=="Houston Texans"] <- "HOU"
scores$team_away[scores$team_away=="Kansas City Chiefs"] <- "KC"
scores$team_away[scores$team_away=="Seattle Seahawks"] <- "SEA"

unique(scores$team_home)
 [1] "PHI" "ARI" "BAL" "CAR" "CLE" "DEN" "GB"  "IND" "LAC" "MIA" "MIN" "NE" 
[13] "NO"  "NYG" "DET" "LVR" "CIN" "ATL" "BUF" "DAL" "JAX" "LAR" "NYJ" "PIT"
[25] "SF"  "TB"  "TEN" "WAS" "CHI" "HOU" "KC"  "SEA"
Code
n_distinct(scores$team_home)
[1] 32
Code
unique(scores$team_away)
 [1] "ATL" "WAS" "BUF" "DAL" "PIT" "SEA" "CHI" "CIN" "KC"  "TEN" "SF"  "HOU"
[13] "TB"  "JAX" "NYJ" "LAR" "BAL" "CAR" "LAC" "NYG" "LVR" "MIN" "NE"  "ARI"
[25] "CLE" "MIA" "DET" "PHI" "IND" "NO"  "DEN" "GB" 
Code
#add column for game winner
scores$score_home = as.integer(scores$score_home)
scores$score_away = as.integer(scores$score_away)

scores$winner = ifelse(scores$score_home == scores$score_away, "TIE", ifelse(scores$score_home > scores$score_away, scores$team_home, scores$team_away))

#add column for if money line prediction is true
scores$mlTF = ifelse(scores$winner == scores$team_favorite_id, TRUE, FALSE)
scores$mlHomeAway = ifelse(scores$team_home == scores$team_favorite_id, "Home", "Away")

#get counts for wins and losses, correct and incorrect moneyline predictions when favored and not favored, for away and home games
home_team <- scores %>% group_by(team_home, schedule_season) %>% 
                  summarise(home_wins = sum(team_home == winner),
                            home_losses = sum(team_away == winner),
                            ml_incorr_favored_h = sum(mlTF == FALSE & mlHomeAway == "Home"),
                            ml_incorr_unfavored_h = sum(mlTF == FALSE & mlHomeAway == "Away"),
                            ml_corr_favored_h = sum(mlTF == TRUE & mlHomeAway == "Home"),
                            ml_corr_unfavored_h = sum(mlTF == TRUE & mlHomeAway == "Away"),
                            .groups = 'drop')

away_team <- scores %>% group_by(team_away, schedule_season) %>% 
                  summarise(away_wins = sum(team_away == winner),
                            away_losses = sum(team_home == winner),
                            ml_incorr_favored_a = sum(mlTF == FALSE & mlHomeAway == "Away"),
                            ml_incorr_unfavored_a = sum(mlTF == FALSE & mlHomeAway == "Home"),
                            ml_corr_favored_a = sum(mlTF == TRUE & mlHomeAway == "Away"),
                            ml_corr_unfavored_a = sum(mlTF == TRUE & mlHomeAway == "Home"),
                            .groups = 'drop')

#merge home and away data
teams <- merge(home_team, away_team, 
               by.x = c("team_home","schedule_season"), 
               by.y = c("team_away","schedule_season"))

colnames(teams)
 [1] "team_home"             "schedule_season"       "home_wins"            
 [4] "home_losses"           "ml_incorr_favored_h"   "ml_incorr_unfavored_h"
 [7] "ml_corr_favored_h"     "ml_corr_unfavored_h"   "away_wins"            
[10] "away_losses"           "ml_incorr_favored_a"   "ml_incorr_unfavored_a"
[13] "ml_corr_favored_a"     "ml_corr_unfavored_a"  
Code
teams <- teams %>% 
       rename("team" = "team_home")

#combine home and away fields 
teams$total_wins <- teams$home_wins + teams$away_wins
teams$total_losses <- teams$home_losses + teams$away_losses
teams$total_home_games <- teams$home_wins + teams$home_losses

#combine favored and unfavored, correct and incorrect data
teams$ml_corr_favored <- teams$ml_corr_favored_h + teams$ml_corr_favored_a 
teams$ml_corr_unfavored <- teams$ml_corr_unfavored_h + teams$ml_corr_unfavored_a
teams$ml_corr_total <- teams$ml_corr_favored + teams$ml_corr_unfavored 
teams$ml_incorr_favored <- teams$ml_incorr_favored_h + teams$ml_incorr_favored_a
teams$ml_incorr_unfavored <- teams$ml_incorr_unfavored_h + teams$ml_incorr_unfavored_a
teams$ml_incorr_total <- teams$ml_incorr_favored + teams$ml_incorr_unfavored 

#calculate win, correct, uncorrect, and total money line predictions
teams$win_ratio = teams$total_wins/(teams$total_wins + teams$total_losses)
teams$ratio_corr_favored = teams$ml_corr_favored/(teams$ml_corr_favored + teams$ml_incorr_favored)
teams$ratio_corr_unfavored = teams$ml_corr_unfavored/(teams$ml_corr_unfavored + teams$ml_incorr_unfavored)
teams$ratio_corr_total = teams$ml_corr_total/(teams$ml_corr_total + teams$ml_incorr_total)
Code
head(teams, 50)
   team schedule_season home_wins home_losses ml_incorr_favored_h
1   ARI            2018         1           7                   2
2   ARI            2019         2           5                   1
3   ARI            2020         4           4                   3
4   ARI            2021         3           5                   5
5   ARI            2022         1           8                   1
6   ATL            2018         4           4                   4
7   ATL            2019         3           5                   2
8   ATL            2020         2           6                   3
9   ATL            2021         2           6                   2
10  ATL            2022         6           3                   0
11  BAL            2018         6           3                   3
12  BAL            2019         7           2                   2
13  BAL            2020         5           3                   3
14  BAL            2021         5           4                   2
15  BAL            2022         5           3                   2
16  BUF            2018         4           4                   1
17  BUF            2019         4           4                   1
18  BUF            2020         9           1                   0
19  BUF            2021         7           3                   3
20  BUF            2022         8           2                   2
21  CAR            2018         5           3                   1
22  CAR            2019         2           6                   3
23  CAR            2020         2           6                   3
24  CAR            2021         2           6                   3
25  CAR            2022         5           4                   3
26  CHI            2018         7           2                   1
27  CHI            2019         4           4                   3
28  CHI            2020         3           5                   1
29  CHI            2021         3           5                   0
30  CHI            2022         2           7                   1
31  CIN            2018         4           4                   1
32  CIN            2019         2           6                   2
33  CIN            2020         3           5                   0
34  CIN            2021         6           5                   2
35  CIN            2022         7           1                   1
36  CLE            2018         5           2                   1
37  CLE            2019         4           4                   2
38  CLE            2020         6           2                   1
39  CLE            2021         6           3                   2
40  CLE            2022         4           4                   3
41  DAL            2018         8           1                   1
42  DAL            2019         5           3                   3
43  DAL            2020         4           4                   2
44  DAL            2021         5           4                   4
45  DAL            2022         8           1                   0
46  DEN            2018         3           5                   2
47  DEN            2019         5           3                   1
48  DEN            2020         2           6                   0
49  DEN            2021         4           5                   4
50  DEN            2022         4           4                   2
   ml_incorr_unfavored_h ml_corr_favored_h ml_corr_unfavored_h away_wins
1                      0                 1                   5         2
2                      3                 0                   4         3
3                      1                 3                   1         4
4                      0                 3                   0         8
5                      0                 1                   7         3
6                      0                 4                   0         3
7                      1                 2                   3         4
8                      1                 1                   3         2
9                      0                 2                   4         5
10                     2                 4                   3         1
11                     0                 6                   0         4
12                     2                 5                   0         7
13                     0                 5                   0         7
14                     1                 4                   2         3
15                     0                 5                   1         5
16                     2                 2                   3         2
17                     0                 4                   3         6
18                     2                 7                   1         6
19                     0                 7                   0         5
20                     0                 8                   0         6
21                     1                 4                   2         2
22                     0                 2                   3         3
23                     2                 0                   3         3
24                     1                 1                   3         3
25                     5                 0                   1         2
26                     1                 6                   1         5
27                     1                 3                   1         4
28                     2                 1                   4         5
29                     0                 3                   5         3
30                     1                 1                   6         1
31                     0                 4                   3         2
32                     2                 0                   4         0
33                     2                 1                   5         1
34                     2                 4                   3         7
35                     1                 6                   0         7
36                     4                 2                   1         2
37                     0                 4                   2         2
38                     0                 6                   1         6
39                     0                 6                   1         2
40                     2                 2                   1         3
41                     2                 6                   0         3
42                     1                 4                   0         3
43                     2                 2                   2         2
44                     0                 5                   0         7
45                     1                 7                   1         5
46                     1                 2                   3         3
47                     2                 3                   2         2
48                     2                 0                   6         3
49                     1                 3                   1         3
50                     1                 3                   2         1
   away_losses ml_incorr_favored_a ml_incorr_unfavored_a ml_corr_favored_a
1            6                   0                     2                 0
2            5                   0                     3                 0
3            4                   2                     1                 3
4            2                   1                     6                 2
5            5                   1                     3                 0
6            5                   1                     1                 2
7            4                   1                     4                 0
8            6                   0                     2                 0
9            4                   0                     3                 2
10           7                   1                     1                 0
11           4                   2                     3                 1
12           1                   0                     1                 6
13           3                   1                     0                 7
14           5                   3                     1                 2
15           5                   2                     2                 3
16           6                   0                     2                 0
17           3                   0                     4                 2
18           3                   1                     1                 5
19           4                   2                     2                 3
20           2                   2                     0                 6
21           6                   4                     2                 0
22           5                   0                     2                 1
23           5                   0                     2                 1
24           6                   2                     2                 1
25           6                   1                     2                 0
26           3                   2                     1                 4
27           4                   1                     0                 4
28           4                   0                     4                 1
29           6                   0                     2                 1
30           7                   0                     1                 0
31           6                   0                     1                 1
32           8                   1                     0                 0
33           6                   0                     2                 0
34           3                   1                     5                 2
35           4                   2                     1                 6
36           6                   0                     1                 1
37           6                   4                     1                 1
38           4                   1                     3                 3
39           6                   0                     1                 1
40           6                   1                     2                 1
41           6                   1                     3                 0
42           5                   4                     0                 3
43           6                   2                     1                 1
44           2                   0                     2                 5
45           5                   3                     2                 3
46           5                   2                     1                 2
47           6                   1                     2                 0
48           5                   0                     3                 0
49           5                   2                     1                 2
50           8                   3                     1                 0
   ml_corr_unfavored_a total_wins total_losses total_home_games ml_corr_favored
1                    6          3           13                8               1
2                    5          5           10                7               0
3                    2          8            8                8               6
4                    1         11            7                8               5
5                    4          4           13                9               1
6                    4          7            9                8               6
7                    3          7            9                8               2
8                    6          4           12                8               1
9                    4          7           10                8               4
10                   6          7           10                9               4
11                   2         10            7                9               7
12                   1         14            3                9              11
13                   2         12            6                8              12
14                   2          8            9                9               6
15                   3         10            8                8               8
16                   6          6           10                8               2
17                   3         10            7                8               6
18                   2         15            4               10              12
19                   2         12            7               10              10
20                   0         14            4               10              14
21                   2          7            9                8               4
22                   5          5           11                8               3
23                   5          5           11                8               1
24                   4          5           12                8               2
25                   5          7           10                9               0
26                   1         12            5                9              10
27                   3          8            8                8               7
28                   4          8            9                8               2
29                   6          6           11                8               4
30                   7          3           14                9               1
31                   6          6           10                8               5
32                   7          2           14                8               0
33                   6          4           11                8               1
34                   2         13            8               11               6
35                   2         14            5                8              12
36                   6          7            8                7               3
37                   2          6           10                8               5
38                   3         12            6                8               9
39                   6          8            9                9               7
40                   5          7           10                8               3
41                   5         11            7                9               6
42                   1          8            8                8               7
43                   4          6           10                8               3
44                   2         12            6                9              10
45                   2         13            6                9              10
46                   3          6           10                8               4
47                   5          7            9                8               3
48                   5          5           11                8               0
49                   3          7           10                9               5
50                   5          5           12                8               3
   ml_corr_unfavored ml_corr_total ml_incorr_favored ml_incorr_unfavored
1                 11            12                 2                   2
2                  9             9                 1                   6
3                  3             9                 5                   2
4                  1             6                 6                   6
5                 11            12                 2                   3
6                  4            10                 5                   1
7                  6             8                 3                   5
8                  9            10                 3                   3
9                  8            12                 2                   3
10                 9            13                 1                   3
11                 2             9                 5                   3
12                 1            12                 2                   3
13                 2            14                 4                   0
14                 4            10                 5                   2
15                 4            12                 4                   2
16                 9            11                 1                   4
17                 6            12                 1                   4
18                 3            15                 1                   3
19                 2            12                 5                   2
20                 0            14                 4                   0
21                 4             8                 5                   3
22                 8            11                 3                   2
23                 8             9                 3                   4
24                 7             9                 5                   3
25                 6             6                 4                   7
26                 2            12                 3                   2
27                 4            11                 4                   1
28                 8            10                 1                   6
29                11            15                 0                   2
30                13            14                 1                   2
31                 9            14                 1                   1
32                11            11                 3                   2
33                11            12                 0                   4
34                 5            11                 3                   7
35                 2            14                 3                   2
36                 7            10                 1                   5
37                 4             9                 6                   1
38                 4            13                 2                   3
39                 7            14                 2                   1
40                 6             9                 4                   4
41                 5            11                 2                   5
42                 1             8                 7                   1
43                 6             9                 4                   3
44                 2            12                 4                   2
45                 3            13                 3                   3
46                 6            10                 4                   2
47                 7            10                 2                   4
48                11            11                 0                   5
49                 4             9                 6                   2
50                 7            10                 5                   2
   ml_incorr_total win_ratio ratio_corr_favored ratio_corr_unfavored
1                4 0.1875000          0.3333333            0.8461538
2                7 0.3333333          0.0000000            0.6000000
3                7 0.5000000          0.5454545            0.6000000
4               12 0.6111111          0.4545455            0.1428571
5                5 0.2352941          0.3333333            0.7857143
6                6 0.4375000          0.5454545            0.8000000
7                8 0.4375000          0.4000000            0.5454545
8                6 0.2500000          0.2500000            0.7500000
9                5 0.4117647          0.6666667            0.7272727
10               4 0.4117647          0.8000000            0.7500000
11               8 0.5882353          0.5833333            0.4000000
12               5 0.8235294          0.8461538            0.2500000
13               4 0.6666667          0.7500000            1.0000000
14               7 0.4705882          0.5454545            0.6666667
15               6 0.5555556          0.6666667            0.6666667
16               5 0.3750000          0.6666667            0.6923077
17               5 0.5882353          0.8571429            0.6000000
18               4 0.7894737          0.9230769            0.5000000
19               7 0.6315789          0.6666667            0.5000000
20               4 0.7777778          0.7777778                  NaN
21               8 0.4375000          0.4444444            0.5714286
22               5 0.3125000          0.5000000            0.8000000
23               7 0.3125000          0.2500000            0.6666667
24               8 0.2941176          0.2857143            0.7000000
25              11 0.4117647          0.0000000            0.4615385
26               5 0.7058824          0.7692308            0.5000000
27               5 0.5000000          0.6363636            0.8000000
28               7 0.4705882          0.6666667            0.5714286
29               2 0.3529412          1.0000000            0.8461538
30               3 0.1764706          0.5000000            0.8666667
31               2 0.3750000          0.8333333            0.9000000
32               5 0.1250000          0.0000000            0.8461538
33               4 0.2666667          1.0000000            0.7333333
34              10 0.6190476          0.6666667            0.4166667
35               5 0.7368421          0.8000000            0.5000000
36               6 0.4666667          0.7500000            0.5833333
37               7 0.3750000          0.4545455            0.8000000
38               5 0.6666667          0.8181818            0.5714286
39               3 0.4705882          0.7777778            0.8750000
40               8 0.4117647          0.4285714            0.6000000
41               7 0.6111111          0.7500000            0.5000000
42               8 0.5000000          0.5000000            0.5000000
43               7 0.3750000          0.4285714            0.6666667
44               6 0.6666667          0.7142857            0.5000000
45               6 0.6842105          0.7692308            0.5000000
46               6 0.3750000          0.5000000            0.7500000
47               6 0.4375000          0.6000000            0.6363636
48               5 0.3125000                NaN            0.6875000
49               8 0.4117647          0.4545455            0.6666667
50               7 0.2941176          0.3750000            0.7777778
   ratio_corr_total
1         0.7500000
2         0.5625000
3         0.5625000
4         0.3333333
5         0.7058824
6         0.6250000
7         0.5000000
8         0.6250000
9         0.7058824
10        0.7647059
11        0.5294118
12        0.7058824
13        0.7777778
14        0.5882353
15        0.6666667
16        0.6875000
17        0.7058824
18        0.7894737
19        0.6315789
20        0.7777778
21        0.5000000
22        0.6875000
23        0.5625000
24        0.5294118
25        0.3529412
26        0.7058824
27        0.6875000
28        0.5882353
29        0.8823529
30        0.8235294
31        0.8750000
32        0.6875000
33        0.7500000
34        0.5238095
35        0.7368421
36        0.6250000
37        0.5625000
38        0.7222222
39        0.8235294
40        0.5294118
41        0.6111111
42        0.5000000
43        0.5625000
44        0.6666667
45        0.6842105
46        0.6250000
47        0.6250000
48        0.6875000
49        0.5294118
50        0.5882353
Code
#plot correct and incorrect moneyline prediction ratio for each team by season
agg_tbl <- teams %>% group_by(team, schedule_season) %>% 
  summarise(corr_ratio = (sum(ml_corr_total)/(sum(ml_corr_total + ml_incorr_total))),
            incorr_ratio = (sum(ml_incorr_total)/(sum(ml_corr_total + ml_incorr_total))),
            sum = sum(ml_corr_total + ml_incorr_total),
            .groups = 'drop')

ggplot(agg_tbl, aes(factor(team), fill=factor(corr_ratio))) +     
  geom_bar()

Code
ggplot(agg_tbl, aes(x=schedule_season, y=corr_ratio)) +     
  geom_col() + 
  facet_wrap(~team)

Code
#plot correct moneyline prediction ratio for each team by season, facet  by team
agg_tbl2 <- teams %>% group_by(team, schedule_season) %>% 
  summarise(ratio = (sum(ml_corr_favored)/(sum(ml_corr_favored + ml_incorr_favored))),
            .groups = 'drop')

ggplot(agg_tbl2, aes(x=schedule_season, y=ratio)) +     
  geom_col() + 
  facet_wrap(~team)
Warning: Removed 4 rows containing missing values (`position_stack()`).

Code
#plot win ratio of each team by season 
agg_tbl3 <- teams %>% group_by(team, schedule_season) %>% 
  summarise(ratio = total_wins/(total_wins + total_losses),
            .groups = 'drop')

ggplot(agg_tbl3, aes(x=schedule_season, y=ratio)) +     
  geom_point(aes(colour = factor(team))) 

Code
regFavored <- lm(teams$ratio_corr_favored ~ teams$win_ratio)
print(summary(regFavored))

Call:
lm(formula = teams$ratio_corr_favored ~ teams$win_ratio)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.52819 -0.06854 -0.01043  0.07007  0.61637 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.19088    0.04002    4.77 4.24e-06 ***
teams$win_ratio  0.81917    0.07578   10.81  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1685 on 154 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.4314,    Adjusted R-squared:  0.4277 
F-statistic: 116.9 on 1 and 154 DF,  p-value: < 2.2e-16
Code
regUnfavored <- lm(teams$ratio_corr_unfavored ~ teams$win_ratio)
print(summary(regUnfavored))

Call:
lm(formula = teams$ratio_corr_unfavored ~ teams$win_ratio)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.44543 -0.07571 -0.01488  0.05314  0.59908 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.96472    0.03471   27.79   <2e-16 ***
teams$win_ratio -0.70474    0.06720  -10.49   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1496 on 154 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.4167,    Adjusted R-squared:  0.4129 
F-statistic:   110 on 1 and 154 DF,  p-value: < 2.2e-16
Code
regTotal <- lm(teams$ratio_corr_total ~ teams$win_ratio)
print(summary(regTotal))

Call:
lm(formula = teams$ratio_corr_total ~ teams$win_ratio)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.31798 -0.09001  0.01632  0.08683  0.23861 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.657053   0.026274  25.008   <2e-16 ***
teams$win_ratio -0.009385   0.050237  -0.187    0.852    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1145 on 158 degrees of freedom
Multiple R-squared:  0.0002208, Adjusted R-squared:  -0.006107 
F-statistic: 0.0349 on 1 and 158 DF,  p-value: 0.8521
Code
regFavoredTeam <- lm(teams$ratio_corr_favored ~ teams$win_ratio + teams$team)
print(summary(regFavoredTeam))

Call:
lm(formula = teams$ratio_corr_favored ~ teams$win_ratio + teams$team)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48798 -0.07169 -0.00507  0.08247  0.58378 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.04455    0.08219   0.542  0.58875    
teams$win_ratio  0.77329    0.09733   7.945 1.05e-12 ***
teams$teamATL    0.18652    0.10426   1.789  0.07607 .  
teams$teamBAL    0.15363    0.10699   1.436  0.15358    
teams$teamBUF    0.24468    0.10725   2.281  0.02424 *  
teams$teamCAR   -0.02201    0.10426  -0.211  0.83314    
teams$teamCHI    0.32875    0.10445   3.147  0.00207 ** 
teams$teamCIN    0.28718    0.10436   2.752  0.00683 ** 
teams$teamCLE    0.23153    0.10474   2.210  0.02892 *  
teams$teamDAL    0.14911    0.10594   1.407  0.16182    
teams$teamDEN    0.14430    0.11057   1.305  0.19432    
teams$teamDET    0.18153    0.11058   1.642  0.10324    
teams$teamGB     0.15153    0.10719   1.414  0.15998    
teams$teamHOU    0.09757    0.11074   0.881  0.38000    
teams$teamIND    0.16508    0.10491   1.573  0.11819    
teams$teamJAX   -0.03209    0.10456  -0.307  0.75941    
teams$teamKC     0.16428    0.11153   1.473  0.14331    
teams$teamLAC    0.15644    0.10511   1.488  0.13921    
teams$teamLAR    0.17190    0.10663   1.612  0.10949    
teams$teamLVR    0.15414    0.10434   1.477  0.14214    
teams$teamMIA    0.27068    0.11152   2.427  0.01667 *  
teams$teamMIN    0.23795    0.10573   2.250  0.02619 *  
teams$teamNE     0.19991    0.10619   1.883  0.06212 .  
teams$teamNO     0.16083    0.10745   1.497  0.13703    
teams$teamNYG    0.26148    0.10428   2.507  0.01346 *  
teams$teamNYJ    0.18972    0.10455   1.815  0.07201 .  
teams$teamPHI    0.18542    0.10536   1.760  0.08090 .  
teams$teamPIT    0.12113    0.10604   1.142  0.25553    
teams$teamSEA    0.15109    0.10607   1.424  0.15685    
teams$teamSF     0.09851    0.10570   0.932  0.35319    
teams$teamTB     0.20718    0.10544   1.965  0.05168 .  
teams$teamTEN    0.13301    0.10605   1.254  0.21213    
teams$teamWAS    0.28505    0.10426   2.734  0.00718 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1648 on 123 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.5654,    Adjusted R-squared:  0.4523 
F-statistic:     5 on 32 and 123 DF,  p-value: 4.13e-11
Code
regUnfavoredTeam <- lm(teams$ratio_corr_unfavored ~ teams$win_ratio + teams$team)
print(summary(regUnfavoredTeam))

Call:
lm(formula = teams$ratio_corr_unfavored ~ teams$win_ratio + teams$team)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.39959 -0.08608 -0.00109  0.05211  0.47492 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.88605    0.07398  11.976  < 2e-16 ***
teams$win_ratio -0.77952    0.08754  -8.905 5.81e-15 ***
teams$teamATL    0.13227    0.09387   1.409  0.16133    
teams$teamBAL    0.19463    0.09633   2.020  0.04551 *  
teams$teamBUF    0.15167    0.10144   1.495  0.13744    
teams$teamCAR    0.02957    0.09388   0.315  0.75331    
teams$teamCHI    0.17470    0.09405   1.858  0.06562 .  
teams$teamCIN    0.12409    0.09397   1.321  0.18910    
teams$teamCLE    0.17261    0.09431   1.830  0.06962 .  
teams$teamDAL    0.08958    0.09538   0.939  0.34952    
teams$teamDEN    0.10305    0.09386   1.098  0.27441    
teams$teamDET    0.07868    0.09397   0.837  0.40410    
teams$teamGB     0.06812    0.09650   0.706  0.48159    
teams$teamHOU    0.07793    0.09387   0.830  0.40802    
teams$teamIND    0.17593    0.09446   1.862  0.06493 .  
teams$teamJAX    0.03949    0.09414   0.419  0.67562    
teams$teamKC     0.15497    0.10636   1.457  0.14765    
teams$teamLAC    0.14119    0.09464   1.492  0.13827    
teams$teamLAR    0.18399    0.09600   1.916  0.05763 .  
teams$teamLVR    0.07405    0.09395   0.788  0.43207    
teams$teamMIA    0.11852    0.09433   1.256  0.21133    
teams$teamMIN    0.28559    0.09520   3.000  0.00327 ** 
teams$teamNE     0.08791    0.10075   0.873  0.38458    
teams$teamNO     0.01032    0.09674   0.107  0.91522    
teams$teamNYG    0.09764    0.09389   1.040  0.30043    
teams$teamNYJ    0.06975    0.09413   0.741  0.46009    
teams$teamPHI    0.26264    0.09486   2.769  0.00650 ** 
teams$teamPIT    0.02303    0.09547   0.241  0.80977    
teams$teamSEA    0.19750    0.09550   2.068  0.04072 *  
teams$teamSF     0.07878    0.09517   0.828  0.40936    
teams$teamTB     0.18519    0.10004   1.851  0.06656 .  
teams$teamTEN    0.01980    0.09548   0.207  0.83605    
teams$teamWAS    0.09817    0.09387   1.046  0.29772    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1484 on 123 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared:  0.5417,    Adjusted R-squared:  0.4225 
F-statistic: 4.543 on 32 and 123 DF,  p-value: 5.815e-10
Code
regTotalTeam <- lm(teams$ratio_corr_total ~ teams$win_ratio + teams$team)
print(summary(regTotalTeam))

Call:
lm(formula = teams$ratio_corr_total ~ teams$win_ratio + teams$team)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.258170 -0.066978 -0.001225  0.071344  0.231886 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.611871   0.055371  11.050  < 2e-16 ***
teams$win_ratio -0.077729   0.064578  -1.204  0.23096    
teams$teamATL    0.062538   0.070497   0.887  0.37670    
teams$teamBAL    0.089987   0.072278   1.245  0.21542    
teams$teamBUF    0.155729   0.072445   2.150  0.03348 *  
teams$teamCAR   -0.057909   0.070500  -0.821  0.41295    
teams$teamCHI    0.159921   0.070624   2.264  0.02525 *  
teams$teamCIN    0.135756   0.070566   1.924  0.05661 .  
teams$teamCLE    0.077827   0.070812   1.099  0.27382    
teams$teamDAL    0.037130   0.071593   0.519  0.60492    
teams$teamDEN    0.027621   0.070490   0.392  0.69583    
teams$teamDET    0.096159   0.070570   1.363  0.17542    
teams$teamGB     0.079802   0.072405   1.102  0.27248    
teams$teamHOU    0.102134   0.070494   1.449  0.14984    
teams$teamIND    0.073257   0.070925   1.033  0.30362    
teams$teamJAX    0.036101   0.070692   0.511  0.61046    
teams$teamKC     0.209051   0.075236   2.779  0.00629 ** 
teams$teamLAC    0.037932   0.071050   0.534  0.59436    
teams$teamLAR    0.128579   0.072042   1.785  0.07668 .  
teams$teamLVR    0.015484   0.070551   0.219  0.82663    
teams$teamMIA    0.086864   0.070829   1.226  0.22232    
teams$teamMIN    0.156510   0.071457   2.190  0.03033 *  
teams$teamNE     0.133633   0.071755   1.862  0.06486 .  
teams$teamNO     0.069993   0.072578   0.964  0.33669    
teams$teamNYG    0.113411   0.070512   1.608  0.11023    
teams$teamNYJ    0.093907   0.070685   1.329  0.18638    
teams$teamPHI    0.107276   0.071213   1.506  0.13445    
teams$teamPIT   -0.009167   0.071656  -0.128  0.89841    
teams$teamSEA    0.055007   0.071675   0.767  0.44423    
teams$teamSF     0.050447   0.071435   0.706  0.48136    
teams$teamTB     0.115514   0.071269   1.621  0.10754    
teams$teamTEN   -0.013591   0.071662  -0.190  0.84988    
teams$teamWAS    0.092685   0.070497   1.315  0.19096    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1115 on 127 degrees of freedom
Multiple R-squared:  0.2389,    Adjusted R-squared:  0.04718 
F-statistic: 1.246 on 32 and 127 DF,  p-value: 0.196
Code
regCorrFavplot <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_favored)) +     
            geom_point(aes(colour = team)) +   
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth")

regCorrFavplot2 <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_favored)) +     
            geom_point(aes(colour = team)) + 
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth") + facet_wrap(~schedule_season)

regCorrFavplot3 <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_favored)) +     
            geom_point(aes(colour = team)) + 
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth") + facet_wrap(~team)

regCorrUnplot <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_unfavored)) +     
            geom_point(aes(colour = team)) +   
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth")

regCorrUnplot2 <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_unfavored)) +     
            geom_point(aes(colour = team)) + 
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth") + facet_wrap(~schedule_season)

regTotalplot <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_total)) +     
            geom_point(aes(colour = team)) + 
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth") + facet_wrap(~schedule_season)

regTotalplot2 <- ggplot(teams, aes(x=win_ratio, y=ratio_corr_total)) +     
            geom_point(aes(colour = team)) + 
            stat_smooth(method = "lm",
                        formula = y ~ x,
                        geom = "smooth") + facet_wrap(~team)

regCorrFavplot
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 4 rows containing missing values (`geom_point()`).

Code
regCorrFavplot2
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Removed 4 rows containing missing values (`geom_point()`).

Code
regCorrFavplot3
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Removed 4 rows containing missing values (`geom_point()`).

Code
regCorrUnplot
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Removed 4 rows containing missing values (`geom_point()`).

Code
regCorrUnplot2
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Removed 4 rows containing missing values (`geom_point()`).

Code
regTotalplot

Code
regTotalplot2

After looking through the regression models and diagnostics, the best performing model I found is (teams\(ratio_corr_favored ~ teams\)win_ratio). It makes sense that this model is well-performing because if you are a team with a high win_ratio across multiple years, you will inherently be favored more often and correctly favored more often. Looking at the regression lines by team is helpful for determining which teams are performing the best in the league, but ultimately that regression is not helpful as a general regression tool as it ultimately just shows the team with the highest win_ratio.

Code
plot(regFavoredTeam)