ATP Tennis Analysis - Final Project

Final Project - Using ATP Tennis Data Files: analysis of player populations and success.

Jason O’Connell
2022-05-10
Show code
ul {
  line-height: 1;
}
li {
  line-height: 1;
}
...

Latest Overview

ATP Tennis Statistics Analysis - Final Project

This analysis looks at the ATP tennis statistics since the beginning of the open era in 1969. The data is sourced from GitHub 1 excludes any Davis Cup matches including only ATP professional tennis tournaments from each year through 2019. Years 2020 and 2021 have been excluded due to interruptions in professional tournament play due to the Covid-19 pandemic.

An analysis of the general number of players participating in tournaments and the overall number of tournaments in the dominance of the top players is presented. Some commentary about the court surface and the number of players who win very few matches on tour will be discussed.

Primary topics of inquiry:

Initialize Libraries

Show code

Create Working Data Set

First read ATP data files from GITHUB source (https://github.com/JeffSackmann/tennis_atp/).

The data is collected into a single dataframe by pending each Year’s ATP tournament data file. These files include both tournament match information and match information from other competitions such as the Davis Cup; Davis Cup matches will be excluded from the analysis.

Finally I am only interested in ATP tournament data so will filter our Davis Cup matches which are also included in the data set.

Show code
atp_files <- paste("https://raw.githubusercontent.com/JeffSackmann/tennis_atp/master/atp_matches_",1969:2019,".csv", sep = '')
atp_base<-map_dfr(atp_files, read_csv)

Data Enrichment

This r chunk will enrich the atp_base data with some variable for use later.

Show code
atp_base<-atp_base%>%  
  mutate(season=year(ymd(tourney_date)))%>%
#  mutate_at(c("winner_rank","loser_rank"), ~replace(.,is.na(.), 9999))%>%
  mutate(topten_players=case_when(
    winner_rank <= 10 & loser_rank <= 10 ~ as.integer("2"),
    winner_rank <= 10 | loser_rank <= 10 ~ as.integer("1"),
    TRUE ~ as.integer("0"),
  ))

Calculate Ranking Information

Following code collects ranking information by season and player and calculates each players average ranking by season. Players appear in the data set for each match they play in either the winner or loser column. To accurately calculate the average ranking the data must be transformed to include only one row for each player for each tournament and then average over the season.

Show code
avg_ranking<-atp_base%>%
#  filter(season == 1980)%>%
  select(season, tourney_id, winner_id, winner_rank, loser_id, loser_rank)%>%
  pivot_longer(cols = c("winner_id","loser_id"), names_to = "winner_loser_cat", values_to = "player_id")%>%
  pivot_longer(cols = ends_with("rank"), names_to = "rank_type", values_to = "rank")%>%
#the following filter is critical to eliminate duplicate rows for winner/loser since the 2 pivot longers create a 2 rows for each winner and loser. I wonly wont loser/loser and winner/winner eliminating loser/winner and winner/loser rows
  filter((winner_loser_cat == "loser_id" & rank_type == "loser_rank") | (winner_loser_cat == "winner_id" & rank_type == "winner_rank"))%>%
  select(season, tourney_id, player_id, rank)%>%
  distinct()%>%
  filter(!is.na(rank))%>%
  group_by(season, player_id)%>%
  summarize(avg_rank = mean(rank))

Base Statistics

Lets start with some simple statistics by year. 2

Tournament Matches

With the advent of the open era of professional tennis in 1969 conversion from the historically amateur status tournaments to professional tournaments was accomplished over the course of 1969 - 1971 demonstrated by the growth in the number of tournaments. Since a peak in 1975 the overall number of matches played on the ATP professional tennis tour has declined. In 1988 players took more control and took a more active role in governance of the ATP and participation increased for a few years. However in the early 90s there’s a significant increase in the prize money at each tournament requiring greater levels of commitment from sponsors in an overall decline in the number of matches going forward.

Also of Interest is the type of surface that is being played on for each tournament professional tennis match. In recent years hard courts have come to dominate the game of tennis. Early in the open era of tennis tournaments four distinct surfaces were played; clay, grass, hard courts and, when indoors, carpet. In more recent years the play on carpet has been eliminated as technology has allowed for temporary hard courts to be used in Indoor Stadium arenas. Grass Court tennis originally the surface of choice for 3 out of the 4 grand slam tournaments has gradually been reduced with the conversion of the Australian Open and the US Open to various versions of hard court. Grass courts have been used at Wimbledon since 1877, the US Open from 1881 to 1974, and the Australian Open from 1905 to 1987.3 The last several decades the number of tournaments in matches played on grass courts has remained relatively stable due to the commitment of the All England Club to host Wimbledon championships on grass and several tournaments Also played on grass for players to prepare for the major.

Show code
atp_base %>%
  group_by(season) %>%
  summarize(season, num_matches = n()) %>%
  distinct() %>%
#  kable(col.names = c("Season","Total Matches"), caption = "Number of Tournament Matches by Year")
  ggplot(aes(x=season,y=num_matches)) +
    geom_line(color="#0000FF") +
    labs(x="Season", y="ATP Matches Played", title="Total ATP Matched Played by Season\n") +
    theme_classic()+
    theme(axis.text.x = element_text(size = 10), axis.title.x = element_text(size = 12),
        axis.text.y = element_text(size = 10), axis.title.y = element_text(size = 12),
        plot.title = element_text(size = 14, face = "bold", color = "darkblue"))
Show code
atp_base %>%
  filter(str_detect('Clay|Hard|Carpet|Grass', surface)) %>%
  group_by(season, surface) %>%
  summarize(season, surface, num_matches = n()) %>%
  distinct() %>%
#  kable(col.names = c("Season","Total Matches"), caption = "Number of Tournament Matches by Year")
  ggplot(aes(x=season,y=num_matches)) +
    geom_line(aes(color=surface)) +
    facet_grid(cols = vars(surface)) +
    scale_color_manual(labels=c("Carpet","Clay","Grass","Hard"), values=c("orange", "#FF0000", "darkgreen", "#0000FF")) +
    guides(x = guide_axis(n.dodge = 2))+
    labs(x="Season", y="ATP Matches Played", title="Total ATP Matched Played by Season by Surface\n", color = "Surface\n") +
    theme_minimal()

Tournaments

As can be seen in the figure below the number of tournaments being played follows the same pattern as the total number of matches played by season given that each tournament is played with a fairly consistent number of entries.

Show code
atp_base %>%
  select(season,tourney_id) %>%
  distinct() %>%
  group_by(season) %>%
  summarize(season, num_tournaments = n()) %>%
  distinct() %>%
#  kable(col.names = c("Season","Tournaments"), caption = "Number of Tournaments by Year")
  ggplot(aes(x=season,y=num_tournaments)) +
    geom_line(color="blue")+
    labs(x="Season", y="ATP Tournaments Held", title="Total ATP Tournaments Held by Season\n") +
    theme_classic()+
    theme(axis.text.x = element_text(size = 10), axis.title.x = element_text(size = 12),
        axis.text.y = element_text(size = 10), axis.title.y = element_text(size = 12),
        plot.title = element_text(size = 14, face = "bold", color = "darkblue"))

Players

More Surprisingly however is that overall participation in the number of players participating in ATP tournaments overall has sharply decreased consistently through the decades. Historically there was a smaller contingent of players who would travel to all the tournaments in the participation from local qualifiers contributed to the overall participation rates being much higher. Since 1990 ended the escalation of prize money in sponsorships at each tournament a greater number of players have been able to support travel expenses at two tournaments throughout the globe. As a consequence each tournament is getting higher ranked players throughout the draw of the tournament. to maintain prize money in sponsorships and TV viewership getting the top ranked players to participate in Tournaments has become essential. and therefore the number of spots open to qualifiers who made live more local to the tournament geography has diminished.

Show code
bind_rows(
  atp_base %>%
    select(season,winner_id) %>%
    distinct(),
  atp_base %>%
    select(season,loser_id) %>%
    distinct()
) %>%
  group_by(season) %>%
  summarize(season, num_players = n()) %>%
  distinct() %>%
#  kable(col.names = c("Season","Players"), caption = "Number of Players by Year")
  ggplot(aes(x=season,y=num_players)) +
    geom_line(color="blue")+
    labs(x="Season", y="ATP Players Involved", title="Total ATP Professional Participation by Season\n") +
    theme_classic()+
    theme(axis.text.x = element_text(size = 10), axis.title.x = element_text(size = 12),
        axis.text.y = element_text(size = 10), axis.title.y = element_text(size = 12),
        plot.title = element_text(size = 14, face = "bold", color = "darkblue"))

Top Winners

If we look at the top players in terms of wins in a single season we can see that a significant number of players from the early 1970s and 1980s. This is due to the fact that the top players were playing in many more tournaments each week then today’s top ranked players. The only player since 1982 to be included in the top 20 wins in a single 80P season is Roger Federer in 2006 But we will see later Ted Rogers 2006 season what’s the second best season for a male professional tennis player in history.

Show code
atp_base %>%
    select(season,winner_id, winner_name) %>%
    group_by(season, winner_id, winner_name) %>%
    summarize(season, winner_id, winner_name, num_wins = n()) %>%
    distinct() %>%
    arrange(desc(num_wins)) %>%
    head(20)%>%
    kable(col.names = c("Season","Player ID","Player Name","Wins"), caption = "Top Winners")%>%
    remove_column(c(2))
Table 1: Top Winners
Season Player Name Wins
1977 Guillermo Vilas 131
1973 Ilie Nastase 120
1972 Ilie Nastase 119
1982 Ivan Lendl 109
1980 Ivan Lendl 108
1977 Brian Gottfried 107
1975 Arthur Ashe 102
1974 Bjorn Borg 99
1979 John McEnroe 99
1976 Jimmy Connors 98
1976 Raul Ramirez 97
1981 Ivan Lendl 95
1974 Jimmy Connors 94
1975 Manuel Orantes 94
1974 John Newcombe 93
1973 Tom Okker 92
1973 Jimmy Connors 92
1974 Guillermo Vilas 92
2006 Roger Federer 92
1975 Ilie Nastase 91

Create Player Statistics Table

A player statistics table is created to summarize wins and loses and other statitics for each player each season. Step 1: joins the atp data with itself to include first winner data and then loser data. Step 2: correct NA win totals, calcs total matches, and win_percent each year Step 3: include new columns to categorize the player data.

Show code
player_table<-full_join(
  atp_base %>%    
    select(season,winner_id, winner_name) %>%
    group_by(season, winner_id, winner_name) %>%
    summarize(season, winner_id, winner_name, num_wins = n()) %>%
    distinct() %>%
    rename(player_id = winner_id, player_name = winner_name),
  atp_base %>%
    select(season,loser_id, loser_name) %>%
    group_by(season, loser_id, loser_name) %>%
    summarize(season, loser_id, loser_name, num_loses = n()) %>%
    distinct() %>%
    rename(player_id = loser_id, player_name = loser_name),
  by=c("season"="season","player_id"="player_id","player_name"="player_name")
) %>%
  mutate(num_wins=if_else(is.na(num_wins), as.integer(0), num_wins)) %>%
  mutate(num_matches=num_wins+num_loses, win_percent=100*num_wins/(num_wins+num_loses))%>%
  mutate(decade=10*trunc(season/10))%>%
  merge(avg_ranking, by = c("season","player_id"), all.x = TRUE)%>%
  group_by(season)%>%
  mutate(rank_bin=ntile(avg_rank,10))%>%
  ungroup()%>%
  merge(tibble(rank_bin=c(1:10),rank_binA=c("Top 10th %ile","11-20th %ile","21-30th %ile","31-40th %ile","41-50th %ile","51-60th %ile","61-70th %ile","71-80th %ile","81-90th %ile","Bottom 10th %ile"),rank_bin_sort=c(1:10)),by = c("rank_bin"), all.x=TRUE)%>%
  replace_na(list(rank_binA = 'Unranked', rank_bin_sort = 11))

I found it useful later to look at match wins for rounds other then the first round so here that measure is added to the player table. It will be shown later that players in the lower ranks win very few match all year. When wins in only the first round of the tournment are excluded they win on average very few. Given that prize money for first or second round loses barely cover the expense to travel, compete, pay for coaching, trainers etc. These lower ranked players will struggle to make ends meet.

Show code
# exFirstRnd<-atp_base%>%
#   filter(draw_size%in%c(128,64,32))%>%
#   filter(draw_size != as.integer(str_replace(round,"R","")))


player_table<-merge(player_table, 
      atp_base%>%
        filter(draw_size%in%c(128,64,32))%>%
        filter(draw_size != as.integer(str_replace(round,"R","")))%>%
        select(season, winner_id)%>%
        group_by(season, winner_id)%>%
        summarise(season, winner_id, num_wins_2rplus=n())%>%
        distinct() %>%
        rename(player_id = winner_id),
      by = c("season","player_id"),
      all.x=TRUE
)%>%
  replace_na(list(num_wins_2rplus = 0))

Player total wins histogram

This chart has an early view about how many ATP players are participating in tournaments but winning very few matches.

Show code
player_table%>%
  select(season, player_id, num_wins)%>%
  filter(season %in% c(1980,1990,2000,2010))%>%
#  group_by(season, num_wins)%>%
#  summarize(season, num_wins, count = n())%>%
#  distinct()%>%
  ggplot(aes(x=num_wins))+
  geom_histogram(binwidth = 15, fill = "SkyBlue")+
#  scale_fill_brewer(palette = "Blues")+
  facet_grid(rows = vars(season))+
  labs(title="Number of ATP players and win total (Binwidth=15)", x="Wins", y="Number of Players")+
  theme_bw()

Players total wins point chart

This chart demonstrates that same point but visually it is a little more diffcult to see the trend given how many player literally win 0-1 match all year.

Show code
top_winner<-player_table%>%
  select(season, player_name, num_wins)%>%
  filter(season %in% c(1980,1990,2000,2010))%>%
  group_by(season)%>%
  slice(which.max(num_wins))

player_table%>%
  select(season, player_id, num_wins)%>%
  filter(season %in% c(1980,1990,2000,2010))%>%
  group_by(season, num_wins)%>%
  summarize(season, num_wins, count = n())%>%
  distinct()%>%
  ggplot(aes(x=num_wins, y=count))+
  geom_point(color = "Blue")+
  geom_smooth(fill = "SkyBlue") +
  geom_point(data = top_winner, y=1, x=top_winner$num_wins, color="red") +
  geom_text(data = top_winner, aes(y=1, x=top_winner$num_wins, label=top_winner$player_name), hjust=1, vjust=2, size=3)+
  facet_grid(cols = vars(season))+
  labs(title="Number of ATP players and win totals", subtitle = "(Highest win total labeled)", x="Wins", y="Number of Players")+
  theme_bw()

Player Statistics

John McEnroe’s 1984 season is considered by most as the most dominate Men’s tennis season ever (ranked #3 all-time behind Steffi Graff’s 1988 and Martina Navratilova’s 1983 seasons) according to the Bleacher Report 4

Show code
player_table%>%
  select(season, player_id, player_name, win_percent)%>%
  arrange(desc(win_percent))%>%
#  top_n(20, win_percent) %>%
  head(20)%>%
  kable(digits = 0, col.names = c("Season","Player ID","Player Name","Wins Percent"), caption = "Top Winners")%>%
  remove_column(c(2))
Table 2: Top Winners
Season Player Name Wins Percent
1984 John McEnroe 96
2005 Roger Federer 95
2006 Roger Federer 95
2015 Novak Djokovic 93
1974 Jimmy Connors 93
1986 Ivan Lendl 93
2004 Roger Federer 93
1977 Bjorn Borg 92
1982 Ivan Lendl 92
1978 Jimmy Connors 92
1980 Bjorn Borg 92
2013 Rafael Nadal 92
1979 Bjorn Borg 92
1989 Ivan Lendl 92
2018 Rafael Nadal 92
1976 Jimmy Connors 92
1987 Ivan Lendl 91
2017 Roger Federer 91
2011 Novak Djokovic 91
1985 Ivan Lendl 91

Network

Let’s take a closer look at the 1984 ATP season and see the matches being played amongst the most successful players that year.

Show code
# create data:
vSeason <- as.integer("1984")
player_nodes <- player_table%>% 
  filter(season==vSeason, num_matches>20)%>%
  arrange(desc(win_percent))%>%
  head(10)%>%
  select(player_id, player_name, win_percent)%>%
  rename(id=player_id, label=player_name)
  
player_links <- atp_base %>%
  filter(season==vSeason, loser_id%in%player_nodes$id, winner_id%in%player_nodes$id)%>%
  select(loser_id, winner_id)%>%
  group_by(loser_id, winner_id)%>%
  summarize(loser_id, winner_id, occurances=n())%>%
  distinct()%>%
  rename(from=loser_id,to=winner_id)%>%
  mutate(
    color=case_when(from==as.integer(100581) ~ "red", to==as.integer(100581) ~ "green", TRUE ~ "lightgray"),
    width=case_when(from==as.integer(100581) ~ occurances, to==as.integer(100581) ~ occurances, TRUE ~ as.integer(1)),
    label=case_when(from==as.integer(100581) ~ toString(occurances), to==as.integer(100581) ~ toString(occurances), TRUE ~ ""),
    font.size=11,
    font.color="black",
    smooth=TRUE,
#    arrows="middle",
    shadow=FALSE
    )

player_nodes<-player_nodes%>%
  merge(player_links%>%
          group_by(to)%>%
          summarize(to, total_wins=sum(occurances))%>%
          select(to, total_wins)%>%
          distinct()%>%
          rename(id = to), by=c("id"), all.x=TRUE)%>%
          replace_na(list(total_wins = 0))%>%
          mutate(
            shape="dot",
            shadow=TRUE, 
            title = paste(total_wins," top 10 wins"), 
            size=(total_wins+1)*1.5, 
            font.size=12,
            font.color="black",
            border_width = 2
            )

visNetwork(player_nodes, player_links, height = "500px", width = "100%",
           main="Matches between Top 10 Players (1984)",
           footer="Highlighting John McEnroe's wins (green) & losses (red)")%>%
  visLayout(randomSeed = 133,improvedLayout = TRUE)

.

Comparison of players grouped into 10 categories of ranking

The figure below begins to show the situation of match wins by ranking category for a sample of years, 1980, 1990, 2000, 2010. We can see here that average wins for an entire year get close to 0-2 for players outside the top third of the rankings. We can see a few outliers in the lower rankings and these are likely younger players on rapid raise to the uppper ranking.

Show code
player_table%>%
  filter(as.integer(decade/10) %in% c(198:201))%>%
  ggplot(aes(x=factor(reorder(rank_binA,rank_bin_sort)), y=num_wins))+
  geom_boxplot()+
  coord_flip()+
    facet_wrap(facets = vars(decade)) +
    labs(title = "Wins by relative ranking category by decade\n", x="", y="Toal Wins") +
    theme_bw()

The competitive landscape for less ranked players is gradually increaseing over time. As seen in the following figure, which now included all matched each decade (not just a sampling of years), the top third highest ranked players accounts for roughly 75% of the wins over the year.

Show code
player_table%>%
  filter(as.integer(decade/10) %in% c(198:201))%>%
  ggplot(aes(x=decade, y=num_wins, fill=factor(reorder(rank_binA,rank_bin_sort))))+
    geom_bar(position="fill", stat = "identity")+
    
    #scale_color_viridis(discrete = TRUE, option = "turbo")+
    scale_fill_viridis(discrete = TRUE, option = "turbo") +
    labs(title= "Matches Won by relative ranking each decade\n", x="Decade", y="Percentage of Matches Won", fill="Ranking Category")+
    theme_bw()

Pack Circles

Let’s take a look at this data in a slight different visual across the entire 4+ decade period of ATP match data. We can clearly see that the top third ranked professional tennis players are accounting for the lions share of maych wins.

Show code
data<-player_table%>%
  filter(season>1973)%>%
  group_by(rank_binA)%>%
  summarize(rank_binA,num_wins1 = sum(num_wins))%>%
  select(rank_binA, num_wins1)%>%
    distinct()%>%
#  filter(season == "1980")%>%
  arrange(desc(num_wins1))
#  head(10)

# player_table%>%
#   select(season, player_name, num_wins)%>%
#   filter(season == "1980")%>%
#   groud_by(season,num_wind)
#   arrange(desc(num_wins))%>%
#   head(10)


packing<-circleProgressiveLayout(data$num_wins1, sizetype='area')

data<-cbind(data,packing)

dat.gg<-circleLayoutVertices(packing, npoints=50)

ggplot() + 
  
  # Make the bubbles
  geom_polygon(data = dat.gg, aes(x, y, group = id, fill=as.factor(id)), colour = "black", alpha = 0.6) +
  
  # Add text in the center of each bubble + control its size
  geom_text(data = data, aes(x, y, size=num_wins1, label = paste(rank_binA,"\n",num_wins1, sep = ""))) +
  scale_size_continuous(range = c(1,4)) +
  scale_fill_viridis(discrete = TRUE, option = "turbo") +
  
  # General theme:
  theme_void() + 
  theme(legend.position="none") +
  coord_equal()+
  
  labs(title="Ranking Category by Win Totals\n(1973-2019)\n")

Treemap

The following treemap diagram provide another view of the entire match data set of match wins by ranking category. Interestingly even for the top 10% of ranked players the average number of wins for an entire season is less then 40. Also shown is the fact that lower ranked players are winning perhaps 1 or 2 matches all year. Clearly the prize money in these lower categories will not support an entire season of travel and coaching.

Show code
player_table%>%
  select(season, rank_binA, rank_bin_sort, num_wins)%>%
  filter(season > 1973)%>%
  group_by(rank_binA, rank_bin_sort)%>%
  summarize(rank_binA, rank_bin_sort, totalwins=sum(num_wins), count = n())%>%
  distinct()%>%
  ggplot(aes(area = totalwins/count, label = round(totalwins/count,0), fill=reorder(rank_binA,rank_bin_sort)))+
    geom_treemap()+
#  scale_fill_discrete(high="red", mid="white",low="blue",midpoint = 100) +
   geom_treemap_text(colour = "white", place = "center", size = 12)+
    #scale_color_viridis(discrete = TRUE, option = "turbo")+
    scale_fill_viridis(discrete = TRUE, option = "turbo") +
    labs(title="Average wins per player per season (all time 1973-2019)\n by Relative Ranking\n",
         x="", y="",
         fill = "Rank Percentile") +
    theme_bw()
Show code
#  labs(title="ATP 1980, Label = Number of wins, Size = Number of Players")

Finally, to complete the analysis, first round match wins will be excluded. It is clear that even for player in the top 10% of rankings on average they are winning 10 matches outside the first round of each tournament.

Show code
player_table%>%
  select(season, rank_binA, rank_bin_sort, num_wins_2rplus)%>%
  filter(season > 1973)%>%
  group_by(rank_binA, rank_bin_sort)%>%
  summarize(rank_binA, rank_bin_sort, totalwins=sum(num_wins_2rplus), count = n())%>%
  distinct()%>%
  ggplot(aes(area = totalwins/count, label = round(totalwins/count,0), fill=reorder(rank_binA,rank_bin_sort)))+
    geom_treemap()+
#  scale_fill_discrete(high="red", mid="white",low="blue",midpoint = 100) +
   geom_treemap_text(colour = "white", place = "center", size = 12)+
    #scale_color_viridis(discrete = TRUE, option = "turbo")+
    scale_fill_viridis(discrete = TRUE, option = "turbo") +
    labs(title="Average wins per player per season (all time 1973-2019)\nby Relative Ranking\nExcluding 1st Round",
         x="", y="",
         fill = "Rank Percentile") +
    theme_bw()
Show code
#  labs(title="ATP 1980, Label = Number of wins, Size = Number of Players")

Conclusion

After careful analysis of ATP match data since the start of the open era of tennis. Several conclusiongs are support by the data. 1. Overall matches, tournaments, and player participation is declining over time. 2. Playing surfaces have generally been trending toward more hard court matches but mainly inversely proportional to the decline of indoor carpet surfaces. 3. John McEnroe’s 1984 season was the most dominate season across all ATP competitions. 4. Finally: The data supports my hypothesis that there are many, many players at the fringe of ATP who are not winning enough matches to support their careers. Even players in the top tenth of the rankings are winning on average too few matches (and associate prize money) to support the travel and expenses to compete. The struggle of the less ranked players makes the ATP tour possible and those elite few players who are winning 50+ matches a year and earning the big prize money depend on this lesser competition’s participation.

For further analysis

  1. Closer look at the top 10 or 20 players as compared to the rest of “the field”.
  2. Closer look at ranking growth and decline as player develop and age.

Appendix 1 - About the data

The data includes detailed information about each match played on the ATP tour in a given year.

#Footer

Distill is a publication format for scientific and technical writing, native to the web.

Learn more about using Distill at https://rstudio.github.io/distill.


  1. Data files source: https://github.com/JeffSackmann/tennis_atp↩︎

  2. Only ATP tournament matches are include. Excludes Davis Cup, Challenge, etc.↩︎

  3. https://en.wikipedia.org/wiki/Tennis_court↩︎

  4. https://bleacherreport.com/articles/1708407-ranking-the-10-most-dominant-seasons-in-tennis-history↩︎

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

O'Connell (2022, May 11). Data Analytics and Computational Social Science: ATP Tennis Analysis - Final Project. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsnorthonumgithubiodacss601-blogposts2022-05-02-finaprojectv3/

BibTeX citation

@misc{o'connell2022atp,
  author = {O'Connell, Jason},
  title = {Data Analytics and Computational Social Science: ATP Tennis Analysis - Final Project},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsnorthonumgithubiodacss601-blogposts2022-05-02-finaprojectv3/},
  year = {2022}
}