Final Project Check In 1 - Darron Bunt
Author

Darron Bunt

Published

May 7, 2023

Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(ggplot2)

Research Question

In April 2013, Doug J. Chung published a research paper1 endeavoring to quantify and model the impact of the so-called “Flutie Effect” - the spillover impact that athletics has on the quantity and quality of applicants to US colleges (named after Boston College quarterback Doug Flutie, who in 1984 threw a Hail Mary touchdown pass to secure victory with six seconds in a game against the University of Miami, qualifying the team to compete in the Cotton Bowl2). The legacy of Flutie’s on-field success has been credited with catalyzing a 30% increase in undergraduate applications at Boston College, though institutional officials have argued that other, non-athletic factors were the “true” reason behind the increase. This trend of increased applications after prominent athletic successes, however, has been observed at other institutions including Georgetown, Northwestern, Boise State, and Texas Christian University.

Chung was able to find a statistically significant relationship between athletic success and both the quantity (number of applications) and quality (SAT scores of those applicants) of applicants to a given institution; his findings included that when a school rises from being classified as a mediocre football program to great one, applications rise by 18.7%3.

While in my job I do not focus directly on applications to US colleges, I do work with on campus marketing and communications leaders, and generate insights from data derived from online conversation about their schools so that they can better understand that conversation in ways that can help them to develop, refine, and align their communications strategies with the goals of the institutions that they serve4. A trend I frequently observe while analyzing this online conversation data is the impact that athletics has on the volume and reach of mentions related to schools.

In benchmarking work that we’ve undertaken in order to better understand online conversation trends in higher education, we’ve found that, on average, 63% of all online conversation related to schools is about their athletics - and for some schools, this proportion can be as high as 91%5. And while the proportion of overall online conversation relating to different colleges is already quite high, there are certain significant events within the realm of college sports that send mention volume even higher. One such event is March Madness.

Run by the NCAA every year during the month of March, the men’s version of the March Madness tournament is widely regarded as one of biggest American sporting events. 68 teams participate in the three-week long single-elimination tournament, which produced $1.14 billion in annual revenue for the NCAA in 20226. One of the most exciting parts of the tournament are the “Cinderella stories”; teams with a low ranking that manage to eliminate higher ranked teams.

In 2022, I became interested in developing a greater understanding of the impact that March Madness could have on a school’s online conversation volume and began tracking that year’s Cinderella - the Saint Peter’s College Peacocks, a #15 seed that made it all the way to the Elite 8. In their Sweet 16 game against Purdue, Saint Peter’s had more mentions per minute than their average monthly volume for the prior five months. Their total mention volume during the month of March was 12,380 times more than that same monthly average7.

After completing that work, I became increasingly curious about how participation in the tournament affected conversation volume and reach for all teams, not just Cinderella stories. I wanted to investigate the factors that might influence more/less conversation volume for any given team, and indeed, whether conversation about the team was the only facet of conversation about a college that increased - or if the school as a whole received more conversation overall. I became very interested in learning more about the overall trends that relate to participation in events like March Madness and how we could use that information to provide strategic insights to colleges across the US.

Bringing it back to the Flutie Effect - if on-field athletic success can impact enrollment at US colleges, it stands to reason that online conversation about athletic success (or athletic events in general) could have a similar impact. Accordingly, it would be prudent to further explore the impact an event like the men’s March Madness tournament has on online conversation volume for US colleges.

Specifically, the research question that I want to explore is:

What impact does participation in the men’s March Madness tournament have on online conversation about the schools involved?

And more specifically, some nuances to this question that I want to consider are:

  • How does mention volume differ when schools are participating in the tournament?

    • Is there a difference in reach for schools participating in the tournament?
    • Is there a correlation between increase in mention volume and increase in reach?
  • What is the baseline of conversation volume/reach increase that can be expected when a team participates in the tournament?

  • What variables contribute to increased/decreased mention volume/reach during the tournament?

    • Size of school
    • Major / mid-major
    • Round of the tournament
    • Favorite / underdog
    • Upset
    • Time of day
  • If there is a change in mention volume/reach during the tournament, how long does this change last?

    • Is this length of time similar or different for volume and reach?
    • Do the same variables that contribute to increased/decreased mention volume also impact the length of that increase?
  • Is the increase in conversation volume/reach only related to athletics?

Answers to these questions will provide a better understanding of the overall impact that events such as the men’s March Madness tournament have on conversation volume/reach and can subsequently serve to inform strategies and tactics that can be implemented by marketing and communications professionals who focus on digital media in higher education. That is, schools can use this information to develop further hypotheses to be tested that would facilitate a goal of leveraging and maximizing the impact of athletic participation and success on their online conversation.

Hypothesis

My overarching hypothesis is that participation in the men’s March Madness tournament increases online conversation about the schools that are involved. I further hypothesize that:

  • There is a correlation between increases in mention volume and overall/average reach; that is, it’s not just the number of mentions about each school that increases, but also the number of people who see those mentions.
  • The increase in conversation (volume and reach) for each school will be influenced by six factors: school size, classification as a major or mid-major school, which round of the tournament the game happens during, whether the team was the favorite (ranked higher) or the underdog (ranked lower), whether the game was considered a major upset, and the time of day the game took place.
  • The change in mention volume/reach will return to baseline within a matter of days after a team’s exit from the tournament.
  • The variables I hypothesize contribute to variations in how volume/reach increase will not also impact the length of that increase.
  • The increase in conversation volume/reach will not only be observed in athletics mention volume.

Descriptive Statistics

A description and summary of your data. How was your data collected by its original collectors? What are the important variables of interest for your research question? Use functions like glimpse() and summary() to present your data.

The data used for this project was collected by me using the Brandwatch Consumer Research platform. I first wrote Boolean to search for mentions about all 68 schools with teams in the men’s March Madness tournament, using the same parameters to construct each one (specifically, the school’s full name, shortened name(s), acronym(s), mascot(s), website URL, athletics website URL, and Twitter usernames for the school’s flagship, flagship athletics, and men’s basketball team accounts). The Boolean was then used to run a query within Brandwatch to pull all relevant, retrievable online mentions for the 68 schools made between January 1 and April 30 2023. This search returned 8,099,045 mentions.

One of the features included in Brandwatch is the ability to segment data into categories, and this can be done in a wide range of manners (using Boolean or a variety of pre-built parameters such as content sources and mention types). I used this feature to create a variety of categories that would be necessary for me to be able to parse the full dataset into manageable chunks that could be used to answer my research question.

Variables of Interest

DEPENDENT VARIABLE: Mention volume (Numeric, Continuous variable)

Reach (an estimate of the number of people who saw any given Twitter post, as provided by Brandwatch using a proprietary algorithm) will also be considered as a dependent variable.

INDEPENDENT VARIABLES

  • Date - Categorical variable - between January 1 and April 30 2023.

  • School - Categorical variable - which of the 68 schools in the men’s March Madness tournament each mention is attributed to.

  • Size of School8 - Categorical variable - Very Small (1-1,000 students), Small (1,001-2,999 students), Medium (3,000-9,999 students), Large (10,000-19,999 students), Very Large (20,000+ students)

  • Major or Mid-Major9 - Binary variable - Schools that play in the ACC, AAC, Big East, Big 10, Big 12, Pac-12, or SEC are considered Major; schools that do not (but are participating in March Madness) are considered Mid-Major.
  • Round of the Tournament - Categorical variable - First Four, Round of 64, Round of 32, Sweet 16, Elite 8, Final 4, National Championship

  • Favorite or Underdog - Binary variable - the higher ranked team is considered the favorite; the lower ranked team is considered the underdog

  • Is Upset - Binary variable - Yes/No for upset. A game is considered an upset when the winner of the game was seeded 5+ places lower than the team they defeated.

  • Time of Day - Categorical variable - five options based on advertised tip-off time. Early afternoon (12-2), Mid Afternoon (2-4), Late Afternoon (4-6), Early Evening (6-8), Late Evening (8-10)

  • Is Athletics Mention - Binary variable - Yes/No. Athletics conversation has been classified using Boolean and categorization that I created for other projects.

  • Significant Non-Athletic Event - Binary variable - Yes/No. Was there a significant non-athletic event (crisis, etc.) that caused an unrelated increase in volume?