Final Project Kpop Collaboration Network

Author

Erika Nagai

Published

May 20, 2023

Overview

The phenomenon of K-pop has recently emerged as a prominent cultural force, gaining significant attention and popularity not only in South Korea and other Asian nations but globally as well, particularly among the younger generation. An interesting aspect of K-pop is that many K-pop artists collaborate with other artists, often those who are not based in South Korea. In this analysis, my objective is to delve into the dynamics of such collaborations within the K-pop industry and beyond, across various musical genres, and to investigate how these collaborative patterns have evolved over time. To achieve this goal, I aim to examine the social network of K-pop collaborations, with a view to gaining a better understanding of the underlying trends and structures shaping this phenomenon.

Literature review and hypothesis

During the 1990s, Seo Taiji gained significant popularity in the K-pop industry by pioneering a fusion of hip-hop and euro-pop-inspired songs (Kyung, 2021). Subsequently, from the late 1990s to the mid-2000s, K-pop experienced a surge in popularity in other Asian regions, particularly China and Japan (Kyung, 2021; Shin, 2009). The notable global success of Psy’s “Gangnam Style” in 2012 served as a pivotal moment, propelling K-pop into the U.S. market and instilling hope within the industry for accessing a broader international audience (Kyung, 2021). Based on these historical developments, I hypothesized: H1. K-pop artists started to collaborate on songs with international artists, particularly those from Western countries, following the impact of “Gangnam Style” in 2012.

Methods of studies

To analyze the K-pop music landscape, the following steps were taken. First, the names of K-pop artists were gathered by prompting ChatGPT for famous K-pop artists over the last 20 years (specifically, as of 2005, 2010, 2012, 2015, 2018, and 2020) and by extracting all the artists’ names from the official Spotify playlists “Top-KPop Artists of 2022,” “Millenium K-Pop,” and “Best of 2008: K-Pop”. Then, all the songs by these artists were gathered using the Spotify API, and only collaboration songs, that is, songs credited to more than one artist, were kept for this analysis. This resulted in 1123 songs by 962 artists/groups. (Refer to “Preparation_SpotifyID.ipynb” and “K-Pop Social Network ANalysis.ipynb” for the process of data collection.)

Read in & Describe data

collab_songs.csv:

This is a list of collaboration songs collected by

gathering the names of the top 50 K-pop artists in 2010, 2015, and 2020 by asking ChatGPT,
manually collecting their Spotify artist IDs,
collecting all the singles/albums data of the artists using the Spotify API, and
removing the songs that are NOT collaborative by filtering out those that have only one artist registered
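
The last filtering step can be sketched as follows. This is only a minimal illustration on a made-up table, assuming each track row carries a list of credited artist names; the object `tracks`, its columns `track` and `artists`, and the values are hypothetical and do not reflect the actual Spotify API response or the notebooks used for data collection.

```r
library(dplyr)

# Hypothetical toy data: one row per track, with a list-column of credited artists
tracks <- tibble(
  track   = c("Song A", "Song B", "Song C"),
  artists = list(
    c("Artist 1"),
    c("Artist 1", "Artist 2"),
    c("Artist 3", "Artist 4", "Artist 5")
  )
)

# Keep only tracks credited to more than one artist
collabs <- tracks %>%
  mutate(n_artists = lengths(artists)) %>%
  filter(n_artists > 1)

collabs
```

In this toy example only "Song B" and "Song C" remain, because "Song A" has a single credited artist.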

artist.csv:

This is a list of the artists who performed the collaboration songs, collected by

extracting the unique artists from the collab_songs.csv
collecting their information (genre and followers) using Spotify API

song_detail.csv:

This is a list of songs with detailed information, prepared by

extracting the detailed song information (release year, available markets, etc.) using the Spotify API, based on the track id from kpop_collab.csv

Code

# load libraries
library(igraph)

Warning: package 'igraph' was built under R version 4.2.2


Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

Code

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:igraph':

    as_data_frame, groups, union

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Code

library(readr)

Warning: package 'readr' was built under R version 4.2.2

Code

library(ggplot2)
library(tidyr)


Attaching package: 'tidyr'

The following object is masked from 'package:igraph':

    crossing

Code

# Read in data
getwd()

[1] "C:/Users/Microsoft/Documents/DACSS/753_Social_Network/Social_Networks_Spring_2023/posts"

Code

collab_songs <- read_csv("_data/Kpop_analysis_ErikaNagai/final_collab_songs.csv")

Rows: 1123 Columns: 30
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (30): song_name, kpop_artist_name, song_id, artist_id, artist_1, artist_...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

artists <- read_csv("_data/Kpop_analysis_ErikaNagai/final_artists_df.csv")

Rows: 962 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): name, id, genre
dbl (2): top_kpop, followers

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

song_detail <- read_csv("_data/Kpop_analysis_ErikaNagai/song_detail.csv")

New names:
• `` -> `...1`
Rows: 1123 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (11): album, artists, available_markets, external_ids, external_urls, h...
dbl   (6): ...1, disc_number, duration_ms, popularity, track_number, release...
lgl   (2): explicit, is_local
date  (1): release_date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Code

song_detail <- song_detail %>% select(-c("...1"))

Describe & Clean the data

`collab_songs` dataframe

collab_songs is a dataframe where each observation is a collaboration song by top K-pop artists and other artists.

It has 1123 rows (collaboration songs) and 30 columns.

Code

library(skimr)

Warning: package 'skimr' was built under R version 4.2.3

Code

# Skim the data
skim(collab_songs)

Data summary
Name	collab_songs
Number of rows	1123
Number of columns	30
_______________________
Column type frequency:
character	30
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
song_name	0	1.00	1	140	1116
kpop_artist_name	0	1.00	1	19	140
song_id	0	1.00	22	22	1123
artist_id	0	1.00	22	22	140
artist_1	0	1.00	1	36	669
artist_id_1	0	1.00	22	22	667
artist_2	697	0.38	1	19	228
artist_id_2	698	0.38	22	22	227
artist_3	990	0.12	3	21	106
artist_id_3	990	0.12	22	22	106
artist_4	1086	0.03	4	19	32
artist_id_4	1086	0.03	22	22	32
artist_5	1103	0.02	4	17	16
artist_id_5	1103	0.02	22	22	16
artist_6	1113	0.01	5	13	10
artist_id_6	1113	0.01	22	22	10
artist_7	1116	0.01	5	17	7
artist_id_7	1116	0.01	22	22	7
artist_8	1120	0.00	5	11	3
artist_id_8	1120	0.00	22	22	3
artist_9	1121	0.00	4	9	2
artist_id_9	1121	0.00	22	22	2
artist_10	1121	0.00	2	7	2
artist_id_10	1121	0.00	22	22	2
artist_11	1122	0.00	12	12	1
artist_id_11	1122	0.00	22	22	1
artist_12	1122	0.00	13	13	1
artist_id_12	1122	0.00	22	22	1
artist_13	1122	0.00	8	8	1
artist_id_13	1122	0.00	22	22	1

Code

summary(collab_songs)

  song_name         kpop_artist_name     song_id           artist_id        
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   artist_1         artist_id_1          artist_2         artist_id_2       
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   artist_3         artist_id_3          artist_4         artist_id_4       
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   artist_5         artist_id_5          artist_6         artist_id_6       
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   artist_7         artist_id_7          artist_8         artist_id_8       
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   artist_9         artist_id_9         artist_10         artist_id_10      
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  artist_11         artist_id_11        artist_12         artist_id_12      
 Length:1123        Length:1123        Length:1123        Length:1123       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
  artist_13         artist_id_13      
 Length:1123        Length:1123       
 Class :character   Class :character  
 Mode  :character   Mode  :character

collab_songs has the following columns.

song_name: the name of the song
song_id: the Spotify track ID. This can be used as a key when joining with the song_detail dataframe
kpop_artist_name: the name of the top K-pop artist. (Top K-pop artists are taken from the Spotify top playlists or from ChatGPT’s answers)
artist_id: the Spotify artist ID of the top K-pop artist. This can be used as a key when joining with the artists dataframe
artist_[i]: the name of an artist credited on the song, either a collaborator or the top K-pop artist themselves, which means that it may duplicate kpop_artist_name (i is a number from 1 to 13)
artist_id_[i]: the Spotify artist ID of an artist credited on the song, either a collaborator or the top K-pop artist themselves, which means that it may duplicate artist_id (i is a number from 1 to 13). This can be used as a key when joining with the artists dataframe.
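
Because the credited artists are spread across the wide columns artist_1 to artist_13, a network analysis usually needs them in a long or pairwise form. The sketch below shows one possible way to do this on a made-up table with only three artist slots; `toy_songs`, `song_artists`, and `edges` are hypothetical names, and this is not necessarily how the edge list was built in this project.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy version of collab_songs with only three artist slots
toy_songs <- tibble(
  song_id  = c("s1", "s2"),
  artist_1 = c("A", "B"),
  artist_2 = c("B", NA),
  artist_3 = c("C", NA)
)

# One row per (song, artist); empty slots are dropped.
# The regex selects artist_1, artist_2, ... but not artist_id or artist_id_1.
song_artists <- toy_songs %>%
  pivot_longer(
    cols = matches("^artist_[0-9]+$"),
    names_to = "slot",
    values_to = "artist",
    values_drop_na = TRUE
  )

# For each song, list every unordered pair of distinct credited artists
pairs <- lapply(split(song_artists$artist, song_artists$song_id), function(a) {
  a <- sort(unique(a))
  if (length(a) < 2) return(NULL)   # a single artist yields no pair
  p <- t(combn(a, 2))
  data.frame(from = p[, 1], to = p[, 2])
})
pairs <- Filter(Negate(is.null), pairs)

# Undirected edge list: one row per pair of artists sharing a song
edges <- bind_rows(pairs, .id = "song_id")

edges
```

Song "s1" with three artists contributes three pairs, while "s2" contributes none. A data frame shaped like `edges` (after moving `from` and `to` to the first two columns) is the kind of input that `igraph::graph_from_data_frame()` accepts.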

Code

collab_songs <- collab_songs %>% 
  select(c("song_name", "song_id", "kpop_artist_name", "artist_id", "artist_1", "artist_id_1", "artist_2", "artist_id_2", "artist_3", "artist_id_3", "artist_4", "artist_id_4", "artist_5", "artist_id_5", "artist_6", "artist_id_6", "artist_7", "artist_id_7", "artist_8", "artist_id_8", "artist_9", "artist_id_9", "artist_10", "artist_id_10", "artist_11", "artist_id_11", "artist_12", "artist_id_12", "artist_13", "artist_id_13"))

collab_songs

# A tibble: 1,123 × 30
   song_name     song_id kpop_…¹ artis…² artis…³ artis…⁴ artis…⁵ artis…⁶ artis…⁷
   <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
 1 Angel Pt. 1 … 5qyjy0… BTS     3Nrfpe… Jimin   1oSPZh… JVKE    164Uj4… Kodak …
 2 Dreamers [Mu… 1RDvyO… BTS     3Nrfpe… Jung K… 6HaGTQ… BTS     3Nrfpe… FIFA S…
 3 Bad Decision… 0xzI1K… BTS     3Nrfpe… benny … 5CiGnK… BTS     3Nrfpe… Snoop …
 4 Left and Rig… 0mBP9X… BTS     3Nrfpe… Charli… 6VuMaD… Jung K… 6HaGTQ… BTS    
 5 My Universe … 5FvxRv… BTS     3Nrfpe… Coldpl… 4gzpq5… BTS     3Nrfpe… David …
 6 My Universe … 1Dlczm… BTS     3Nrfpe… Coldpl… 4gzpq5… BTS     3Nrfpe… Galant…
 7 My Universe … 6Lgbf4… BTS     3Nrfpe… Coldpl… 4gzpq5… BTS     3Nrfpe… <NA>   
 8 My Universe … 6BeOJP… BTS     3Nrfpe… Coldpl… 4gzpq5… BTS     3Nrfpe… SUGA   
 9 Butter - Meg… 474Vqn… BTS     3Nrfpe… Megan … 181bsR… <NA>    <NA>    <NA>   
10 Savage Love … 4TgxFM… BTS     3Nrfpe… Jawsh … 56mfhU… Jason … 07YZf4… BTS    
# … with 1,113 more rows, 21 more variables: artist_id_3 <chr>, artist_4 <chr>,
#   artist_id_4 <chr>, artist_5 <chr>, artist_id_5 <chr>, artist_6 <chr>,
#   artist_id_6 <chr>, artist_7 <chr>, artist_id_7 <chr>, artist_8 <chr>,
#   artist_id_8 <chr>, artist_9 <chr>, artist_id_9 <chr>, artist_10 <chr>,
#   artist_id_10 <chr>, artist_11 <chr>, artist_id_11 <chr>, artist_12 <chr>,
#   artist_id_12 <chr>, artist_13 <chr>, artist_id_13 <chr>, and abbreviated
#   variable names ¹kpop_artist_name, ²artist_id, ³artist_1, ⁴artist_id_1, …

Artists

artists is a dataframe where each row represents an artist (individual/group) that has participated in collaboration songs by top K-pop artists.

name: the name of the artist
id: the Spotify artist ID of the artist. It can be used as a key when joining with the collab_songs dataframe
top_kpop: 1 if the artist is one of the top 50 K-pop artists in 2010, 2015, or 2020; otherwise 0
genre: the genre of the artist (can be multiple)
followers: the number of followers on Spotify. The Spotify API doesn’t provide an artist’s monthly listeners, so the number of followers is used as the index of artist popularity.

The artists dataframe has 962 unique rows, meaning that 962 artists (both K-pop and non-K-pop) have participated in K-pop musical collaborations.
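
As an illustration of how the `id` key could be used, the sketch below joins a made-up songs table to a made-up artists table and checks that every artist ID has a match. The objects `toy_songs` and `toy_artists` and their values are hypothetical; only the column names `artist_id`, `id`, `name`, and `followers` are taken from the description above.

```r
library(dplyr)

# Hypothetical toy tables mirroring the key columns described above
toy_songs <- tibble(
  song_id   = c("s1", "s2", "s3"),
  artist_id = c("id_a", "id_b", "id_a")
)
toy_artists <- tibble(
  id        = c("id_a", "id_b"),
  name      = c("Artist A", "Artist B"),
  followers = c(1000, 25)
)

# Attach artist attributes to each song via the artist ID
songs_with_artist <- toy_songs %>%
  left_join(toy_artists, by = c("artist_id" = "id"))

# Key check: songs whose artist_id has no row in the artists table
unmatched <- toy_songs %>%
  anti_join(toy_artists, by = c("artist_id" = "id"))

songs_with_artist
nrow(unmatched)
```

An empty `unmatched` table means every artist ID in the songs table can be resolved to an artist row.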

Code

skim(artists)

Data summary
Name	artists
Number of rows	962
Number of columns	5
_______________________
Column type frequency:
character	3
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
name	1	1	36	962
id	1	22	22	962
genre	1	2	154	237

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
top_kpop	0	1	0.15	0.35	0	0.00	0	0.0	1	▇▁▁▁▂
followers	0	1	1346115.00	5889520.19	0	1167.25	39480	387868.5	111956794	▇▁▁▁▁

Code

summary(artists)

     name                id               top_kpop         genre          
 Length:962         Length:962         Min.   :0.0000   Length:962        
 Class :character   Class :character   1st Qu.:0.0000   Class :character  
 Mode  :character   Mode  :character   Median :0.0000   Mode  :character  
                                       Mean   :0.1455                     
                                       3rd Qu.:0.0000                     
                                       Max.   :1.0000                     
   followers        
 Min.   :        0  
 1st Qu.:     1167  
 Median :    39480  
 Mean   :  1346115  
 3rd Qu.:   387868  
 Max.   :111956794

Code

artists

# A tibble: 962 × 5
   name                                id                  top_k…¹ genre follo…²
   <chr>                               <chr>                 <dbl> <chr>   <dbl>
 1 Relax Beer                          04BxwccHO4U2wMwVLv…       0 []     2   e0
 2 KAHI                                04C0MPapc1HXpTbspJ…       0 []     0     
 3 LE                                  04JxNGSux7QdmrCvnQ…       0 []     4.2 e1
 4 Sunwoojunga                         04L3elxyr0XFua2Ek3…       0 ['k-…  9.49e4
 5 Miso                                04xEkodoWyFji8icX9…       0 ['ko…  7.97e4
 6 Symba                               06S3fr7xEES7e3QPXh…       0 []     6.66e4
 7 Tommy “TBHits” Brown                06moEoVtCKweJmGYid…       0 []     5   e0
 8 m-flo loves Sik-K & eill & 向井太一 072YnMT5EniZhvDssX…       0 []     8.64e2
 9 Sebastian Yatra                     07YUOmWljBTXwIseAU…       0 ['co…  2.15e7
10 Jason Derulo                        07YZf4WDAMNwqr4jfg…       0 ['da…  1.18e7
# … with 952 more rows, and abbreviated variable names ¹top_kpop, ²followers

Code

artists %>% 
  group_by(id) %>%
  summarize(n=n()) %>%
  arrange(desc(n))

# A tibble: 962 × 2
   id                         n
   <chr>                  <int>
 1 04BxwccHO4U2wMwVLvYTLO     1
 2 04C0MPapc1HXpTbspJhmWg     1
 3 04JxNGSux7QdmrCvnQowyr     1
 4 04L3elxyr0XFua2Ek3domW     1
 5 04xEkodoWyFji8icX911jM     1
 6 06moEoVtCKweJmGYid8vU7     1
 7 06S3fr7xEES7e3QPXhu3ay     1
 8 072YnMT5EniZhvDssXL2VB     1
 9 07YUOmWljBTXwIseAUd9TW     1
10 07YZf4WDAMNwqr4jfgOZ8y     1
# … with 952 more rows

Code

# Reorder columns so that id comes before name

artists <- artists %>% relocate(id, .before = name)

Code

# This is the list of the genre names that appear in artist dataframe
artists %>%
  select(genre) %>%
  separate_rows(genre, sep = ",\\s*") %>%
  mutate(genre = gsub("\\[|'|\\]", "", genre)) %>%
  arrange(genre) %>%
  count(genre)

# A tibble: 230 × 2
   genre                                 n
   <chr>                             <int>
 1 ""                                  353
 2 "a cappella"                          1
 3 "afrobeats"                           1
 4 "alt z"                               5
 5 "alternative r&b"                     2
 6 "american 21st century classical"     1
 7 "anime"                               1
 8 "art pop"                             3
 9 "asian american hip hop"              5
10 "atl hip hop"                         4
# … with 220 more rows

Code

library(stringr)
artists %>% filter(str_detect(genre, "desi pop"))

# A tibble: 1 × 5
  id                     name         top_kpop genre                     follo…¹
  <chr>                  <chr>           <dbl> <chr>                       <dbl>
1 4IKVDbCSBTxBeAsMKjAuTs Armaan Malik        0 ['desi pop', 'filmi', 'm…  1.79e7
# … with abbreviated variable name ¹followers

Code

artists %>% filter(str_detect(genre, "j-|japanese|josei"))

# A tibble: 8 × 5
  id                     name         top_kpop genre                     follo…¹
  <chr>                  <chr>           <dbl> <chr>                       <dbl>
1 1qM11R4ylJyQiPJ0DffE9z Lilas Ikuta         0 ['j-pop', 'japanese teen…  4.30e5
2 2mGYHril2LuZodRtTX06BC Kumi Koda           0 ['j-pop']                  5.69e5
3 2oNStf3CKKLM5lnzELWMcH Taichi Mukai        0 ['japanese r&b']           1.31e5
4 2vjeuQwzSP5ErC1S41gONX CHANMINA            0 ['j-rap', 'josei rap']     5.98e5
5 3AiES4wyTOfJvNgqz9baDn eill                0 ['anime', 'japanese r&b']  1.17e5
6 3JsHnjpbhX4SnySpvpa9DK V                   1 ['j-division', 'korean o…  1.31e7
7 3yzQHdj9G34CVZ5rVUDrOM Crystal Kay         0 ['j-pop', 'japanese r&b']  3.00e5
8 4UhiMIdxKqQxmzdE9nYe6O m-flo               0 ['j-pop']                  3.32e5
# … with abbreviated variable name ¹followers

I added two new columns to the artists dataframe. The first column, region_category, indicates the broader regional genre of each artist, such as K-pop, Latin pop, or J-pop (Japanese pop), which helps us better understand the overall geographical market. The second column, kpop, simply indicates whether each artist is K-pop or not.

First, I classified the overall genre by detecting the following words in the genre column:

K-pop: “korean”, “k-” (for example k-pop, k-rap). Also, any artist whose top_kpop column is 1 is considered K-pop.

Latino: “latin”, “latino”, “chicano”, “chileno”, “bachata”, “colombian”, “mexican”, “puerto rican”, “reggaeton”, “dominicano”

East Asia: “japanese”, “j-” (for example j-pop, j-rap, j-core), “visual-kei”, “josei”, “kawaii”, “chinese”, “c-”, “taiwan”

South East Asia: “vietnamese”, “viet”, “v-pop”, “singaporean”, “desi pop”, “bollywood”, “indian”, “indonesian”, “malaysian”, “thai”, “burmese”

Europe: “uk” (uk-pop, uk-hiphop), “norwegian”, “swedish”, “belgian”

US or Other: artists whose genre is not empty but doesn’t include any of the above words

Unknown: artists whose genre is empty

Code

artists %>% group_by_all()%>%
  filter(n()>1) %>%
  ungroup()

# A tibble: 0 × 5
# … with 5 variables: id <chr>, name <chr>, top_kpop <dbl>, genre <chr>,
#   followers <dbl>

Code

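# Assign a broad region to each artist from keywords in the genre string.
# case_when() uses the first condition that matches, so the order of the rules matters.
# Note: "koreanp" and "visualkawaii" are kept exactly as they were run to produce the
# counts below; they look like typos for "korean" and "kawaii".
artists <- artists %>% 
  mutate(region_category=case_when(
    grepl("k-|koreanp", genre) | top_kpop == 1 ~ "K-pop",
    grepl("latin|latino|chicano|chileno|bachata|colombian|mexican|puerto rican|reggaeton|dominicano", genre) ~ "Latino",
    grepl("j-|japanese|visual-kei|josei|visualkawaii|chinese|c-|taiwan", genre) ~ "East Asia",
    grepl("vietnamese|viet|v-|singaporean|desi pop|bollywood|indian |indonesian|malaysian|thai|burmese", genre) ~ "South East Asia",
    grepl("uk-|uk ", genre) ~ "Europe",
    grepl("\\[\\]", genre) ~ "Unknown",  # empty genre list
    TRUE ~ "US or Other"
  ))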

artists %>% 
  group_by(region_category) %>%
  summarize(n=n())

# A tibble: 7 × 2
  region_category     n
  <chr>           <int>
1 East Asia          13
2 Europe             11
3 K-pop             319
4 Latino             11
5 South East Asia    11
6 Unknown           349
7 US or Other       248
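
One thing to keep in mind with keyword rules like these is that `grepl()` matches substrings anywhere in the genre string, so a short pattern such as "k-" or "c-" can also match in the middle of an unrelated tag. The sketch below illustrates the difference between a plain substring match and a word-boundary match on made-up genre strings; the strings are hypothetical examples in the same layout as the genre column, and whether such tags occur in the real data was not checked here.

```r
# Hypothetical genre strings in the same "['tag']" layout as the genre column
genre <- c("['k-pop']", "['folk-pop']", "['c-pop']", "['electronic-pop']")

# Plain substring match: "k-" also matches inside "folk-pop",
# and "c-" also matches inside "electronic-pop"
plain_k <- grepl("k-", genre)
plain_c <- grepl("c-", genre)

# Word-boundary match: only tags in which "k-" / "c-" starts a word
strict_k <- grepl("\\bk-", genre, perl = TRUE)
strict_c <- grepl("\\bc-", genre, perl = TRUE)

data.frame(genre, plain_k, strict_k, plain_c, strict_c)
```

With the boundary `\\b`, "folk-pop" and "electronic-pop" are no longer picked up, while "k-pop" and "c-pop" still are.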

There are 349 artists whose genre is unknown. I checked them and changed region_category of some of them.

Code

print(artists %>% 
  filter(region_category == "Unknown") %>%
  arrange(desc(name)))

# A tibble: 349 × 6
   id                     name               top_kpop genre followers region_c…¹
   <chr>                  <chr>                 <dbl> <chr>     <dbl> <chr>     
 1 23aPUZaR8bESXN4UD3T2Sx 香取慎吾                  0 []        20927 Unknown   
 2 2VrDFxXbEHomf7A8Q87uRA 朴宰範                    0 []           18 Unknown   
 3 7BrydByY8Q9MZjovkvRsCP 尹美萊                    0 []          150 Unknown   
 4 76yPSQzTmeCY5fCsrMFoSw 혜미                      0 []            2 Unknown   
 5 3lTYYNwaDmW0rrsAz5aG8o 헤리티지(Heritage)        0 []           37 Unknown   
 6 1WLp3CeL7X1Ic7DTu67CCP 한겨울                    0 []            0 Unknown   
 7 7fCSy7Had5Fs540ilzxV20 트리키                    0 []          103 Unknown   
 8 76JVW7YYK3nRTVvRxUalpI 토끼                      0 []            2 Unknown   
 9 6OK3Lf8Ws6A4UdRw3lMNZo 태완                      0 []           57 Unknown   
10 0BeIulKOpcvsabwlt4u8qp 태양                      0 []         8010 Unknown   
# … with 339 more rows, and abbreviated variable name ¹region_category

Code

# I noticed there were a few Japanese artists, so I manually changed their region_category to "East Asia"

artists$region_category[artists$name == "香取慎吾"] <- "East Asia"
artists$region_category[artists$name == "Yutaka Furukawa"] <- "East Asia"
artists$region_category[artists$name == "Takanori Nishikawa T.M.Revolution"] <- "East Asia"
artists$region_category[artists$name == "Naoko Tanaka"] <- "East Asia"
artists$region_category[artists$name == "m-flo loves Sik-K & eill & 向井太一"] <- "East Asia"


# For artists whose names are written in Hangul, I manually changed their region_category to "K-pop"
artists <- artists %>% arrange(desc(name)) %>% arrange(region_category) # after sorting, rows 371 - 404 are the names in Hangul

artists$region_category[371:404] <- "K-pop"
artists$region_category[artists$name == "nafla (나플라)"] <- "K-pop"
artists$region_category[artists$name == "MC 몽"] <- "K-pop"
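
Assigning by row position depends on the exact sort order of the dataframe. A possible alternative that does not depend on row positions is to detect Hangul characters in the name directly. The sketch below assumes that the Unicode script class `\\p{Hangul}` is acceptable as the criterion; `toy_artists` is a made-up table, and this approach is not guaranteed to select exactly the same rows as the positional range above (for example, Korean names written in Chinese characters would not be matched and would still need manual handling).

```r
library(dplyr)
library(stringr)

# Hypothetical toy table with names in different scripts
toy_artists <- tibble(
  name            = c("태양", "MC 몽", "Gallant", "香取慎吾"),
  region_category = c("Unknown", "Unknown", "Unknown", "East Asia")
)

# Relabel only the still-unknown artists whose name contains a Hangul character
toy_artists <- toy_artists %>%
  mutate(region_category = if_else(
    region_category == "Unknown" & str_detect(name, "\\p{Hangul}"),
    "K-pop",
    region_category
  ))

toy_artists
```

Names that mix scripts, such as "MC 몽", are matched as well because the pattern only requires one Hangul character.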

Code

# Unknown-region artists with more than 10,000 followers, most followed first
artists %>%
  filter(region_category == "Unknown" & followers > 10000) %>%
  arrange(desc(followers))

# A tibble: 55 × 6
   id                     name        top_kpop genre followers region_category
   <chr>                  <chr>          <dbl> <chr>     <dbl> <chr>          
 1 70DFixYAFPv4Pf9kgSfR9O MARK               0 []       509054 Unknown        
 2 7wFDo161xYdeaiLz3KIHoM Gallant            0 []       302064 Unknown        
 3 2FgZrgTMX6Sk0VNcOsEPmm Punch              0 []       301148 Unknown        
 4 7cEaNXXTHx3LokbjUUyHal BIG Naughty        0 []       286344 Unknown        
 5 31SBgHxc8eqZUk9MdveH42 PREP               0 []       254055 Unknown        
 6 0JOxt5QOwq0czoJxvSc5hS GASHI              0 []       235666 Unknown        
 7 2qDIR2WlcW3llkGqJWg9VJ Lolo Zouaï         0 []       210312 Unknown        
 8 79R17q4kiPsimHDtdOlN2L SEHUN              0 []       184818 Unknown        
 9 31IZdHrCZ5pRhLz4zBxN3o Riff Raff          0 []       179063 Unknown        
10 5C01hDqpEmrmDfUhX9YWsH FIFA Sound         0 []       167771 Unknown        
# … with 45 more rows

Code

# I manually changed the region_category of artists whose followers are over 15000 by checking online information

artists$region_category[artists$id == "70DFixYAFPv4Pf9kgSfR9O"] <- "K-pop"
artists$region_category[artists$name == "Gallant"] <- "US or Other"
artists$region_category[artists$name == "Punch"] <- "K-pop"
artists$region_category[artists$name == "BIG Naughty"] <- "K-pop"
artists$region_category[artists$name == "PREP"] <- "Europe"
artists$region_category[artists$name == "GASHI"] <- "US or Other"
artists$region_category[artists$name == "Lolo Zouaï"] <- "US or Other"
artists$region_category[artists$name == "SEHUN"] <- "K-pop"
artists$region_category[artists$name == "Riff Raff"] <- "US or Other"
artists$region_category[artists$name == "FIFA Sound"] <- "US or Other"

artists$region_category[artists$name == "vaultboy"] <- "US or Other"
artists$region_category[artists$name == "D_LITE"] <- "K-pop"
artists$region_category[artists$name == "Seraphine"] <- "K-pop"
artists$region_category[artists$name == "Wonstein"] <- "K-pop"
artists$region_category[artists$name == "YooA"] <- "K-pop"
artists$region_category[artists$name == "Wuki"] <- "US or Other"
artists$region_category[artists$name == "MINNIE"] <- "K-pop"
artists$region_category[artists$name == "Symba"] <- "US or Other"
artists$region_category[artists$name == "HWANG MIN HYUN"] <- "K-pop"


artists$region_category[artists$name == "T.O.P"] <- "K-pop"
artists$region_category[artists$name == "End of the World"] <- "East Asia"
artists$region_category[artists$name == "MELOH"] <- "K-pop"
artists$region_category[artists$name == "LUCAS"] <- "K-pop"
artists$region_category[artists$name == "Paul Blanco"] <- "US or Other"
artists$region_category[artists$name == "JEON WOONG"] <- "K-pop"
artists$region_category[artists$name == "Sion"] <- "K-pop"
artists$region_category[artists$name == "KEN THE 390"] <- "East Asia"
artists$region_category[artists$name == "BM"] <- "K-pop"

artists$region_category[artists$name == "Minhyun"] <- "K-pop"
artists$region_category[artists$name == "Yong Jun Hyung"] <- "K-pop"


artists$region_category[artists$name == "Lucas"] <- "K-pop" # I learned he is a Hong Kong rapper but is based in South Korea and is a member of Kpop group NCT.
artists$region_category[artists$name == "Paniel"] <- "K-pop"
artists$region_category[artists$name == "PANIEL"] <- "K-pop"
artists$region_category[artists$name == "PENIEL"] <- "K-pop"


artists$region_category[artists$name == "LIM YOUNG MIN"] <- "K-pop"
artists$region_category[artists$name == "KIM DONG HYUN"] <- "K-pop"

artists$region_category[artists$name == "D-LITE"] <- "K-pop"

artists$region_category[artists$name == "LUNA"] <- "K-pop"
artists$region_category[artists$name == "LEE DAE HWI"] <- "K-pop"
artists$region_category[artists$name == "LEEGIKWANG"] <- "K-pop"

artists$region_category[artists$name == "T.O.P."] <- "K-pop"

artists$region_category[artists$name == "Zior Park"] <- "K-pop"
artists$region_category[artists$name == "Sandara Park"] <- "K-pop"
artists$region_category[artists$name == "U-KWON"] <- "K-pop"
artists$region_category[artists$name == "HOYA"] <- "K-pop"
artists$region_category[artists$name == "Slom"] <- "K-pop"

artists$region_category[artists$name == "TAEHYUN"] <- "K-pop"
artists$region_category[artists$name == "WOOSEOK"] <- "K-pop"
artists$region_category[artists$name == "ABLE"] <- "K-pop"
artists$region_category[artists$name == "Imad Royal"] <- "US or Other"


artists$region_category[artists$name == "J.UNA"] <- "K-pop"
artists$region_category[artists$name == "Yeeun"] <- "K-pop"
artists$region_category[artists$name == "Reiley"] <- "Europe"
artists$region_category[artists$name == "NARSHA"] <- "K-pop"
artists$region_category[artists$name == "Son Dong Woon"] <- "K-pop"
artists$region_category[artists$name == "LEE CHANHYUK"] <- "K-pop"
artists$region_category[artists$name == "The Chosen"] <- "Latino"
artists$region_category[artists$name == "JeHwi"] <- "K-pop"
artists$region_category[artists$name == "inverness"] <- "US or Other"
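
A long series of one-line assignments like the above can also be kept as a small lookup table and applied in one step, which makes the manual decisions easier to review. The sketch below shows the idea on a made-up table; `toy_artists`, `overrides`, and `new_category` are hypothetical names, and only three of the decisions above are reproduced as examples.

```r
library(dplyr)

# Hypothetical toy table of artists that are still "Unknown"
toy_artists <- tibble(
  name            = c("Gallant", "Punch", "PREP", "Someone Else"),
  region_category = c("Unknown", "Unknown", "Unknown", "Unknown")
)

# Manual decisions collected in one table (three examples taken from the code above)
overrides <- tibble(
  name         = c("Gallant", "Punch", "PREP"),
  new_category = c("US or Other", "K-pop", "Europe")
)

# Apply the overrides; artists without an override keep their current value
toy_artists <- toy_artists %>%
  left_join(overrides, by = "name") %>%
  mutate(region_category = coalesce(new_category, region_category)) %>%
  select(-new_category)

toy_artists
```

Matching on `name` assumes that names are unique; where two artists could share a name, matching on the Spotify `id` instead would be the safer key.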

Code

library(stringr)
print(artists %>% 
  filter(region_category == "Unknown" & str_detect(name,"\\(")))

# A tibble: 11 × 6
   id                     name                     top_k…¹ genre follo…² regio…³
   <chr>                  <chr>                      <dbl> <chr>   <dbl> <chr>  
 1 54JXhmpGDN8NdAIq44z4gt Young Jae (B.A.P)              0 []       4541 Unknown
 2 5bRCVFekTRnptEuJ0ZxTtf Yoon Han (Pop Pianist)         0 []          4 Unknown
 3 2KbOz7b91wiUpnv34Twd9f YOOK SUNGJAE (BTOB)            0 []        119 Unknown
 4 2oG0hcbCfRofbtKnuS1fWF Yeo-Eun (MelodyDay)            0 []         27 Unknown
 5 5KnPgqPc7aCvEfXrCs4QjV Yella Diamond) (Perform…       0 []          3 Unknown
 6 76EfJwRQeOeQ5aMh3FF7z4 Sanchez (of Phantom)           0 []         15 Unknown
 7 6KIjwPp994JQ2LJ8IBvZPc LEE MINHYUK (BTOB)             0 []        648 Unknown
 8 3LcvhSx1kL5VwaOHhAN1B5 Kwon Soonil(Urban Zakap…       0 []          4 Unknown
 9 0sSAFVekiDtX7PXDQvHnlK JONG UP (B.A.P)                0 []       5853 Unknown
10 1htPJUlogkZBjbKp86uuLF DAE HYUN (B.A.P)               0 []       6874 Unknown
11 3schR1HLbYu3RqqPDiDFrE ANYUJIN (IVE)                  0 []       1924 Unknown
# … with abbreviated variable names ¹top_kpop, ²followers, ³region_category

Code

# Some artist names include the group the artist belongs to
# fixed() treats the dots in "B.A.P" literally instead of as regex wildcards
artists$region_category[str_detect(artists$name, fixed("B.A.P"))] <- "K-pop"
artists$region_category[str_detect(artists$name, "BTOB")] <- "K-pop"
artists$region_category[str_detect(artists$name, "MelodyDay")] <- "K-pop"
artists$region_category[str_detect(artists$name, "Urban Zakapa")] <- "K-pop"

Code

artists %>% filter(region_category == "Unknown") %>%
  filter(str_detect(name, "of")) %>%
  select(c("id", "name"))

# A tibble: 5 × 2
  id                     name                     
  <chr>                  <chr>                    
1 4TYswX6bKUjM9rbEL7CMBH YEJI & RYUJIN of ITZY    
2 76EfJwRQeOeQ5aMh3FF7z4 Sanchez (of Phantom)     
3 40zyx4iztMjRbIIoI802r4 Felix of Stray Kids      
4 5Fa7oN67rqbrgxbRVux7F4 CHOI JUNG HOON of JANNABI
5 4nzWj1u4IslWFr3B5f7HfY Ahin of MOMOLAND

Code

# Some artist names include the group the artist belongs to
artists$region_category[str_detect(artists$name, "ITZY")] <- "K-pop"
artists$region_category[str_detect(artists$name, "Stray Kids")] <- "K-pop"
artists$region_category[str_detect(artists$name, "MOMOLAND")] <- "K-pop"
artists$region_category[str_detect(artists$name, "JANNABI")] <- "K-pop"

Code

artists <- artists %>% 
  mutate(
    kpop = case_when(
    region_category == "K-pop" ~ "yes",
    TRUE ~ "no"),
    show_kpop_top = case_when(
      top_kpop == 1 ~ name,
      TRUE ~ ""
    )
  )

song_detail (song_detail.csv)

song_detail is a dataframe where each observation is a collaboration song by top K-pop artists and other artists. While collab_songs provides the general information about the artists that worked on the song, this dataframe provides more detailed information about the songs themselves, such as duration, track number, and release date.

It has 1123 rows (collaboration songs), which is the same as the number of the rows of collab_songs, and 19 columns.

Code

summary(song_detail)

    album             artists          available_markets   disc_number
 Length:1123        Length:1123        Length:1123        Min.   :1   
 Class :character   Class :character   Class :character   1st Qu.:1   
 Mode  :character   Mode  :character   Mode  :character   Median :1   
                                                          Mean   :1   
                                                          3rd Qu.:1   
                                                          Max.   :1   
  duration_ms      explicit       external_ids       external_urls     
 Min.   :  5889   Mode :logical   Length:1123        Length:1123       
 1st Qu.:192577   FALSE:1036      Class :character   Class :character  
 Median :211300   TRUE :87        Mode  :character   Mode  :character  
 Mean   :212247                                                        
 3rd Qu.:230346                                                        
 Max.   :459853                                                        
     href                id             is_local           name          
 Length:1123        Length:1123        Mode :logical   Length:1123       
 Class :character   Class :character   FALSE:1123      Class :character  
 Mode  :character   Mode  :character                   Mode  :character  
                                                                         
                                                                         
                                                                         
   popularity    preview_url         track_number        type          
 Min.   : 0.00   Length:1123        Min.   : 1.000   Length:1123       
 1st Qu.:17.00   Class :character   1st Qu.: 1.000   Class :character  
 Median :28.00   Mode  :character   Median : 1.000   Mode  :character  
 Mean   :29.02                      Mean   : 2.102                     
 3rd Qu.:40.00                      3rd Qu.: 3.000                     
 Max.   :86.00                      Max.   :11.000                     
     uri             release_date         release_year 
 Length:1123        Min.   :2001-12-19   Min.   :2001  
 Class :character   1st Qu.:2015-05-26   1st Qu.:2015  
 Mode  :character   Median :2018-03-08   Median :2018  
                    Mean   :2017-08-14   Mean   :2017  
                    3rd Qu.:2020-08-24   3rd Qu.:2020  
                    Max.   :2023-05-09   Max.   :2023

The columns needed for this analysis are as follows:

id: the Spotify song ID, which can be used as a key when joining with collab_songs
name: the name of the song
popularity: the popularity index from Spotify
release_year: the year in which the song was released

Code

# # I will remove unnecessary columns
# song_detail <- song_detail %>% 
#   select(c("id", "name", "popularity", "release_year"))
# 
# song_detail$release_year <- as.integer(song_detail$release_year)
# song_detail %>% arrange(release_year)
# ```
# 
# ## Extra data cleaning
# 
# `collab_songs` include unofficial songs by unofficial accounts. For example, there are a few songs by After School and Blackpink. However, if you look at the Spotify album page, this "Blackpink" is different from "BLACK PINK". There seem to be several cases that the artist ID is not the correct one. So I removed the songs that are not by "official" (=with more monthly followers) accounts.
# 
# 
# ```{r}
# 
# # Remove the songs by artist_1 whose id doesn't exist
# collab_songs <- collab_songs %>%
#   left_join(artists, by = c("artist_id_1" = "id")) %>%
#   filter(!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# # Remove the songs by artist_2 whose id doesn't exist 
# collab_songs <- collab_songs %>% 
#   left_join(artists, by = c("artist_id_2" = "id")) %>%
#   filter(is.na(artist_2)|!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# # Remove the songs by artist_3 whose id doesn't exist 
# collab_songs <- collab_songs %>% 
#   left_join(artists, by = c("artist_id_3" = "id")) %>%
#   filter(is.na(artist_3)|!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# # Remove the songs by artist_4 whose id doesn't exist 
# collab_songs <- collab_songs %>% 
#   left_join(artists, by = c("artist_id_4" = "id")) %>%
#   filter(is.na(artist_4)|!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# # Remove the songs by artist_5 whose id doesn't exist 
# collab_songs <- collab_songs %>% 
#   left_join(artists, by = c("artist_id_5" = "id")) %>%
#   filter(is.na(artist_5)|!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# # Remove the songs by artist_6 whose id doesn't exist 
# collab_songs <- collab_songs %>% 
#   left_join(artists, by = c("artist_id_6" = "id")) %>%
#   filter(is.na(artist_6)|!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# # Remove the songs by artist_7 whose id doesn't exist 
# collab_songs <- collab_songs %>% 
#   left_join(artists, by = c("artist_id_7" = "id")) %>%
#   filter(is.na(artist_7)|!is.na(name)) %>%
#   select(-c("name", "top_kpop", "genre", "followers", "name_lower", "region_category", "kpop"))
# 
# collab_songs

Exploratory Data Analysis

How many K-pop collaboration songs are released by year?

Code

collab_songs %>% 
  left_join(song_detail, by = c("song_id"= "id")) %>%
  ggplot(aes(x=release_year)) + 
  geom_bar() +
  # Set breaks and limits in one x scale; a separate xlim() call would replace this scale
  scale_x_continuous(breaks = seq(2010, 2022, 5), limits = c(2000, 2023)) + 
  labs(title = "The number of K-pop collaboration songs", subtitle = "The number of collaboration songs by K-pop top artists is increasing over time")

Warning: Removed 1 rows containing missing values (geom_bar).

Is the popularity of K-pop collaboration songs increasing?

Code

song_detail %>%
  group_by(release_year) %>%
  summarize(mean_popularity = mean(popularity)) %>%
  ggplot(aes(x=release_year, y=mean_popularity)) + geom_line() +
  labs(title = "Popularity of K-pop collaboration songs", subtitle = "The popularity of K-pop collaboration songs has been increasing steadily since around 2009")

The popularity of K-pop artists

Code

artists %>% filter(region_category=="K-pop") %>% 
  ggplot(aes(x=followers)) +
  geom_histogram()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Code

artists %>% 
  filter(region_category == "K-pop") %>%
  arrange(desc(followers)) %>% 
  select(c("name", "followers"))

# A tibble: 403 × 2
   name       followers
   <chr>          <dbl>
 1 BTS         64752045
 2 BLACKPINK   40782301
 3 TWICE       17544057
 4 j-hope      14644327
 5 RM          13576102
 6 V           13096788
 7 Agust D     11679645
 8 Stray Kids  11204838
 9 SEVENTEEN    9155169
10 Red Velvet   8364750
# … with 393 more rows

Data Processing

Convert the data into an edge list format

To analyze this data from a social network perspective, I need to convert it into an edge list in which the from-nodes are K-pop top artists, the to-nodes are the artists who collaborated with them, and the edges are songs.

Code

# convert collab_songs to edgelist
edgelist <- collab_songs %>%
  pivot_longer(
    cols = starts_with("artist_id_"),
    names_to = "variable",
    values_to = "to_artist_id"
  )%>% 
  filter(!is.na(to_artist_id)) %>% # Remove the rows where the to_artist_id is blank 
  filter(artist_id != to_artist_id) %>%  # Remove the rows where from_artist and to_artist are the same 
  select(c("artist_id", "to_artist_id", "song_id", "song_name")) %>%
  left_join(artists, by = c("artist_id" = "id")) %>%
  select(c("artist_id", "to_artist_id", "song_id", "song_name", "name")) %>%
  left_join(artists, by = c("to_artist_id" = "id")) %>% 
  select(c("artist_id", "to_artist_id", "song_id", "song_name", "name.x", "name.y", "region_category"))
  
              
# Change the name of columns
colnames(edgelist) <- c("From", "To", "song_id", "song_name", "From_artists", "To_artists", "Collab_region1")

# Add new columns of collaboration types
edgelist <- edgelist %>% mutate(
  Collab_region2 = case_when(
    Collab_region1 == "K-pop" ~ "Domestic/Unknown",
    TRUE ~ "International"
  )
)

# Add a new column of release year
edgelist <- edgelist %>% 
  left_join(song_detail, by = c("song_id" = "id")) %>%
  select(c("From", "To", "song_id", "song_name", "From_artists", "To_artists", "Collab_region1", "Collab_region2", "release_year", "popularity"))
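
To make the reshaping step concrete, here is a minimal sketch on toy data. The table toy_songs and its IDs are hypothetical and only mimic the assumed wide layout of collab_songs (one row per song, collaborators spread across artist_id_1, artist_id_2, ... columns); it is not the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per song, collaborators in wide columns
toy_songs <- data.frame(
  artist_id   = c("a1", "a1", "a2"),
  song_id     = c("s1", "s2", "s3"),
  artist_id_1 = c("a1", "a1", "a2"),
  artist_id_2 = c("b1", "b2", "a1"),
  artist_id_3 = c(NA,   "b3", NA),
  stringsAsFactors = FALSE
)

toy_edges <- toy_songs %>%
  # One row per (song, listed artist); artist_id itself is not matched by the prefix
  pivot_longer(cols = starts_with("artist_id_"),
               names_to = "variable", values_to = "to_artist_id") %>%
  filter(!is.na(to_artist_id)) %>%          # drop empty collaborator slots
  filter(artist_id != to_artist_id) %>%     # drop self-pairs
  select(from = artist_id, to = to_artist_id, song_id)

toy_edges
```

In this toy example song s2 has three listed artists and therefore contributes two edges, which is why the number of edges in the network can exceed the number of songs.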

Code

# convert edgelist into a graph
collab.net <- igraph::graph_from_data_frame(edgelist, directed = FALSE, vertices = artists)

Code

# Setting the attributes

# # For visualization
# ## color
artists <- artists %>% mutate(
  color_region_category = case_when(
    region_category == "K-pop" ~ "#FFFFE1",
    region_category == "Latino" ~ "darkgreen",  # same label that collab_by_years counts as "Latino"
    region_category == "South East Asia" ~ "#FDCCCC",
    region_category == "East Asia" ~ "#920002",
    region_category == "Europe" ~ "#ABD7E6",
    region_category == "US or Other" ~ "deepskyblue3",
    TRUE ~ "darkgrey" ),
  color_kpop = case_when(
    kpop == "yes" ~ "#FFFFE1",
    TRUE ~ "#010087"
  )
)



V(collab.net)$color <- artists$color_region_category
# 
# 
# V(collab.net)$name_kpop_top <- artists$show_name_kpop_top
# V(collab.net)$name_followers <- artists$show_name_followers

#edgelist <- edgelist %>% left_join(song_detail, by = c("song_id"="id"))

This graph contains 1519 edges (pairs of artists linked by a collaboration song) among 962 artists over the last 20 years.

Code

# Check attributes

summary(collab.net)

IGRAPH 938b92c UN-- 962 1519 -- 
+ attr: name (v/c), top_kpop (v/n), genre (v/c), followers (v/n),
| region_category (v/c), kpop (v/c), show_kpop_top (v/c), color (v/c),
| song_id (e/c), song_name (e/c), From_artists (e/c), To_artists (e/c),
| Collab_region1 (e/c), Collab_region2 (e/c), release_year (e/n),
| popularity (e/n)

Code

igraph::vertex_attr_names(collab.net)

[1] "name"            "top_kpop"        "genre"           "followers"      
[5] "region_category" "kpop"            "show_kpop_top"   "color"

Code

igraph::edge_attr_names(collab.net)

[1] "song_id"        "song_name"      "From_artists"   "To_artists"    
[5] "Collab_region1" "Collab_region2" "release_year"   "popularity"

(Total) Overview of K-pop collaboration network

This graph shows K-pop collaborations over the last 20 years.

Code

plot(collab.net,
     vertex.label = NA,
     edge.arrow.mode = "-",                  # undirected network: no arrowheads
     vertex.size = log(V(collab.net)$followers) * 0.4,
     vertex.label.cex = .4,
     vertex.label.color = "black",
     vertex.frame.color = 'lightgrey',
     vertex.label.dist = 0,
     vertex.color = V(collab.net)$color,     # region colors assigned above
     main = "K-pop collaboration network for the last 20 years",
     sub = "Circle size reflects the (log) number of followers"
     )

legend(
  "bottomright",
  legend = c("K-pop", "Latin", "East Asia", "South East Asia", "Europe", "US or Other", "Unknown"),
  pt.bg  = c("#FFFFE1", "darkgreen", "#920002","#FDCCCC", "#ABD7E6", "deepskyblue3", "darkgrey"),
  pch    = 21,
  cex    = 1,
  bty    = "n",
  title  = "Region"
  )

Network overview

Nodes and Edges

As mentioned, this network consists of 962 nodes (artists) and 1519 edges. The number of edges and the number of songs are not the same because one song can have more than 2 artists.

Code

print(vcount(collab.net))

[1] 962

Code

print(ecount(collab.net))

[1] 1519

Density and transitivity

Both the density (about 0.003) and the transitivity (about 0.024) of the K-pop collaboration network are low.

Code

# Density
igraph::edge_density(collab.net)

[1] 0.003286165

Code

# Transitivity
transitivity(collab.net)

[1] 0.02449146
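
As a quick check on the density value, the figure can be reproduced by hand. This is a sketch that assumes the usual definition for an undirected graph without self-loops, namely the number of edges divided by the number of possible pairs of nodes, $N(N-1)/2$. The node and edge counts are the ones reported above.

```r
# Counts reported for collab.net above
n_nodes <- 962
n_edges <- 1519

# Number of distinct artist pairs in an undirected graph without self-loops
possible_pairs <- n_nodes * (n_nodes - 1) / 2

# Density: share of possible pairs that are realised as edges
density_by_hand <- n_edges / possible_pairs
density_by_hand
```

The result is about 0.0033, in line with the edge_density output: only about 3 in every 1000 possible artist pairs are linked.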

Degree

This section looks at node-level measures: the total degree and the eigenvector centrality of each artist.

Code

nodes <- data.frame(
  name = V(collab.net)$name,
  total.degree = degree(collab.net, mode = 'total'),
  # eigen_centrality() is the current name for the deprecated evcent()
  eigen.centrality = eigen_centrality(collab.net)$vector)


library(reshape2)


Attaching package: 'reshape2'

The following object is masked from 'package:tidyr':

    smiths

Code

nodes %>% melt %>% 
  ggplot(aes(x = value, fill = variable, color = variable)) + geom_density(alpha = .2, bw = 5) +
  ggtitle('Degree Distribution')

Using name as id variables

Dyad Census

As expected, there are no asymmetric dyads, because the collaboration network is undirected. There are a great many null dyads, meaning that most pairs of artists have never worked together.

Code

dyad.census(collab.net)

$mut
[1] 1199

$asym
[1] 0

$null
[1] 461042
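
The census figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported above. Reading the surplus edges as repeat links between already connected pairs is an assumption: it relies on self-pairs having been filtered out when the edge list was built.

```r
n_nodes <- 962
n_edges <- 1519
mutual  <- 1199    # dyads joined by at least one edge
null    <- 461042  # dyads with no edge

# Every pair of artists falls into exactly one census category
all_pairs <- choose(n_nodes, 2)
mutual + null == all_pairs

# Edges beyond the first one between an already connected pair
surplus_edges <- n_edges - mutual
surplus_edges
```

Under that assumption, the gap between 1519 edges and 1199 connected pairs suggests that some artist pairs are linked by more than one song.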

Triad Census

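Most triads are empty: in most groups of three artists, no pair has collaborated. Even when artist A has worked with artist B, the two rarely share a third collaborator, which is consistent with the low transitivity reported above.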

Code

triad.labels <- c("A,B,C, the empty graph.",
                  "A->B, C, the graph with a single directed edge.",
                  "A<->B, C, the graph with a mutual connection between two vertices.",
                  "A<-B->C, the out-star.",
                  "A->B<-C, the in-star.",
                  "A->B->C, directed line.","A<->B<-C.",
                  "A<->B->C.",
                  "A->B<-C, A->C.",
                  "A<-B<-C, A->C.",
                  "A<->B<->C.",
                  "A<-B->C, A<->C.",
                  "A->B<-C, A<->C.",
                  "A->B->C, A<->C.",
                  "A->B<->C, A<->C.",
                  "A<->B<->C, A<->C, the complete graph.")

triad.census.data <- data.frame(label = triad.labels, collab.net = triad.census(collab.net)) %>% melt

Warning in triad.census(collab.net): At core/misc/motifs.c:1165 : Triad census
called on an undirected graph.

Using label as id variables

Code

colnames(triad.census.data) <- c('triad', 'network', 'value')
triad.census.data %>% ggplot(aes(x = value, y = triad, fill = network)) + geom_bar(stat = 'identity', position = 'dodge')

Transitivity

Code

nodes$transitivity <- transitivity(collab.net, type = 'local')

melt(nodes) %>% filter(variable == 'transitivity' | variable == 'weighted.transitivity') %>% 
  ggplot(aes(x = value, fill = variable, color = variable)) + geom_density(alpha = 0.2) +
  ggtitle('Transitivity Distribution')

Using name as id variables

Warning: Removed 688 rows containing non-finite values (stat_density).

Component Structure

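The walktrap algorithm detects more than 110 communities in this network, and most of them have fewer than 25 members. Note that cluster_walktrap returns communities (densely connected groups), which are not necessarily the same as connected components.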

Code

wc.collab.net <- cluster_walktrap(collab.net)
member <- membership(wc.collab.net)

nodes$member <- member
table(member) %>% melt %>% ggplot + geom_bar(aes(x = reorder(member, - value), y = value, fill = factor(member)), stat = 'identity')
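
The difference between connected components and walktrap communities can be seen on a small example. The toy graph below is hypothetical (two triangles joined by one bridge edge) and is not part of the original data; it only illustrates that a single connected component may still be split into several communities.

```r
library(igraph)

# Hypothetical toy graph: two triangles joined by a single bridge edge C-D
g <- graph_from_literal(A-B, B-C, C-A, C-D, D-E, E-F, F-D)

# Connected components: maximal sets of mutually reachable vertices
comp <- components(g)
comp$no        # number of connected components

# Walktrap communities: densely connected groups found via short random walks
wt <- cluster_walktrap(g)
length(wt)     # number of detected communities
sizes(wt)      # number of members in each community
```

Here every vertex is reachable from every other, so there is exactly one component, while the community detection step is free to separate the two triangles. Running components on collab.net in the same way would give the component count in the strict sense.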

Closer look

The 20-year graph does not show how K-pop collaboration has evolved, so I also analyzed the network year by year. There are few collaboration songs in the early years, so from here on I limit the analysis to the period from 2006 to the present (as of May 17, 2023), which is the range used in the code below.

Code

# Make a graph for each year

graph_from_data_frame_with_all_vertices <- function(df, vertices) {
  # Collect the IDs of all artists that appear in at least one edge of this period
  unique_vertices <- unique(c(df$From, df$To))
  # Use the vertices argument (not the global artists object) to look up their attributes
  vertices_period <- vertices[vertices$id %in% unique_vertices, ]
  # Build an undirected graph from the period's edges and those artists
  graph <- graph_from_data_frame(df, directed = FALSE, vertices = vertices_period)
  return(graph)
}


graphs_year <- lapply(2006:2023, function(x) graph_from_data_frame_with_all_vertices(df = edgelist[edgelist$release_year == x, ], vertices = artists))

periods <- c(2006:2023)

# for (i in 1:length(graphs_year)) {
#   V(graphs_year[[i]])$color <- V(graphs_year[[i]])$color_region_category
#   plot(graphs_year[[i]], 
#      vertex.label = V(graphs_year[[i]])$show_kpop_top,
#      arrow.mode="-",
#      vertex.size = log(V(graphs_year[[i]])$followers) * 0.7,
#      vertex.label.cex = .8,
#      vertex.label.color = "black",
#      vertex.label.dist = 0,
#      frame.width = 0,
#      color = V(graphs_year[[i]])$color_region_category,
#      main = paste0("Collaboration by K-pop top artists ", periods[[i]])
#      )
# 
#   legend(
#     "bottomright",
#     legend = c("K-pop", "Latin", "East Asia", "South East Asia", "Europe", "US or Other", "Unknown"),
#     pt.bg  = c("#FFFFE1", "darkgreen", "#920002","#FDCCCC", "#ABD7E6", "#010087", "darkgrey"),
#     pch    = 21,
#     cex    = 1,
#     bty    = "n",
#     title  = "Genre"
#     )
# }

graph_stats <- data.frame(year = 2006:2023, 
                          num_nodes = numeric(length(2006:2023)), 
                          num_edges = numeric(length(2006:2023)), 
                          num_songs = numeric(length(2006:2023)),
                          artists_per_song = numeric(length(2006:2023)),  # placeholder: never filled in the loop below, so it stays 0
                          centralization = numeric(length(2006:2023)), 
                          density = numeric(length(2006:2023)), 
                          transitivity = numeric(length(2006:2023))
)

# Loop through each year and calculate the desired graph statistics
for (i in 1:length(graphs_year)) {
  graph <- graphs_year[[i]]
  
  # Fill in the corresponding row of the data frame with the calculated statistics
  graph_stats[i, "num_nodes"] <- vcount(graph)
  graph_stats[i, "num_edges"] <- ecount(graph)
  
  song_ids <- E(graph)$song_id
  num_unique_songs <- length(unique(song_ids))
  graph_stats[i, "num_songs"] <- num_unique_songs

  
  graph_stats[i, "centralization"] <- centr_degree(graph)$centralization
  graph_stats[i, "density"] <- edge_density(graph)  # current name for the deprecated graph.density()
  graph_stats[i, "transitivity"] <- transitivity(graph, type = 'global')

  
}

collab_by_years <- edgelist %>%
  group_by(release_year) %>%
  summarize(
    popularity = mean(popularity),
    K_Pop = sum(Collab_region1 == "K-pop"),
    US_Other = sum(Collab_region1 == "US or Other"),
    E_Asia = sum(Collab_region1 == "East Asia"),
    SE_Asia = sum(Collab_region1 == "South East Asia"),
    Latino = sum(Collab_region1 == "Latino"),
    Europe = sum(Collab_region1 == "Europe"),
    Unknown = sum(Collab_region1 == "Unknown")
    )


graph_stats <- graph_stats %>% left_join(collab_by_years, by = c("year" = "release_year"))%>%
  mutate(song_per_kpop_artists = num_songs/`K_Pop`)

graph_stats

   year num_nodes num_edges num_songs artists_per_song centralization
1  2006         9         6         6                0     0.33333333
2  2007         4         2         2                0     0.00000000
3  2008        18        12         9                0     0.09803922
4  2009        34        35        25                0     0.18003565
5  2010        39        31        26                0     0.11605938
6  2011        53        50        32                0     0.13679245
7  2012        69        56        49                0     0.07907076
8  2013        71        53        42                0     0.06438632
9  2014        96        75        62                0     0.07828947
10 2015       128       120        88                0     0.16633858
11 2016       142       142       107                0     0.07801418
12 2017       133       120        96                0     0.12269310
13 2018       125       126        94                0     0.12083871
14 2019       154       166       121                0     0.28002716
15 2020       190       202       130                0     0.14747981
16 2021       154       132       101                0     0.04108310
17 2022       130       107        87                0     0.04925462
18 2023        94        78        40                0     0.32624113
      density transitivity popularity K_Pop US_Other E_Asia SE_Asia Latino
1  0.16666667   0.00000000   16.83333     4        2      0       0      0
2  0.33333333          NaN   10.00000     0        0      1       0      0
3  0.07843137   0.00000000   12.58333     4        4      0       0      0
4  0.06238859   0.00000000   17.74286    14        5      1       0      0
5  0.04183536   0.00000000   18.58065    19        4      0       0      0
6  0.03628447   0.00000000   17.38000    26        6      0       0      0
7  0.02387042   0.00000000   21.44643    28       12      1       0      0
8  0.02132797   0.00000000   19.01887    29        8      0       0      0
9  0.01644737   0.02654867   23.40000    50        7      0       0      0
10 0.01476378   0.01929260   23.28333    72       17      0       0      0
11 0.01418440   0.03114187   26.70423    90       26      2       0      0
12 0.01367054   0.02158273   30.47500    64       34      3       0      0
13 0.01625806   0.03191489   28.69841    56       40      3       1      4
14 0.01409048   0.01601423   31.20482    82       55      0       2      3
15 0.01125035   0.00000000   32.79208    93       54     19       5      0
16 0.01120448   0.00000000   37.82576    41       58      4       3      3
17 0.01276088   0.02238806   39.60748    34       51      4       1      1
18 0.01784489   0.00000000   46.20513    31       17      1       1      3
   Europe Unknown song_per_kpop_artists
1       0       0              1.500000
2       0       1                   Inf
3       0       4              2.250000
4       0      15              1.785714
5       0       8              1.368421
6       0      18              1.230769
7       0      15              1.750000
8       0      16              1.448276
9       0      18              1.240000
10      0      31              1.222222
11      0      24              1.188889
12      1      18              1.500000
13      2      20              1.678571
14      5      19              1.475610
15      2      29              1.397849
16      2      21              2.463415
17      2      14              2.558824
18      1      24              1.290323
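
The 2007 row in the table above contains NaN for transitivity and Inf for song_per_kpop_artists. Both come from dividing by zero: that year has two disjoint edges among four nodes, so there are no connected triples, and its K_Pop count is 0. Below is a minimal sketch of how such a ratio could be guarded, assuming NA is preferred over Inf. The helper safe_ratio is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: return NA instead of Inf/NaN when the denominator is zero
safe_ratio <- function(num, den) {
  ifelse(den == 0, NA_real_, num / den)
}

# num_songs and K_Pop values taken from the 2006-2008 rows of the table above
num_songs <- c(6, 2, 9)
k_pop     <- c(4, 0, 4)

unguarded <- num_songs / k_pop            # the 2007 value is Inf
guarded   <- safe_ratio(num_songs, k_pop) # the 2007 value is NA

unguarded
guarded
```

Marking such years as NA keeps them from distorting summaries or plot axes that use this column.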

Code

# (3 years moving average range) Closer look: How does the network change every year

Code

# Create a stacked bar chart of collaborations per year, split by collaborator region
graph_stats %>%
  pivot_longer(cols = c("K_Pop", "US_Other", "E_Asia", "SE_Asia", "Latino", "Europe", "Unknown"), names_to = "collab_region") %>%
  mutate(collab_region = factor(collab_region, levels = c("K_Pop", "US_Other", "E_Asia", "SE_Asia", "Latino", "Europe", "Unknown"))) %>%
  ggplot(aes(x = year)) +
  geom_bar(aes(y = value, fill = collab_region), stat = "identity") +
  # labs() needs the full argument name subtitle; sub is not recognized
  labs(y = "Number", x = "year", title = "The number of K-pop collaboration songs", subtitle = "After around 2017 the number of international collaboration songs increased.")  +
  scale_fill_manual("Collab Region", values = c("K_Pop" = "#FFFFE1", "Latino" = "darkgreen", "E_Asia" = "#920002",
                                               "SE_Asia" = "#FDCCCC", "Europe" = "#ABD7E6", "US_Other" = "#010087",
                                               "Unknown" = "darkgrey")) +
  theme_classic()

 $ plot.tag                  :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : 'rel' num 1.2
  ..$ hjust        : num 0.5
  ..$ vjust        : num 0.5
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ plot.tag.position         : chr "topleft"
 $ plot.margin               : 'margin' num [1:4] 5.5points 5.5points 5.5points 5.5points
  ..- attr(*, "unit")= int 8
 $ strip.background          :List of 5
  ..$ fill         : chr "white"
  ..$ colour       : chr "black"
  ..$ size         : 'rel' num 2
  ..$ linetype     : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_rect" "element"
 $ strip.background.x        : NULL
 $ strip.background.y        : NULL
 $ strip.placement           : chr "inside"
 $ strip.text                :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : chr "grey10"
  ..$ size         : 'rel' num 0.8
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : NULL
  ..$ lineheight   : NULL
  ..$ margin       : 'margin' num [1:4] 4.4points 4.4points 4.4points 4.4points
  .. ..- attr(*, "unit")= int 8
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ strip.text.x              : NULL
 $ strip.text.y              :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : num -90
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 $ strip.switch.pad.grid     : 'simpleUnit' num 2.75points
  ..- attr(*, "unit")= int 8
 $ strip.switch.pad.wrap     : 'simpleUnit' num 2.75points
  ..- attr(*, "unit")= int 8
 $ strip.text.y.left         :List of 11
  ..$ family       : NULL
  ..$ face         : NULL
  ..$ colour       : NULL
  ..$ size         : NULL
  ..$ hjust        : NULL
  ..$ vjust        : NULL
  ..$ angle        : num 90
  ..$ lineheight   : NULL
  ..$ margin       : NULL
  ..$ debug        : NULL
  ..$ inherit.blank: logi TRUE
  ..- attr(*, "class")= chr [1:2] "element_text" "element"
 - attr(*, "class")= chr [1:2] "theme" "gg"
 - attr(*, "complete")= logi TRUE
 - attr(*, "validate")= logi TRUE

This graph shows that around 2012, the number of collaboration songs among K-pop artists was limited, with few international collaborations, and the network was sparse. In 2014, the number of K-pop collaborations started increasing, with more Western collaborations. Between 2015 and 2020, K-pop experienced significant growth in the number and types of collaborations. The collaborations were not limited to US-based artists but also included small numbers of Latino, East Asian, South East Asian, and Europe-based artists. Since 2021, the number of collaborations per year has decreased, while the proportion of collaborations between K-pop artists and artists from the US or other countries outside Korea has been increasing.

(By regime)

The graph above illustrates that collaborations in K-pop have different characteristics depending on the era. Taking into account the aforementioned features and significant milestones in the history of K-pop, such as the 2012 breakout with “Gangnam Style” and the major advancement in the American market by BTS in 2017, I have decided to divide the period from 2005 to 2023 into four eras: 2005-2011, 2012-2016, 2017-2020, and 2021-2023.

Code

edgelist <- edgelist %>% 
  mutate(
    regime = case_when(
      release_year < 2012 ~ "2005-2011",
      release_year >= 2012 & release_year < 2017 ~ "2012-2016",
      release_year >= 2017 & release_year < 2021 ~ "2017-2020",
      TRUE ~ "2021-"
    )) 

edgelist

# A tibble: 1,519 × 11
   From    To    song_id song_…¹ From_…² To_ar…³ Colla…⁴ Colla…⁵ relea…⁶ popul…⁷
   <chr>   <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>   <dbl>
 1 3Nrfpe… 1oSP… 5qyjy0… Angel … BTS     Jimin   K-pop   Domest…    2023      79
 2 3Nrfpe… 164U… 5qyjy0… Angel … BTS     JVKE    US or … Intern…    2023      79
 3 3Nrfpe… 46SH… 5qyjy0… Angel … BTS     Kodak … Latino  Intern…    2023      79
 4 3Nrfpe… 0Erz… 5qyjy0… Angel … BTS     NLE Ch… US or … Intern…    2023      79
 5 3Nrfpe… 7tjV… 5qyjy0… Angel … BTS     Muni L… US or … Intern…    2023      79
 6 3Nrfpe… 6HaG… 1RDvyO… Dreame… BTS     Jung K… K-pop   Domest…    2022      86
 7 3Nrfpe… 5C01… 1RDvyO… Dreame… BTS     FIFA S… US or … Intern…    2022      86
 8 3Nrfpe… 5CiG… 0xzI1K… Bad De… BTS     benny … US or … Intern…    2022      78
 9 3Nrfpe… 7hJc… 0xzI1K… Bad De… BTS     Snoop … US or … Intern…    2022      78
10 3Nrfpe… 6VuM… 0mBP9X… Left a… BTS     Charli… US or … Intern…    2022      82
# … with 1,509 more rows, 1 more variable: regime <chr>, and abbreviated
#   variable names ¹song_name, ²From_artists, ³To_artists, ⁴Collab_region1,
#   ⁵Collab_region2, ⁶release_year, ⁷popularity

Code

#E(collab.net)$period <- edgelist$period

Code

# Split the edgelist data frame into 4 groups based on "regime"

edgelist_list <- split(edgelist, edgelist$regime)

# Build one graph per regime and collect the graphs in a list

graphs <- list()


for (i in seq_along(edgelist_list)) {
  assign(paste0("edgelist_", names(edgelist_list)[i]), edgelist_list[[i]])
  
  unique_vertices <- unique(c(edgelist_list[[i]]$From, edgelist_list[[i]]$To))
  artists_period <- artists[artists$id %in% unique_vertices, ]
  
  # Create the graph from the filtered edgelist and vertices
  graph <- graph_from_data_frame(edgelist_list[[i]], directed = FALSE, vertices = artists_period)

  # Append the graph to the list of graphs
  graphs[[i]] <- graph
}
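
The split-and-build step above can be tried on a small made-up edge list. The sketch below is only illustrative: `toy_edges` and `toy_artists` are hypothetical stand-ins for `edgelist` and `artists`, and `lapply()` is used instead of the `for` loop so that the resulting list keeps the regime names.

```r
library(igraph)

# Hypothetical toy data standing in for `edgelist` and `artists`
toy_edges <- data.frame(
  From   = c("a", "a", "b", "c"),
  To     = c("b", "c", "c", "d"),
  regime = c("2017-2020", "2017-2020", "2021-", "2021-")
)
toy_artists <- data.frame(
  id   = c("a", "b", "c", "d", "e"),
  kpop = c("yes", "yes", "no", "no", "yes")
)

# One undirected graph per regime, keeping only artists that appear in that regime
toy_graphs <- lapply(split(toy_edges, toy_edges$regime), function(edges) {
  ids <- unique(c(edges$From, edges$To))
  graph_from_data_frame(edges, directed = FALSE,
                        vertices = toy_artists[toy_artists$id %in% ids, ])
})

names(toy_graphs)            # regime labels carried over from split()
sapply(toy_graphs, vcount)   # number of artists in each regime
```

Because the list is named, a graph can be retrieved by its regime label (for example `toy_graphs[["2021-"]]`) rather than by position.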

Code

# Assigning visualization attributes

regime <- c("2005-2011", "2012-2016", "2017-2020", "2021")

for (i in 1:length(graphs)) {
  V(graphs[[i]])$color <- V(graphs[[i]])$color_region_category
  plot(graphs[[i]], 
     vertex.label = NA,
     arrow.mode="-",
     vertex.size = log(V(graphs[[i]])$followers) * 0.7,
     vertex.label.cex = .6,
     vertex.label.color = "black",
     vertex.label.dist = 0,
     vertex.frame.color = 'lightgrey',
     frame.width = 0,
     color = V(graphs[[i]])$color_region_category,
     main = paste0("Collaboration by K-pop top artists\n", regime[[i]])
     )
  
  legend(
    "bottomright",
    legend = c("K-pop", "Latin", "East Asia", "South East Asia", "Europe", "US or Other", "Unknown"),
    pt.bg  = c("#FFFFE1", "darkgreen", "#920002","#FDCCCC", "#ABD7E6", "deepskyblue3", "darkgrey"),
    pch    = 21,
    cex    = 1,
    bty    = "n",
    title  = "Genre"
    )
}

Overview: How have the collaboration patterns changed over time?

Between 2006 and 2010, the number of collaboration songs among K-pop artists was limited, with few collaborations with artists outside South Korea. The network was also sparse and disconnected.
From 2011 to 2015, the number of collaboration songs increased slightly, with more international collaborations and a more connected network.
Between 2016 and 2020, K-pop experienced significant growth in the number and types of collaborations. Notably, BTS, (G)I-DLE, and BLACKPINK have a large number of international collaboration songs.
From 2021 onward, the number of collaboration songs is smaller than in 2016-2020, partly because the time span is shorter, but the proportion of international collaborations appears to be higher.

Code

collab_by_regimes <- edgelist %>% 
  group_by(regime) %>%
  summarize(
    popularity = mean(popularity),
    K_Pop = sum(Collab_region1 == "K-pop"),
    US_Other = sum(Collab_region1 == "US or Other"),
    E_Asia = sum(Collab_region1 == "East Asia"),
    SE_Asia = sum(Collab_region1 == "South East Asia"),
    Latino = sum(Collab_region1 == "Latino"),
    Europe = sum(Collab_region1 == "Europe"),
    Unknown = sum(Collab_region1 == "Unknown")
    )

Code

# NOTE: start_year and end_year drive the subsetting in the loop below. The first two
# regimes start at 2006 and 2013 here, although their labels start at 2005 and 2012.
# length(regime) refers to the `regime` vector defined earlier (4 elements).
graph_stats_regimes <- data.frame(
                          regime = c("2005-2011", "2012-2016", "2017-2020", "2021-"),
                          start_year = c(2006, 2013, 2017, 2021),
                          end_year = c(2011, 2016, 2020, 2023),
                          num_nodes = numeric(length(regime)), 
                          num_edges = numeric(length(regime)), 
                          num_songs = numeric(length(regime)),
                          num_kpop_artists = numeric(length(regime)),
                          num_no_kpop_artists = numeric(length(regime)),
                          percent_no_kpop_artitst = numeric(length(regime)),
                          artists_per_song = numeric(length(regime)),
                          centralization = numeric(length(regime)), 
                          density = numeric(length(regime)), 
                          transitivity = numeric(length(regime))
)
graph_stats_regimes

     regime start_year end_year num_nodes num_edges num_songs num_kpop_artists
1 2005-2011       2006     2011         0         0         0                0
2 2012-2016       2013     2016         0         0         0                0
3 2017-2020       2017     2020         0         0         0                0
4     2021-       2021     2023         0         0         0                0
  num_no_kpop_artists percent_no_kpop_artitst artists_per_song centralization
1                   0                       0                0              0
2                   0                       0                0              0
3                   0                       0                0              0
4                   0                       0                0              0
  density transitivity
1       0            0
2       0            0
3       0            0
4       0            0

Code

regime

[1] "2005-2011" "2012-2016" "2017-2020" "2021"

Code

# Loop through each regime and calculate the desired graph statistics

for (i in 1:length(regime)) {
  start_year <- graph_stats_regimes$start_year[i]
  end_year <- graph_stats_regimes$end_year[i]

  # Subset the edges released within the current regime's year range
  subset_data <- edgelist[edgelist$release_year %in% c(start_year:end_year), ]
  print(start_year)
  print(end_year)
  
  # Create the graph for the current regime
  graph <- graph_from_data_frame_with_all_vertices(df = subset_data, vertices = artists)
  
  # Fill in the corresponding row of the data frame with the calculated statistics
  graph_stats_regimes[i, "num_nodes"] <- vcount(graph)
  graph_stats_regimes[i, "num_edges"] <- ecount(graph)
  
  # Number of distinct songs among the edges
  song_ids <- E(graph)$song_id
  num_unique_songs <- length(unique(song_ids))
  graph_stats_regimes[i, "num_songs"] <- num_unique_songs
  
  # Number and share of K-pop and non-K-pop artists (vertices)
  kpop_values <- V(graph)$kpop
  num_kpop <- sum(kpop_values == "yes")
  graph_stats_regimes[i, "num_kpop_artists"] <- num_kpop
  
  graph_stats_regimes[i, "num_no_kpop_artists"] <- vcount(graph) - num_kpop
  graph_stats_regimes[i, "percent_no_kpop_artitst"] <- (vcount(graph)-num_kpop)/vcount(graph)
  
  # Ratio of all artists in the graph to distinct songs
  graph_stats_regimes[i, "artists_per_song"] <- vcount(graph)/num_unique_songs
  
  # Network-level statistics
  graph_stats_regimes[i, "centralization"] <- centr_degree(graph)$centralization
  graph_stats_regimes[i, "density"] <- graph.density(graph)
  graph_stats_regimes[i, "transitivity"] <- transitivity(graph, type = 'global')
}

[1] 2006
[1] 2011
[1] 2013
[1] 2016
[1] 2017
[1] 2020
[1] 2021
[1] 2023
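
As a point of reference for the statistics filled in by the loop, the sketch below computes density, global transitivity, and degree centralization for a four-node toy graph (a triangle with one pendant node). The graph is made up for illustration; `edge_density()` is igraph's current name for `graph.density()`.

```r
library(igraph)

# Toy graph (illustrative only): triangle A-B-C plus a pendant node D attached to C
toy <- graph_from_data_frame(
  data.frame(From = c("A", "B", "A", "C"),
             To   = c("B", "C", "C", "D")),
  directed = FALSE
)

# Density = edges / possible edges = 4 / choose(4, 2)
toy_density <- edge_density(toy)

# Global transitivity = 3 * triangles / connected triples = 3 * 1 / 5
toy_transitivity <- transitivity(toy, type = "global")

# Degree centralization: how strongly ties concentrate on a single node
toy_centralization <- centr_degree(toy)$centralization

c(density = toy_density, transitivity = toy_transitivity,
  centralization = toy_centralization)
```

Density falls quickly as a network grows unless edges grow with the square of the number of nodes, so it is best compared between regimes alongside the node counts.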

Code

graph_stats_regimes %>%
  left_join(collab_by_regimes) %>% 
  mutate(song_per_kpop_artists = num_songs/`K_Pop`)

Joining, by = "regime"

     regime start_year end_year num_nodes num_edges num_songs num_kpop_artists
1 2005-2011       2006     2011       126       136       100               71
2 2012-2016       2013     2016       324       390       299              190
3 2017-2020       2017     2020       432       614       441              207
4     2021-       2021     2023       318       317       228              141
  num_no_kpop_artists percent_no_kpop_artitst artists_per_song centralization
1                  55               0.4365079        1.2600000     0.08673016
2                 134               0.4135802        1.0836120     0.07923403
3                 225               0.5208333        0.9795918     0.20686173
4                 177               0.5566038        1.3947368     0.14513025
      density transitivity popularity K_Pop US_Other E_Asia SE_Asia Latino
1 0.017269841  0.000000000   17.40845    71       21      4       0      0
2 0.007453274  0.021097046   23.65471   269       70      3       0      0
3 0.006595342  0.025306122   31.07003   295      183     25       8      7
4 0.006289308  0.004087193   40.48896   106      126      9       5      7
  Europe Unknown song_per_kpop_artists
1      0      46              1.408451
2      0     104              1.111524
3     10      86              1.494915
4      5      59              2.150943
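
The regional counts in the table above can be turned into shares to make the regimes comparable. The sketch below copies the counts from the table by hand; treating every category other than "K-pop" (including "Unknown") as non-K-pop is an assumption of this sketch, and `collab_counts` is a hypothetical name.

```r
# Counts copied from the table above (edges by collaborator region, per regime)
collab_counts <- data.frame(
  regime   = c("2005-2011", "2012-2016", "2017-2020", "2021-"),
  K_Pop    = c(71, 269, 295, 106),
  US_Other = c(21, 70, 183, 126),
  E_Asia   = c(4, 3, 25, 9),
  SE_Asia  = c(0, 0, 8, 5),
  Latino   = c(0, 0, 7, 7),
  Europe   = c(0, 0, 10, 5),
  Unknown  = c(46, 104, 86, 59)
)

region_cols <- setdiff(names(collab_counts), "regime")
collab_counts$total <- rowSums(collab_counts[, region_cols])

# Share of collaborations whose partner is not categorised as K-pop
# (assumption of this sketch: "Unknown" is counted as non-K-pop)
collab_counts$share_non_kpop <- 1 - collab_counts$K_Pop / collab_counts$total

collab_counts[, c("regime", "total", "share_non_kpop")]
```

With these counts, the non-K-pop share rises from roughly 0.52 in 2017-2020 to roughly 0.67 in 2021-.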

Code

# #graph_stats_regimes <- graph_stats_regimes %>% 
#   left_join(collab_by_regimes)%>%
#   mutate(song_per_kpop_artists = num_songs/`K_Pop`)

Code

# Define the 3-year spans



# for (i in 1:length(end_years)) {
#   end_year <- end_years[i]
#   start_year <- end_year - 2
#   year <- end_year -1
#   
#   # Subset the data for the current 3-year span
#   subset_data <- edgelist[edgelist$release_year %in% c(start_year:end_year), ]
#   
#   # Create the graph for the current 3-year span
#   graph <- graph_from_data_frame_with_all_vertices(df = subset_data, vertices = artists)
#   
#   V(graph)$color <- V(graph)$color_region_category
#   
#   plot(graph,
#        vertex.label = V(graph)$show_kpop_top,
#        arrow.mode = "-",
#        vertex.size = log(V(graph)$followers) * 0.7,
#        vertex.label.cex = 0.6,
#        vertex.label.color = "black",
#        vertex.frame.color = 'lightgrey',
#        vertex.label.dist = 0,
#        frame.width = 0,
#        color = V(graph)$color_region_category,
#        main = paste0("Collaboration by K-pop top artists ", start_year, "-", end_year)
#   )
# 
# 
#   legend(
#     "bottomright",
#     legend = c("K-pop", "Latin", "East Asia", "South East Asia", "Europe", "US or Other", "Unknown"),
#     pt.bg  = c("#FFFFE1", "darkgreen", "#920002","#FDCCCC", "#ABD7E6", "deepskyblue3", "darkgrey"),
#     pch    = 21,
#     cex    = 1,
#     bty    = "n",
#     title  = "Genre"
#     )
# }

Code

# end_years <- 2008:2023
# graph_stats_3years <- data.frame(
#                           start_year = 2006:2021,
#                           end_year = 2008:2023, 
#                           year = 2007:2022,
#                           num_nodes = numeric(length(2008:2023)), 
#                           num_edges = numeric(length(2008:2023)), 
#                           num_songs = numeric(length(2008:2023)),
#                           num_kpop_artists = numeric(length(2008:2023)),
#                           num_no_kpop_artists = numeric(length(2008:2023)),
#                           percent_no_kpop_artitst = numeric(length(2008:2023)),
#                           artists_per_song = numeric(length(2008:2023)),
#                           centralization = numeric(length(2008:2023)), 
#                           density = numeric(length(2008:2023)), 
#                           transitivity = numeric(length(2008:2023))
# )
# 
# # Loop through each year and calculate the desired graph statistics
# 
# for (i in 1:length(end_years)) {
#   end_year <- end_years[i]
#   start_year <- end_year - 2
#   
#   # Subset the data for the current 3-year span
#   subset_data <- edgelist[edgelist$release_year %in% c(start_year:end_year), ]
#   
#   # Create the graph for the current 3-year span
#   graph <- graph_from_data_frame_with_all_vertices(df = subset_data, vertices = artists)
#   
#   # Fill in the corresponding row of the data frame with the calculated statistics
#   graph_stats_3years[i, "num_nodes"] <- vcount(graph)
#   graph_stats_3years[i, "num_edges"] <- ecount(graph)
#   
#   song_ids <- E(graph)$song_id
#   num_unique_songs <- length(unique(song_ids))
#   graph_stats_3years[i, "num_songs"] <- num_unique_songs
#   
#   kpop_values <- V(graph)$kpop
#   num_kpop <- sum(kpop_values == "yes")
#   graph_stats_3years[i, "num_kpop_artists"] <- num_kpop
#   
#   graph_stats_3years[i, "num_no_kpop_artists"] <- vcount(graph) - num_kpop
#   graph_stats_3years[i, "percent_no_kpop_artitst"] <- (vcount(graph)-num_kpop)/vcount(graph)
#   
#   graph_stats_3years[i, "artists_per_song"] <- vcount(graph)/num_unique_songs
#   
#   graph_stats_3years[i, "centralization"] <- centr_degree(graph)$centralization
#   graph_stats_3years[i, "density"] <- graph.density(graph)
#   graph_stats_3years[i, "transitivity"] <- transitivity(graph, type = 'global')
#   
#   kpop_values <- V(graph)$kpop
#   num_kpop <- sum(kpop_values == "yes")
#   graph_stats_3years[i, "num_kpop_artists"] <- num_kpop
#   
#   
#   
# }
# 
# graph_stats_3years

How does the nature of collaboration change between 2017-2020 and 2021-?

As the first graph showed, the number of collaborations has decreased since 2021 after peaking in 2020, even though the collaborations remain quite diverse, involving many US-based artists and a few East Asian, South East Asian, and Latino artists.

Here, I will try to find out what has changed since 2021 and what caused this change.

One hypothesis is that, because K-pop had established its international popularity by 2021, there was no longer a need for as many collaborations with international artists. Instead, relatively less popular artists based outside South Korea started collaborating with K-pop artists to benefit from K-pop's popularity.

Between Regime 3 (2017-2020) and Regime 4 (2021-), the relationship between K-pop artists and international artists in collaborations has changed. In Regime 3, K-pop artists often collaborated with internationally renowned artists. In Regime 4, more popular K-pop artists appear increasingly likely to collaborate with relatively less famous artists based outside Korea. This can be observed in the right-hand panel (2021-) of the scatter plot, where the slope becomes gentler; the x-axis represents the popularity of K-pop artists and the y-axis the popularity of international artists.

Code

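analysis <- edgelist %>% 
  # `&` binds more tightly than `|`, so the regime test is wrapped in parentheses
  # to apply the "International" filter to both regimes. (The 2017-2020 regression
  # output below has 614 observations, i.e. it was produced without the parentheses
  # and therefore also includes domestic collaborations.)
  filter((regime == "2017-2020" | regime == "2021-") & Collab_region2 == "International") %>%
  select(-c("Collab_region1", "Collab_region2")) %>%
  left_join(artists %>% select(c("id", "followers")), by = c("From" = "id")) %>%
  left_join(artists %>% select(c("id", "followers")), by = c("To" = "id")) 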

analysis <- analysis %>% 
  rename("Followers_of_Kpop_artists" = "followers.x",
         "Followers_of_Intl_artists" = "followers.y")

analysis %>% 
  filter(Followers_of_Kpop_artists < 2000000 & Followers_of_Intl_artists > 1000 & Followers_of_Intl_artists < 4000000) %>%
  ggplot(aes(x = `Followers_of_Kpop_artists`, y = `Followers_of_Intl_artists`)) +
  geom_point(color = "steelblue") +
  geom_smooth(method=lm , color="darkblue", fill="#69b3a2", se=TRUE) +
  geom_rug(col = "steelblue", alpha=0.1) +
  
  labs(x = "The number of followers of K-pop artists", y = "The number of followers of International artists", title = "The popularity of artists who collaborated in K-pop songs") +
  facet_wrap(~ regime)

`geom_smooth()` using formula 'y ~ x'
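
One detail of the filter used for this plot deserves a check: in R, `&` binds more tightly than `|`, so a condition of the form `A | B & C` is read as `A | (B & C)`. The toy data below are made up to show how the two readings differ; only the column names mirror the edge list.

```r
# Hypothetical mini edge list: one domestic and one international row per regime
toy <- data.frame(
  regime         = c("2017-2020", "2017-2020", "2021-", "2021-"),
  Collab_region2 = c("Domestic", "International", "Domestic", "International")
)

# Without parentheses, `&` is evaluated before `|`
no_parens <- toy$regime == "2017-2020" |
  toy$regime == "2021-" & toy$Collab_region2 == "International"

# With parentheses, the international filter applies to both regimes
with_parens <- (toy$regime == "2017-2020" | toy$regime == "2021-") &
  toy$Collab_region2 == "International"

toy[no_parens, ]    # also keeps the domestic 2017-2020 row
toy[with_parens, ]  # keeps international rows only
```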

To test this hypothesis more quantitatively, a linear regression model was fitted with the popularity of each collaboration song as the outcome and, as predictors, the popularity of the K-pop artists and of the international artists involved (both measured by the number of followers on Spotify): popularity of song ~ followers of K-pop artist + followers of international artist.

The results showed that, from 2017 to 2020, the estimated coefficients for the followers of K-pop artists and of international artists were about 3.9e-07 and 7.3e-07 popularity points per follower, respectively (both significant at the 0.05 level). From 2021 onwards, these values became about 4.8e-07 and 3.1e-07 (both significant at the 0.05 level), indicating a reversal in the relative impact of the popularity of K-pop artists and international artists on the popularity of collaboration songs.

Code

lm.pop_reg3 <- lm(popularity ~ Followers_of_Kpop_artists + Followers_of_Intl_artists, data = analysis %>% filter(regime == "2017-2020"))
lm.pop_reg4 <- lm(popularity ~ Followers_of_Kpop_artists + Followers_of_Intl_artists, data = analysis %>% filter(regime == "2021-"))


print(summary(lm.pop_reg3))


Call:
lm(formula = popularity ~ Followers_of_Kpop_artists + Followers_of_Intl_artists, 
    data = analysis %>% filter(regime == "2017-2020"))

Residuals:
    Min      1Q  Median      3Q     Max 
-43.935  -9.397  -0.991   8.596  40.528 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               2.893e+01  6.409e-01  45.144  < 2e-16 ***
Followers_of_Kpop_artists 3.899e-07  6.279e-08   6.209 9.84e-10 ***
Followers_of_Intl_artists 7.319e-07  1.810e-07   4.044 5.93e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.97 on 611 degrees of freedom
Multiple R-squared:  0.1409,    Adjusted R-squared:  0.1381 
F-statistic: 50.09 on 2 and 611 DF,  p-value: < 2.2e-16

Code

print(summary(lm.pop_reg4))


Call:
lm(formula = popularity ~ Followers_of_Kpop_artists + Followers_of_Intl_artists, 
    data = analysis %>% filter(regime == "2021-"))

Residuals:
   Min     1Q Median     3Q    Max 
-35.89 -13.39   0.54  12.79  38.23 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               3.428e+01  1.381e+00  24.821  < 2e-16 ***
Followers_of_Kpop_artists 4.781e-07  8.183e-08   5.842 1.96e-08 ***
Followers_of_Intl_artists 3.083e-07  1.215e-07   2.538   0.0119 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 17.9 on 208 degrees of freedom
Multiple R-squared:  0.2222,    Adjusted R-squared:  0.2148 
F-statistic: 29.72 on 2 and 208 DF,  p-value: 4.44e-12
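
The coefficients in the summaries are expressed per single follower, which is why they are of the order of 1e-07. For example, 3.899e-07 per follower corresponds to about 0.39 popularity points per million followers. The sketch below uses simulated data (not the Spotify data; all variable names are hypothetical) to show that rescaling the predictors changes only the units of the slopes, not the fit.

```r
set.seed(1)

# Simulated data (illustrative only, not the Spotify data)
n <- 200
sim <- data.frame(
  kpop_followers = runif(n, 1e4, 5e6),
  intl_followers = runif(n, 1e3, 4e6)
)
sim$popularity <- 30 + 4e-07 * sim$kpop_followers +
  7e-07 * sim$intl_followers + rnorm(n, sd = 5)

# Same model fitted per follower and per million followers
fit_raw <- lm(popularity ~ kpop_followers + intl_followers, data = sim)
fit_mil <- lm(popularity ~ I(kpop_followers / 1e6) + I(intl_followers / 1e6),
              data = sim)

# Slopes per million followers are the raw slopes multiplied by 1e6
cbind(raw_times_1e6 = coef(fit_raw)[-1] * 1e6,
      per_million   = coef(fit_mil)[-1])
```

Reporting slopes per million followers makes the two regimes easier to compare at a glance without altering the t values or p-values.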

Summary of the evolution of K-pop collaborations over the last 20 years

In summary, K-pop artists gained international popularity and a global audience partly through collaborations with international artists starting in the mid-2010s. Since 2021, the number of collaboration songs has decreased. In addition, there has been a shift in which artists outside Korea now collaborate with K-pop artists, leveraging K-pop's global popularity to reach a broader audience.

Additional analysis

(Artists-based analysis) Who are the key artists?

Below are the top 10 artists with the highest eigenvector centrality. Interestingly, they are not necessarily the most popular artists: BTS, BLACKPINK, and other more popular K-pop artists are not listed here. Eight of them are K-pop artists, and the remaining two, YULTRON (a DJ based in Los Angeles) and Hit-Boy, are categorized as "US or Other".

Code

nodes <- data.frame(
  name = V(collab.net)$name,
  followers = V(collab.net)$followers,
  region_category = V(collab.net)$region_category,
  kpop = V(collab.net)$kpop,
  degree = degree(collab.net, mode = 'total'),
  transitivity = transitivity(collab.net, type = 'local'),
  betweenness = betweenness(collab.net),
  eigen_cent = eigen_centrality(collab.net, directed = F, scale = F)$vector)

nodes %>% arrange(desc(eigen_cent)) %>% head(10)

                       name followers region_category kpop degree transitivity
Jay Park           Jay Park   1941923           K-pop  yes    171  0.007947639
Sik-K                 Sik-K    431030           K-pop  yes     84  0.035437431
Loco                   Loco    379142           K-pop  yes     57  0.033868093
pH-1                   pH-1    372581           K-pop  yes     15  0.238095238
Ugly Duck         Ugly Duck      9671           K-pop  yes      9  1.000000000
DJ Wegun           DJ Wegun     13768           K-pop  yes      9  0.000000000
GRAY                   GRAY    282549           K-pop  yes     15  0.200000000
Hit-Boy             Hit-Boy     68444     US or Other   no      7          NaN
YULTRON             YULTRON     34860     US or Other   no      7          NaN
Simon Dominic Simon Dominic    442764           K-pop  yes      9  0.666666667
              betweenness eigen_cent
Jay Park      118616.1730  0.6221463
Sik-K          55250.6056  0.3817554
Loco           39472.8834  0.2740903
pH-1           12630.9924  0.2043511
Ugly Duck          0.0000  0.1870638
DJ Wegun         474.1053  0.1763272
GRAY            3819.0552  0.1550732
Hit-Boy            0.0000  0.1520206
YULTRON            0.0000  0.1520206
Simon Dominic   1598.8051  0.1454675
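
In the table, some artists with a degree of only 7 or 9 rank among the top 10. One plausible reading is that eigenvector centrality rewards ties to well-connected artists rather than the number of ties alone. The toy graph below is made up to illustrate that mechanism; the node names are hypothetical.

```r
library(igraph)

# Toy graph (illustrative only): hub H with five neighbours, one of which is P;
# Q sits further away, on a chain L1 - X - Q - Y
toy <- graph_from_data_frame(
  data.frame(
    From = c("H", "H", "H", "H", "H", "L1", "X", "Q"),
    To   = c("L1", "L2", "L3", "L4", "P", "X", "Q", "Y")
  ),
  directed = FALSE
)

toy_nodes <- data.frame(
  degree     = degree(toy),
  eigen_cent = eigen_centrality(toy)$vector
)

# P has a single tie (to the hub) yet scores higher than Q, which has two ties
toy_nodes[order(-toy_nodes$eigen_cent), ]
```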

Code

#nodes_stat_regime_list <- list()  # Create an empty list to store the dataframes

# for (i in 1:length(graphs)) {
#   nodes_stat_regime <- data.frame(
#     genre = V(graphs[[i]])$genre,
#     followers = V(graphs[[i]])$followers,
#     region_category = V(graphs[[i]])$region_category,
#     kpop = V(graphs[[i]])$kpop,
#     degree = degree(graphs[[i]]),
#     transitivity = transitivity(graphs[[i]], type = 'local'),
#     betweenness = betweenness(graphs[[i]]),
#     eigen_cent = eigen_centrality(graphs[[i]], directed = F, scale = F)$vector
#     
#   )
#   
#   nodes_stat_regime_list[[i]] <- nodes_stat_regime  # Store the dataframe in the list
# }

Most nodes have a small eigenvector centrality of less than 0.1. The nodes with a relatively high eigenvector centrality are K-pop artists that do not have many followers.

Code

nodes %>% 
  filter(eigen_cent < 0.4 & followers < 4000000) %>% #excluding outliers
  ggplot(aes(x=followers, y=eigen_cent, color=region_category)) +
  geom_point()+
  scale_color_manual("region_category", 
                     values = c("K-pop" = "#FFFFE1", 
                                "Latino" = "darkgreen", 
                                "East Asia" = "#920002",
                                "Southeast Asia" = "#FDCCCC", 
                                "Europe" = "#ABD7E6", 
                                "US or Other" = "#010087",
                                "Unknown" = "darkgrey"))

There is no clear correlation between the number of collaborations an artist has (degree) and the artist's popularity (number of followers).
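
This visual impression could be backed by a rank correlation. A minimal sketch, assuming a data frame shaped like `nodes` with `followers` and `degree` columns; the data here are simulated, so the printed value says nothing about the real network.

```r
set.seed(42)

# Simulated stand-in for the `nodes` table (illustrative only)
toy_nodes <- data.frame(
  followers = round(rlnorm(100, meanlog = 11, sdlog = 2)),
  degree    = rpois(100, lambda = 3) + 1
)

# Spearman's rho is rank-based, so heavily skewed follower counts do not dominate
rho <- cor(toy_nodes$followers, toy_nodes$degree, method = "spearman")
rho
```

On the real data, the equivalent call would be `cor(nodes$followers, nodes$degree, method = "spearman", use = "complete.obs")`.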

Code

nodes %>% 
  filter(followers < 4000000) %>% #excluding outliers
  ggplot(aes(x=followers, y=degree, color=region_category)) +
  geom_point()+
  scale_color_manual("region_category", 
                     values = c("K-pop" = "#FFFFE1", 
                                "Latino" = "darkgreen", 
                                "East Asia" = "#920002",
                                "Southeast Asia" = "#FDCCCC", 
                                "Europe" = "#ABD7E6", 
                                "US or Other" = "#010087",
                                "Unknown" = "darkgrey"))

Code

# Make a graph for each year

# graph_from_data_frame_with_all_vertices <- function(df, vertices) {
#   # Create a data frame with all possible edges between vertices
#   unique_vertices <- unique(c(df$From, df$To))
#   artists_period <- artists[artists$id %in% unique_vertices, ]
#   # Create the graph with all vertices and existing edges
#   graph <- graph_from_data_frame(df, directed = FALSE, vertices = artists_period)
#   return(graph)
# }
# 
# 
# graphs_year <- lapply(2006:2023, function(x) graph_from_data_frame_with_all_vertices(df = edgelist[edgelist$release_year == x, ], vertices = artists))
# 
# periods <- c(2006:2023)
# 
# for (i in 1:length(graphs_year)) {
#   V(graphs_year[[i]])$color <- V(graphs_year[[i]])$color_region_category
#   plot(graphs_year[[i]], 
#      vertex.label = V(graphs_year[[i]])$show_kpop_top,
#      arrow.mode="-",
#      vertex.size = log(V(graphs_year[[i]])$followers) * 0.7,
#      vertex.label.cex = .8,
#      vertex.label.color = "black",
#      vertex.label.dist = 0,
#      frame.width = 0,
#      color = V(graphs_year[[i]])$color_region_category,
#      main = paste0("Collaboration by K-pop top artists ", periods[[i]])
#      )
# 
#   legend(
#     "bottomright",
#     legend = c("K-pop", "Latin", "East Asia", "South East Asia", "Europe", "US or Other", "Unknown"),
#     pt.bg  = c("#FFFFE1", "darkgreen", "#920002","#FDCCCC", "#ABD7E6", "#010087", "darkgrey"),
#     pch    = 21,
#     cex    = 1,
#     bty    = "n",
#     title  = "Genre"
#     )
# }

Code

# collab_by_years <- edgelist %>% 
#   group_by(release_year) %>%
#   summarize(
#     popularity = mean(popularity),
#     K_Pop = sum(Collab_region1 == "K-pop"),
#     US_Other = sum(Collab_region1 == "US or Other"),
#     E_Asia = sum(Collab_region1 == "East Asia"),
#     SE_Asia = sum(Collab_region1 == "South East Asia"),
#     Latino = sum(Collab_region1 == "Latino"),
#     Europe = sum(Collab_region1 == "Europe"),
#     Unknown = sum(Collab_region1 == "Unknown")
#     )
# 
# summary(collab_by_years)

Code

# graph_stats <- data.frame(year = 2006:2023, 
#                           num_nodes = numeric(length(2006:2023)), 
#                           num_edges = numeric(length(2006:2023)), 
#                           num_songs = numeric(length(2006:2023)),
#                           artists_per_song = numeric(length(2006:2023)),
#                           centralization = numeric(length(2006:2023)), 
#                           density = numeric(length(2006:2023)), 
#                           transitivity = numeric(length(2006:2023))
# )
# 
# # Loop through each year and calculate the desired graph statistics
# for (i in 1:length(graphs_year)) {
#   graph <- graphs_year[[i]]
#   
#   # Fill in the corresponding row of the data frame with the calculated statistics
#   graph_stats[i, "num_nodes"] <- vcount(graph)
#   graph_stats[i, "num_edges"] <- ecount(graph)
#   
#   song_ids <- E(graph)$song_id
#   num_unique_songs <- length(unique(song_ids))
#   graph_stats[i, "num_songs"] <- num_unique_songs
# 
#   
#   graph_stats[i, "centralization"] <- centr_degree(graph)$centralization
#   graph_stats[i, "density"] <- edge_density(graph)
#   graph_stats[i, "transitivity"] <- transitivity(graph, type = 'global')
# 
#   
# }
# 
# graph_stats <- graph_stats %>% left_join(collab_by_years, by = c("year" = "release_year"))%>%
#   mutate(song_per_kpop_artists = num_songs/`K_Pop`)
# 
# graph_stats
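
The per-year statistics in the commented-out loop above can be tried on a small example. The following is a minimal sketch on a hypothetical toy edge list (`toy_edges` and `toy_stats` are illustrative names, not objects from this project). It builds one undirected graph per release year and collects the same kind of measures.

```r
library(igraph)

# Hypothetical toy edge list; the real edgelist comes from the Spotify data
toy_edges <- data.frame(
  From         = c("A", "A", "B", "C", "D"),
  To           = c("B", "C", "C", "D", "E"),
  release_year = c(2019, 2019, 2019, 2020, 2020),
  song_id      = c("s1", "s2", "s3", "s4", "s5")
)

years <- sort(unique(toy_edges$release_year))

# Build one graph per year and compute its summary statistics
toy_stats <- do.call(rbind, lapply(years, function(y) {
  g <- graph_from_data_frame(toy_edges[toy_edges$release_year == y, ],
                             directed = FALSE)
  data.frame(
    year           = y,
    num_nodes      = vcount(g),
    num_edges      = ecount(g),
    num_songs      = length(unique(E(g)$song_id)),
    centralization = centr_degree(g)$centralization,
    density        = edge_density(g),
    transitivity   = transitivity(g, type = "global")
  )
}))

toy_stats
```

Returning one row per year and binding the rows avoids pre-allocating a data frame and filling it by index, so no column can be left unfilled by accident.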

Discussion

This network analysis was conducted to illustrate the general characteristics of K-pop collaborations and how they have changed over roughly the past 20 years. The analysis showed that K-pop collaborations have increased since the mid-2010s, particularly with Western artists. In the late 2010s, K-pop artists expanded their collaborations to a diverse range of artists from Asia, Europe, and Latin America, and the number of collaborations peaked in 2020. This suggests that the K-pop industry sought to reach markets outside Asia and increase its international popularity, especially after the massive success of “Gangnam Style” in 2012. The increase in collaborations with Latino artists after 2018 may be related to the huge success of “Despacito” by Luis Fonsi in 2017, although this analysis provides no evidence for that link.

From 2021, the number of collaborations decreased, but the proportion of international collaborations surpassed that of collaborations among K-pop artists themselves. This suggests that relatively less popular artists based outside Korea are collaborating with internationally renowned K-pop artists to reach a wider audience.

According to the linear regression models, the popularity of collaborative songs released between 2017 and 2020 was associated mainly with the popularity of the international artists involved. From 2021 onwards, however, the popularity of the K-pop artists had the greater effect.
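
One way to make this kind of comparison is to fit a separate linear model for each period and compare the coefficients. The sketch below uses simulated data only. The data frame `toy_songs`, its column names, and the effect sizes are assumptions made for illustration and do not reproduce the models or results of this analysis.

```r
# Simulated, hypothetical data: one row per collaboration song
set.seed(1)
n <- 200
toy_songs <- data.frame(
  period                 = rep(c("2017-2020", "2021-"), each = n / 2),
  kpop_artist_popularity = runif(n, 30, 90),
  intl_artist_popularity = runif(n, 30, 90)
)

# Assumed effects: a different predictor dominates in each period
signal <- ifelse(
  toy_songs$period == "2017-2020",
  0.2 * toy_songs$kpop_artist_popularity + 0.6 * toy_songs$intl_artist_popularity,
  0.6 * toy_songs$kpop_artist_popularity + 0.2 * toy_songs$intl_artist_popularity
)
toy_songs$song_popularity <- signal + rnorm(n, sd = 5)

# Fit one linear model per period
fits <- lapply(split(toy_songs, toy_songs$period), function(d) {
  lm(song_popularity ~ kpop_artist_popularity + intl_artist_popularity, data = d)
})

# Compare the estimated coefficients side by side (one column per period)
sapply(fits, coef)
```

Because both predictors are on the same 0 to 100 popularity scale, their coefficients can be compared directly within each period. With predictors on different scales, standardizing them first would be needed.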

While this analysis points to one factor that may help explain the recent prominence of K-pop, its network analysis is not detailed and the data have limitations. For instance, community identification is limited: because the data came primarily from the Spotify API, it was not possible to analyze the network using meaningful attributes such as the artists’ agency affiliations. Additionally, the popularity of songs and artists reflects their status as of May 2023, because the Spotify API provides only current popularity values, not historical ones.

The findings of this analysis warrant further research with more detailed data and analysis.

Source

Kim, K. H. (2021). Short History of K-Pop, K-Cinema, and K-Television. In Hegemonic Mimicry: Korean Popular Culture of the Twenty-First Century (pp. 35–84). Duke University Press. https://doi.org/10.2307/j.ctv1xrb6s6.5

Shin, H. (2009). Have you ever seen the Rain? And who’ll stop the Rain?: The globalizing project of Korean pop (K-pop). Inter-Asia Cultural Studies, 10(4), 507–523.

`collab_songs` dataframe