Gene to Gene Network Analysis (bird’s eye view)
This is a gene-disease dataset published by Carlos Castillo, a computer scientist based in Barcelona, Spain. The dataset has been used to teach network analysis to university students. It is a result of the research conducted by Goh, K. I., Cusick, M. E., Valle, D., Childs, B., Vidal, M., & Barabási, A. L. (2007). “The human disease network”. Proceedings of the National Academy of Sciences, 104(21), 8685-8690. (https://doi.org/10.1073/pnas.0701361104). The premise is that many diseases may have a common genetic origin, so we want to find out which genes are connected to one another through the disorders they have in common. We therefore look at the gene-to-gene network. The features of the data set are: ID, Name, Genes, OMIM ID, Chromosome, and Class.
We will use only two of these features: Name (the disorder name) and Genes (the gene symbols associated with that disorder).
From these two features we build a bipartite (gene-by-disorder) adjacency matrix by pivoting the disorder names into columns. Multiplying this matrix by its transpose then converts it to a one-mode (gene-to-gene) matrix.
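The projection step can be sketched on a toy example before applying it to the real data. The gene names (G1 to G3) and disorder names (D1 to D3) below are hypothetical and only illustrate the matrix operation; this is a minimal base-R sketch, not the exact code used later in the post.

```r
# Toy bipartite matrix: rows are genes, columns are disorders (hypothetical names).
# A 1 means the gene is associated with the disorder.
B <- matrix(
  c(1, 1, 0,   # G1: D1, D2
    1, 0, 0,   # G2: D1
    0, 1, 1),  # G3: D2, D3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3"), c("D1", "D2", "D3"))
)

# One-mode projection: gene x disorder times disorder x gene gives gene x gene.
P <- B %*% t(B)
print(P)

# Off-diagonal cells count shared disorders (G1 and G2 share D1, G2 and G3 share none).
# Diagonal cells count how many disorders each gene is associated with.
```

In the projected matrix, G1 and G3 are linked because both appear under D2, while G2 and G3 get a 0 because they have no disorder in common.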
library(readr)
urlfile <- "https://raw.githubusercontent.com/chatox/networks-science-course/master/practicum/data/disease-genes.csv"
geny <- read_csv(urlfile)
head(geny)
# A tibble: 6 x 6
ID Name Genes `OMIM ID` Chromosome Class
<dbl> <chr> <chr> <dbl> <chr> <chr>
1 1 17,20-lyase deficienc~ CYP17A1, C~ 609300 10q24.3 Endoc~
2 1 17-alpha-hydroxylase/~ CYP17A1, C~ 609300 10q24.3 Endoc~
3 3 2-methyl-3-hydroxybut~ HADH2, ERAB 300256 Xp11.2 Metab~
4 4 2-methylbutyrylglycin~ ACADSB 600301 10q25-q26 Metab~
5 5 3-beta-hydroxysteroid~ HSD3B2 201810 1p13.1 Metab~
6 6 3-hydroxyacyl-CoA deh~ HADHSC, SC~ 601609 4q22-q26 Metab~
library(stringr)
# Genes is a character column; split each row into a vector of gene symbols
# strsplit() (base R) returns a list with one character vector per row
geny$Genes <- strsplit(geny$Genes, split = ", ")
head(geny)
# A tibble: 6 x 6
ID Name Genes `OMIM ID` Chromosome Class
<dbl> <chr> <list> <dbl> <chr> <chr>
1 1 17,20-lyase deficiency, i~ <chr [~ 609300 10q24.3 Endoc~
2 1 17-alpha-hydroxylase/17,2~ <chr [~ 609300 10q24.3 Endoc~
3 3 2-methyl-3-hydroxybutyryl~ <chr [~ 300256 Xp11.2 Metab~
4 4 2-methylbutyrylglycinuria <chr [~ 600301 10q25-q26 Metab~
5 5 3-beta-hydroxysteroid deh~ <chr [~ 201810 1p13.1 Metab~
6 6 3-hydroxyacyl-CoA dehydro~ <chr [~ 601609 4q22-q26 Metab~
# creating a new dataframe
geny_new <- geny[, c(2,3)]
colnames(geny_new)[1] <- "Disorder"
head(geny_new)
# A tibble: 6 x 2
Disorder Genes
<chr> <list>
1 17,20-lyase deficiency, isolated <chr [3]>
2 17-alpha-hydroxylase/17,20-lyase deficiency <chr [3]>
3 2-methyl-3-hydroxybutyryl-CoA dehydrogenase deficiency <chr [2]>
4 2-methylbutyrylglycinuria <chr [1]>
5 3-beta-hydroxysteroid dehydrogenase, type II, deficiency <chr [1]>
6 3-hydroxyacyl-CoA dehydrogenase deficiency <chr [2]>
# replicate rows so that each gene of a disorder gets its own row
library(tidyverse)
geny_new <- unnest(geny_new, cols = Genes)
head(geny_new)
# A tibble: 6 x 2
Disorder Genes
<chr> <chr>
1 17,20-lyase deficiency, isolated CYP17A1
2 17,20-lyase deficiency, isolated CYP17
3 17,20-lyase deficiency, isolated P450C17
4 17-alpha-hydroxylase/17,20-lyase deficiency CYP17A1
5 17-alpha-hydroxylase/17,20-lyase deficiency CYP17
6 17-alpha-hydroxylase/17,20-lyase deficiency P450C17
geny_new["count"] <- 1
admat <- geny_new %>%
  pivot_wider(id_cols = Genes, names_from = Disorder, values_from = count, values_fn = list(count = ~1))
# drop the first column (Genes) and use it as the row names
admat2 <- as.matrix(admat[, -1])
rownames(admat2) <- admat$Genes
# admat2 is now the bipartite (gene x disorder) adjacency matrix
# NA marks a gene-disorder pair that does not occur; set it to 0 so the matrix product is defined
admat2[is.na(admat2)] <- 0
# one-mode projection, weighted: cell [i, j] counts the disorders genes i and j share
# (the diagonal counts the disorders of each gene itself)
adj_mat <- admat2 %*% t(admat2)
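As a cross-check on the pivot step, the same kind of 0/1 gene-by-disorder matrix can be sketched in base R with table(). The small edge list below is hypothetical (it even repeats one gene-disorder pair on purpose), and this is an alternative illustration, not the method used above.

```r
# Hypothetical long-format edge list: one row per gene-disorder pair.
edges <- data.frame(
  Genes    = c("G1", "G1", "G2", "G3", "G3", "G1"),
  Disorder = c("D1", "D2", "D1", "D2", "D3", "D1")  # G1-D1 appears twice
)

# Cross-tabulate genes against disorders; unclass() leaves a plain integer matrix.
inc <- unclass(table(edges$Genes, edges$Disorder))

# Clamp repeated pairs to 1 so the matrix records presence, not frequency.
inc[inc > 1] <- 1L
print(inc)

# Pairs that never occur are already 0, so no NA replacement is needed here.
```

Clamping to 1 plays the same role as the values_fn argument in the pivot: a repeated gene-disorder pair should count once.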
library(statnet)
network.stat <- network(adj_mat, directed = FALSE, matrix.type = "adjacency", ignore.eval = FALSE, names.eval = "weight")
print(network.stat)
Network attributes:
vertices = 3823
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 6643
missing edges= 0
non-missing edges= 6643
Vertex attribute names:
vertex.names
Edge attribute names not shown
The gene-to-gene network has 3,823 vertices (genes) joined by 6,643 undirected edges, with no loops and no missing edges. Further details:
# Extracting vertex attribute values from statnet object
head(network.stat %v% "vertex.names")
[1] "CYP17A1" "CYP17" "P450C17" "HADH2" "ERAB" "ACADSB"
head(network.stat %e% "weight")
[1] 2 2 2 1 1 1
summary(network.stat %e% "weight")
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 1.000 1.642 2.000 11.000
sna::dyad.census(network.stat)
Mut Asym Null
[1,] 6643 0 7299110
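The dyad census can be checked by hand from the vertex and edge counts printed earlier. The sketch below uses only base R and the numbers already shown in this post; the variable names are illustrative.

```r
n_vertices <- 3823  # vertices reported by print(network.stat)
n_edges    <- 6643  # "Mut" dyads, i.e. undirected edges

# Every unordered pair of vertices is one dyad.
n_dyads <- choose(n_vertices, 2)

# Dyads without an edge are the "Null" dyads of the census.
n_null <- n_dyads - n_edges

# Density: share of possible edges that are actually present.
density <- n_edges / n_dyads

print(n_null)
print(density)
```

The null count matches the census output, and the density comes out below 0.1%, so the network is very sparse.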
sna::triad.census(network.stat, mode = "graph")
0 1 2 3
[1,] 9279748342 25323442 7400 14887
gtrans(network.stat)
[1] 0.857859
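The transitivity value can be related to the triad census above. For an undirected graph, a common definition is the share of connected triples that are closed: each triangle (3-edge triad) contains three connected triples, and each 2-edge triad contains one open triple. The sketch below assumes this definition; it reproduces the printed gtrans() value from the census counts, which suggests the two are consistent here.

```r
# Triad census counts copied from the output above (triads with 0, 1, 2, 3 edges).
triads <- c(`0` = 9279748342, `1` = 25323442, `2` = 7400, `3` = 14887)

# Each triangle closes three connected triples.
closed_triples <- 3 * triads[["3"]]

# Connected triples are the open ones (2-edge triads) plus the closed ones.
connected_triples <- triads[["2"]] + closed_triples

transitivity <- closed_triples / connected_triples
print(round(transitivity, 6))
```

Almost 86% of connected triples are closed, which is expected in a projected network: all genes attached to the same disorder form a clique.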
isolates(network.stat)
[1] 6 7 14 17 18 22 32 33 39 89 105 106 109
[14] 119 131 145 156 197 202 203 204 205 225 226 227 273
[27] 285 288 289 290 303 304 325 326 333 345 346 361 372
[40] 381 382 383 389 390 391 395 415 424 479 488 504 519
[53] 531 534 535 536 537 553 554 572 585 586 592 607 608
[66] 609 610 621 629 636 644 647 656 661 668 674 679 683
[79] 750 757 763 764 772 773 774 775 776 778 783 784 785
[92] 786 808 809 813 825 866 874 881 915 928 940 975 978
[105] 979 982 995 1029 1036 1043 1044 1061 1077 1090 1091 1092 1117
[118] 1144 1167 1197 1211 1244 1254 1255 1278 1296 1328 1343 1344 1366
[131] 1375 1379 1390 1391 1398 1399 1450 1453 1480 1492 1519 1527 1535
[144] 1552 1556 1583 1586 1587 1593 1598 1599 1602 1610 1639 1655 1658
[157] 1669 1682 1687 1698 1702 1708 1711 1725 1752 1771 1788 1792 1793
[170] 1794 1795 1796 1799 1812 1815 1816 1822 1823 1824 1825 1830 1851
[183] 1888 1889 1890 1895 1896 1910 1911 1912 1913 1914 1919 1920 1921
[196] 1927 1939 1950 1974 1989 1994 1995 1996 2004 2012 2013 2018 2019
[209] 2024 2025 2026 2027 2028 2029 2036 2037 2045 2056 2057 2058 2059
[222] 2062 2086 2087 2104 2105 2108 2109 2112 2125 2128 2133 2173 2174
[235] 2183 2184 2185 2187 2188 2217 2224 2227 2230 2231 2254 2262 2273
[248] 2279 2291 2294 2314 2315 2332 2333 2354 2364 2370 2371 2386 2402
[261] 2406 2427 2437 2446 2450 2456 2469 2472 2473 2478 2481 2515 2521
[274] 2529 2542 2565 2566 2580 2581 2582 2603 2612 2615 2628 2636 2637
[287] 2650 2651 2673 2702 2706 2707 2731 2751 2781 2782 2783 2784 2789
[300] 2790 2791 2796 2809 2814 2823 2824 2844 2851 2863 2878 2928 2935
[313] 2939 2948 2966 2971 2974 2987 2990 3041 3050 3059 3070 3071 3072
[326] 3086 3087 3099 3105 3119 3122 3161 3211 3212 3237 3244 3248 3249
[339] 3250 3251 3258 3264 3276 3279 3287 3291 3292 3310 3319 3327 3340
[352] 3349 3357 3358 3363 3364 3384 3395 3407 3417 3418 3430 3431 3434
[365] 3435 3446 3447 3467 3470 3471 3472 3473 3482 3486 3494 3496 3507
[378] 3525 3528 3529 3532 3542 3551 3566 3585 3600 3608 3612 3616 3617
[391] 3622 3639 3651 3652 3656 3657 3660 3668 3669 3674 3675 3689 3690
[404] 3714 3715 3726 3738 3748 3749 3750 3751 3752 3753 3756 3773 3788
[417] 3801 3802 3805 3822 3823
# subset the vertex.names attribute to get the names of the isolates
x <- (network.stat %v% "vertex.names")[isolates(network.stat)]
x
[1] "ACADSB" "HSD3B2" "AUH" "CUL7" "TPMT"
[6] "HLA-B" "CAT" "MDM2" "COL2A1" "EGFR"
[11] "ADA" "ADSL" "MEN1" "POR" "TBS19"
[16] "GCNT2" "FGA" "TYR" "ALDH2" "GABRA2"
[21] "ALDOA" "CYP11B2" "ACTN3" "ACAT1" "AMACR"
[26] "ENAM" "AMPD3" "APOA1" "GSN" "LYZ"
[31] "ALB" "MC1R" "NRAMP2" "SPTB" "COL3A1"
[36] "XPNPEP2" "HP" "AT3" "FGF10" "APOA2"
[41] "APOC3" "APOH" "VPS33B" "ARG1" "ASL"
[46] "DDC" "AGA" "ALOX5" "TF" "GATA4"
[51] "GLO1" "DRD4" "BBS1" "BBS7" "BBS2"
[56] "BBS4" "BBS5" "RFX5" "RFXAP" "BSND"
[61] "FTL" "SLC19A3" "MYF6" "GP1BA" "GP1BB"
[66] "GP9" "HLA-DPB1" "BTD" "HRAS" "TBXA2R"
[71] "ABO" "DAF" "AQP3" "KEL" "LW"
[76] "BSG" "RHCE" "XG" "PHB" "PLOD2"
[81] "MAOA" "MYC" "C1QA" "C1QB" "C1QG"
[86] "C1S" "C2" "C3" "C6" "C7"
[91] "C8B" "C9" "ASPA" "FGFR4" "CPS1"
[96] "SCO2" "TNNC1" "COX15" "MYH8" "CRYBB1"
[101] "CD8A" "CP" "CLN2" "CLN5" "CLN6"
[106] "CETP" "DNM2" "KIAA1985" "CHD7" "NSDHL"
[111] "CHIT" "LIPA" "EXT1" "CYBA" "NCF1"
[116] "NCF2" "ASS" "COH1" "ODC1" "BUB1"
[121] "NDUFS6" "ALG6" "ALG12" "ALG8" "PLG"
[126] "CPO" "IGBP1" "CPT1A" "CPT2" "HLA-DQB1"
[131] "INSL3" "ELN" "CTH" "CTNS" "D2HGD"
[136] "HSD17B4" "DFNA5" "MYO1A" "ESPN" "KIAA1199"
[141] "DRPLA" "WT1" "AQP2" "INSR" "GCK"
[146] "PTF1A" "AKT2" "IPF1" "VEGF" "LIG1"
[151] "TOP1" "DBH" "FAAH" "F2" "COL7A1"
[156] "EDARADD" "PKP1" "COL1A2" "SPTA1" "COX10"
[161] "TLR4" "ENO3" "EPX" "ITGA6" "ME2"
[166] "SYN1" "OPCML" "HBA1" "HBB" "HBA2"
[171] "EPOR" "LOR" "RNF6" "EXT2" "NPC1L1"
[176] "GLA" "MCFD2" "F7" "F10" "F11"
[181] "F13B" "FANCF" "LCAT" "FMO3" "KNG"
[186] "FSHB" "TDGF1" "FBP1" "ALDOB" "KHK"
[191] "FUCA1" "FUT6" "GALK1" "GALE" "GALT"
[196] "GAMT" "GBA" "CYP7B1" "DMBT1" "MC2R"
[201] "GCS1" "FTCD" "GCDH" "GK" "GNMT"
[206] "PHKG2" "G6PT1" "GAA" "GBE1" "GYS2"
[211] "PYGL" "PFKM" "GLB1" "GM2A" "KIAA1279"
[216] "DHH" "CTLA4" "MLPH" "GHRHR" "STAT5B"
[221] "IGF1" "OAT" "ELA2" "HMOX1" "AK1"
[226] "BPGM" "GPI" "HK1" "TPI1" "F5"
[231] "LIPC" "MET" "HMGCL" "HMGCS2" "TBX5"
[236] "CBS" "MTHFR" "HBG1" "HBG2" "EPHX2"
[241] "GLRB" "GLUD1" "APOC2" "AASS" "INS"
[246] "KCNMB1" "ADD1" "TSHR" "PAX9" "LHB"
[251] "PTH" "GCMB" "PAX8" "TSHB" "ICHYN"
[256] "IGHG2" "CD3E" "CD3G" "MYH2" "ITPA"
[261] "GABRB3" "IVD" "AHI1" "NAGA" "IGKC"
[266] "DSG1" "GALC" "LDHB" "PDX1" "GHR"
[271] "ALAD" "SURF1" "COL4A6" "TAL2" "ARNT"
[276] "AF1Q" "NUMA1" "BCL2" "TCRA" "ABL1"
[281] "LIG4" "STAR" "ECM1" "AKAP10" "LPA"
[286] "CILP" "RAP1GDS1" "BCL8" "VMD2" "MASP2"
[291] "PYGM" "XK" "NF2" "OPHN1" "ARSA"
[296] "COL10A1" "CYB5" "DIA1" "MMAA" "MMAB"
[301] "RFXANK" "MCPH1" "SIX6" "EDNRA" "TK2"
[306] "SUCLA2" "MYMY3" "GNPTAG" "HYAL1" "PHKA1"
[311] "IL12RB1" "MDS1" "CBFB" "AMPD1" "ECGF1"
[316] "ITGA7" "CLCN1" "NAGS" "NHS" "HSN2"
[321] "RAC2" "GNAT1" "NP" "POMC" "MC4R"
[326] "UCP3" "MC3R" "OA1" "SAG" "PAX2"
[331] "OTC" "RIL" "NDUFV2" "PIGA" "PEX12"
[336] "PTS" "PHGDH" "PRPS1" "PHKB" "PSP"
[341] "GLI2" "LHX3" "PKD1" "PKDTS" "COL4A1"
[346] "UROS" "UROD" "PEPD" "MSR1" "PROS1"
[351] "SRD5A2" "PSORS6" "CTSK" "NOS1" "PC"
[356] "PDHB" "OGG1" "CA2" "OPRM1" "LRAT"
[361] "IMPDH1" "RP2" "CERKL" "RP9" "USH2A"
[366] "MERTK" "RBP4" "RHD" "CYP2R1" "VDR"
[371] "ESCO2" "WNT4" "HEXB" "NAGLU" "EMX2"
[376] "TRAR4" "SOST" "SPR" "USP26" "IL7R"
[381] "LHX4" "NODAL" "TBX4" "SPG3A" "EPB42"
[386] "PPP2R2B" "SCA25" "PLEKHG4" "SCA8" "TDP1"
[391] "MESP2" "HMGCR" "SSADH" "SI" "SUOX"
[396] "SOD3" "FBLN1" "WHN" "DAD1" "HBD"
[401] "LCRB" "HRG" "PROC" "TRHR" "TBG"
[406] "TALDO1" "TRPS1" "HADHB" "SPG20" "FAH"
[411] "TAT" "HPD" "TBX3" "APRT" "GGCX"
[416] "TKT" "XDH" "XPA" "DDB2" "PEX16"
[421] "PEX3"
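An isolate is a vertex with no edges, which in terms of the projected matrix means a gene whose off-diagonal row entries are all zero. The base-R sketch below shows that reading on a hypothetical 3-gene matrix and then computes the share of isolates from the counts printed above (421 of 3,823 genes).

```r
# Hypothetical projected gene-to-gene matrix; the diagonal holds each gene's own disorder count.
P <- matrix(
  c(2, 1, 0,
    1, 1, 0,
    0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3"), c("G1", "G2", "G3"))
)

# Ignore the diagonal: sharing disorders with itself does not connect a gene to others.
off_diag <- P
diag(off_diag) <- 0

# Genes with no shared disorders are isolates.
isolate_names <- rownames(off_diag)[rowSums(off_diag) == 0]
print(isolate_names)

# Share of isolates in the real network, from the counts shown in this post.
isolate_share <- 421 / 3823
print(round(isolate_share, 3))
```

About 11% of the genes in the network are isolates, meaning they share no disorder with any other gene in this dataset.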
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Kumar (2022, Feb. 17). Data Analytics and Computational Social Science: The Human Disease Taana Baana. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomak64823865458/
BibTeX citation
@misc{kumar2022the,
  author = {Kumar, Abhinav},
  title = {Data Analytics and Computational Social Science: The Human Disease Taana Baana},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomak64823865458/},
  year = {2022}
}