Abstract
Complex diseases are associated with a variety of genomic factors. Identifying these risk factors can improve our understanding of disease pathogenesis, which in turn may aid the development of prevention and intervention strategies, including personalized treatments. Disease-associated risk factors can be identified through genome-wide association studies (GWAS) or the detection of differentially expressed (DE) genes. Traditional single-marker or single-gene approaches lack adequate statistical power because of the stringent thresholds imposed by multiple-testing correction. To address this problem, researchers have proposed pathway-based approaches, which leverage knowledge of gene–gene interactions to increase power when a large number of statistical tests are conducted. Many pathway-based approaches treat pathways as simple lists of genes and ignore their topological structure. Incorporating the topological structure into the analysis allows specific gene–gene interactions to be modeled. Previous approaches of this kind consider only two-way interactions and incorporate only a single pathway in GWAS or DE analysis. However, genes participate in different biological processes and thus interact with one another simultaneously in different pathways. We therefore extend the approach by combining multiple biological pathways in genomic studies. In the proposed approaches, the topological structures of biological pathways are modeled by a Markov random field, a graphical model that represents the dependence structure in the data. A Bayesian framework is then constructed to combine the knowledge from the topological structures of biological pathways with the evidence from biological experiments. Inference on gene status, such as disease association status, can be made from the marginal posterior probabilities obtained from the Bayesian analysis. The proposed approaches are evaluated with simulation studies and a lung cancer dataset. The results show that combining multiple biological pathways can enhance detection power and control the false positive rate.
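To make the idea of combining several pathway topologies in one Markov random field prior concrete, the following is a minimal sketch and not the chapter's actual model. It assumes a binary status per gene, an Ising-style prior in which each pathway contributes its own edges, a normal likelihood on per-gene z-scores, and a plain Gibbs sampler for the marginal posterior probabilities. The function name `gibbs_marginals` and the hyperparameters `h`, `tau`, and `mu` are hypothetical choices made for illustration only.

```python
import math
import random


def gibbs_marginals(z, pathways, h=-1.0, tau=0.5, mu=2.0,
                    n_iter=2000, burn_in=500, seed=0):
    """Estimate P(s_i = 1 | z) for binary gene states under an MRF prior.

    z        : list of per-gene z-scores (the experimental evidence)
    pathways : list of edge lists, one per pathway; edges are (i, j) index pairs
    h, tau, mu are illustrative hyperparameters, not values from the chapter.
    """
    n = len(z)
    # One adjacency list per pathway, so a gene pair linked in several
    # pathways contributes once per pathway.
    neighbours = []
    for edges in pathways:
        adj = [[] for _ in range(n)]
        for i, j in edges:
            adj[i].append(j)
            adj[j].append(i)
        neighbours.append(adj)

    rng = random.Random(seed)
    state = [0] * n
    counts = [0] * n
    for it in range(n_iter):
        for i in range(n):
            # Prior term: neighbours pull s_i toward their current state.
            field = h
            for adj in neighbours:
                for j in adj[i]:
                    field += tau if state[j] == 1 else -tau
            # Log likelihood ratio of N(mu, 1) versus N(0, 1) at z[i].
            log_odds = field + mu * z[i] - 0.5 * mu * mu
            # Clamp to keep math.exp within a safe numeric range.
            log_odds = max(-30.0, min(30.0, log_odds))
            p_one = 1.0 / (1.0 + math.exp(-log_odds))
            state[i] = 1 if rng.random() < p_one else 0
        if it >= burn_in:
            for i in range(n):
                counts[i] += state[i]
    # Fraction of retained sweeps in which each gene was in state 1.
    return [c / (n_iter - burn_in) for c in counts]


# Toy example with made-up z-scores and two made-up pathways.
z_scores = [3.1, 2.8, 1.0, 0.1, -0.3]
pathway_a = [(0, 1), (1, 2)]
pathway_b = [(0, 2), (3, 4)]
posterior = gibbs_marginals(z_scores, [pathway_a, pathway_b])
```

In this toy setup, gene 2 has only moderate evidence of its own but is linked to strongly supported genes in both pathways, so its estimated posterior probability rises relative to an analysis with no pathway information. That borrowing of strength across pathways is the behaviour the abstract describes, here under the simplifying assumptions stated above.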
Yujing Cao and Yu Zhang contributed equally.
Acknowledgements
This work was partially supported by the National Institutes of Health [grant R15GM131390 to X.W.].
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cao, Y., Zhang, Y., Wang, X., Chen, M. (2021). Graphical Modeling of Multiple Biological Pathways in Genomic Studies. In: Zhao, Y., Chen, DG. (eds) Modern Statistical Methods for Health Research. Emerging Topics in Statistics and Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-030-72437-5_19
DOI: https://doi.org/10.1007/978-3-030-72437-5_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72436-8
Online ISBN: 978-3-030-72437-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)