A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets

Zuo, Chandler; Chen, Kailei; Keleş, Sündüz

doi:10.1007/978-3-319-31957-5_2

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9649)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology

Abstract

Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on analyzing each dataset independently (i.e., peak calling). This approach ignores the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between different ChIP-seq samples. Methods geared towards multi-sample investigations have limited applicability in settings that aim to integrate hundreds to thousands of ChIP-seq datasets for query loci (e.g., thousands of genomic loci with a specific binding site). Recently, [1] developed a hierarchical framework for state-space matrix inference and clustering, named MBASIC, to enable joint analysis of user-specified loci across multiple ChIP-seq datasets. Although this versatile framework both estimates the underlying state-space (e.g., bound vs. unbound) and groups loci with similar patterns together, its Expectation-Maximization based estimation structure hinders its applicability with large numbers of loci and samples. We address this limitation by developing a MAP-based Asymptotic Derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploring multiple initialization schemes and flexibility in tuning. Comparisons with MBASIC indicate that this speed comes at a relatively insignificant loss in estimation accuracy. Although MAD-Bayes MBASIC is specifically designed for the analysis of user-specified loci, it is able to capture overall patterns of histone marks from multiple ChIP-seq datasets similar to those identified by genome-wide segmentation methods such as ChromHMM and Spectacle.
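
To give a feel for what a "K-means-like" algorithm obtained by small-variance asymptotics can look like, the sketch below implements DP-means, a standard example of this family of algorithms. It is not the MAD-Bayes MBASIC algorithm described in the abstract, which also infers a state-space matrix. It only illustrates the general pattern of alternating hard assignments with mean updates while a penalty controls the number of clusters. The function names (`dp_means`, `dp_objective`), the penalty parameter `lam`, and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(
    data: Sequence[Sequence[float]], lam: float, max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """DP-means: K-means-style updates where a point farther than `lam`
    (in squared distance) from every center opens a new cluster."""
    centers = [mean(data)]          # start with one cluster at the global mean
    labels = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster if too far.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(list(x))
                k = len(centers) - 1
            if labels[i] != k:
                labels[i] = k
                changed = True
        # Update step: drop empty clusters, relabel, recompute means.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in labels]
        centers = [
            mean([x for x, l in zip(data, labels) if l == k])
            for k in range(len(used))
        ]
        if not changed:
            break
    return labels, centers


def dp_objective(data, labels, centers, lam: float) -> float:
    """Penalized K-means objective: within-cluster cost plus lam per cluster."""
    cost = sum(sq_dist(x, centers[l]) for x, l in zip(data, labels))
    return cost + lam * len(centers)


# Toy data (hypothetical): two well-separated groups of three points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = dp_means(data, lam=9.0)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Because each run is cheap, such an algorithm can be restarted from several initializations or penalty values and the run with the lowest objective kept, which is the kind of flexibility the abstract attributes to fast convergence.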

References

1. Zuo, C., Hewitt, K.J., Bresnick, E.H., Keleş, S.: A hierarchical framework for state-space matrix inference and clustering. Ann. Appl. Stat. (Revised)
2. The ENCODE project consortium: an integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
3. Roadmap epigenomics consortium: integrative analysis of 111 reference human epigenomes. Nature 518(7539), 317–330 (2015)
4. Bardet, A.F., He, Q., Zeitlinger, J., Stark, A.: A computational pipeline for comparative ChIP-seq analyses. Nat. Protoc. 7(1), 45–61 (2012)
5. Bao, Y., Vinciotti, V., Wit, E., ’t Hoen, P.A.C.: Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data. BMC Bioinform. 14(1), 169 (2013)
6. Zeng, X., Sanalkumar, R., Bresnick, E.H., Li, H., Chang, Q., Keleş, S.: jMOSAiCS: joint analysis of multiple ChIP-seq datasets. Genome Biol. 14, R38 (2013). Highly accessed. An R package for joint analysis of multiple ChIP-seq datasets. Available in Bioconductor http://bioconductor.org/packages/2.12/bioc/html/jmosaics.html
7. Kuan, P.F., Chung, D., Pan, G., Thomson, J., Stewart, R., Keleş, S.: A statistical framework for the analysis of ChIP-Seq data. J. Am. Stat. Assoc. 106, 891–903 (2011). Software available on Galaxy http://toolshed.g2.bx.psu.edu/ and also on Bioconductor http://bioconductor.org/packages/2.8/bioc/html/mosaics.html
8. Bao, Y., Vinciotti, V., Wit, E., ’t Hoen, P.: Joint modeling of ChIP-seq data via a Markov random field model. Biostatistics 15(2), 296–310 (2014)
9. Chen, K.B., Hardison, R., Zhang, Y.: dCaP: detecting differential binding events in multiple conditions and proteins. BMC Genomics 15(9), 1–14 (2014)
10. Ernst, J., Kellis, M.: Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28(8), 817–825 (2010)
11. Hoffman, M.M., Buske, O.J., Wang, J., Weng, Z., Bilmes, J.A., Noble, W.S.: Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473–476 (2012)
12. Song, J., Chen, K.C.: Spectacle: fast chromatin state annotation using spectral learning. Genome Biol. 16(1), 33 (2015)
13. Sohn, K.A., Ho, J.W.K., Djordjevic, D., Jeong, H.H., Park, P.J., Kim, J.H.: hiHMM: Bayesian non-parametric joint inference of chromatin state maps. Bioinformatics, btv117 (2015)
14. Liang, K., Keleş, S.: Detecting differential binding of transcription factors with ChIP-seq. Bioinformatics 28(1), 121–122 (2012). Available in Bioconductor (http://www.bioconductor.org/packages/2.12/bioc/html/DBChIP.html)
15. Mahony, S., Edwards, M.D., Mazzoni, E.O., Sherwood, R.I., Kakumanu, A., Morrison, C.A., Wichterle, H., Gifford, D.K.: An integrated model of multiple-condition ChIP-Seq data reveals predeterminants of Cdx2 binding. PLoS Comput. Biol. 10(3), e1003501 (2014)
16. Song, Q., Smith, A.D.: Identifying dispersed epigenomic domains from ChIP-Seq data. Bioinformatics 27, 870–871 (2011)
17. Ferguson, J.P., Cho, J.H., Zhao, H.: A new approach for the joint analysis of multiple ChIP-seq libraries with application to histone modification. Stat. Appl. Genet. Mol. Biol. 11(3), Article 1 (2012)
18. Taslim, C., Huang, T., Lin, S.: DIME: R-package for identifying differential ChIP-seq based on an ensemble of mixture models. Bioinformatics 27(11), 1569–1570 (2011)
19. Ji, H., Li, X., Wang, Q.F., Ning, Y.: Differential principal component analysis of ChIP-seq. Proc. Nat. Acad. Sci. U.S.A. 110(17), 6789–6794 (2013)
20. Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B Met. 39, 1–38 (1977)
21. Zuo, C., Keleş, S.: A statistical framework for power calculations in ChIP-seq experiments. Bioinformatics 30(6), 853–860 (2014)
22. Broderick, T., Kulis, B., Jordan, M.: MAD-Bayes: MAP-based asymptotic derivations from Bayes. In: Proceedings of the 30th International Conference on Machine Learning (2013)
23. Blackwell, D., MacQueen, J.B.: Ferguson distributions via Polya urn schemes. Ann. Stat. 1(2), 353–355 (1973)
24. Aldous, D.J.: Exchangeability and related topics. In: Hennequin, P.L. (ed.) École d’Été de Probabilités de Saint-Flour XIII, vol. 1117, pp. 1–198. Springer, Heidelberg (1983)
25. Hewitt, K.J., Kim, D.H., Devadas, P., Prathibha, R., Zuo, C., Sanalkumar, R., Johnson, K.D., Kang, Y.A., Kim, J.S., Dewey, C.N., Keleş, S., Bresnick, E.: Hematopoietic signaling mechanism revealed from a stem/progenitor cell cistrome. Mol. Cell 59(1), 62–74 (2015)
26. Johnson, K.D., Hsu, A., Ryu, M.J., Boyer, M.E., Keleş, S., Zhang, J., Lee, Y., Holland, S.M., Bresnick, E.H.: Cis-element mutation in a GATA-2-dependent immunodeficiency syndrome governs hematopoiesis and vascular integrity. J. Clin. Inv. 122(10), 3692–3704 (2012)
27. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987)
28. Wei, Y., Li, X., Wang, Q.F., Ji, H.: iASeq: integrative analysis of allele-specificity of protein-DNA interactions in multiple ChIP-seq datasets. BMC Genomics 13, 681 (2012)
29. Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.K., Cheng, C., Mu, X.J., Khurana, E., Rozowsky, J., Alexander, R., Min, R., Alves, P., Abyzov, A., Addleman, N., Bhardwaj, N., Boyle, A.P., Cayting, P., Charos, A., Chen, D.Z., Cheng, Y., Clarke, D., Eastman, C., Euskirchen, G., Frietze, S., Fu, Y., Gertz, J., Grubert, F., Harmanci, A., Jain, P., Kasowski, M., Lacroute, P., Leng, J., Lian, J., Monahan, H., O’Geen, H., Ouyang, Z., Partridge, E.C., Patacsil, D., Pauli, F., Raha, D., Ramirez, L., Reddy, T.E., Reed, B., Shi, M., Slifer, T., Wang, J., Wu, L., Yang, X., Yip, K.Y., Zilberman-Schapira, G., Batzoglou, S., Sidow, A., Farnham, P.J., Myers, R.M., Weissman, S.M., Snyder, M.: Architecture of the human regulatory network derived from ENCODE data. Nature 489(7414), 91–100 (2012)
30. Wei, Y., Tenzen, T., Ji, H.: Joint analysis of differential gene expression in multiple studies using correlation motifs. Biostatistics 16(1), 31–46 (2015)
31. Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)
32. Tan, P.N., Steinbach, M., Kumar, V.: Cluster analysis: basic concepts and algorithms. In: Introduction to Data Mining, chap. 8 (2005)
33. Landt, S.G., Marinov, G.K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B.E., Bickel, P., Brown, J.B., Cayting, P., et al.: ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22(9), 1813–1831 (2012)
34. Banerjee, A.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)

Author information

Authors and Affiliations

Department of Statistics, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
Chandler Zuo, Kailei Chen & Sündüz Keleş

Corresponding author

Correspondence to Sündüz Keleş.

Editor information

Editors and Affiliations

Princeton University, Princeton, New Jersey, USA
Mona Singh

Cite this paper

Zuo, C., Chen, K., Keleş, S. (2016). A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets. In: Singh, M. (ed.) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_2

DOI: https://doi.org/10.1007/978-3-319-31957-5_2
Published: 08 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31956-8
Online ISBN: 978-3-319-31957-5
eBook Packages: Computer Science, Computer Science (R0)
