Phased Genome Assemblies

Duitama, Jorge

doi:10.1007/978-1-0716-2819-5_16

Jorge Duitama

Part of the book series: Methods in Molecular Biology (MIMB, volume 2590)

Abstract

The ultimate goal of de novo assembly of reads sequenced from a diploid individual is the separate reconstruction of the sequences corresponding to the two copies of each chromosome. Unfortunately, the allele linkage information needed to perform phased genome assemblies has been difficult to generate. Hence, most current genome assemblies are a haploid mixture of the two underlying chromosome copies present in the sequenced individual. Sequencing technologies that provide long (20 kb) and accurate reads are the basis for generating phased genome assemblies. This chapter provides a brief overview of the main milestones in traditional genome assembly, focusing on the bioinformatic techniques developed to generate haplotype information from different specialized protocols. With these techniques as background, the chapter reviews the current algorithms for generating phased assemblies from long reads with low error rates. Current techniques perform haplotype-aware error correction steps to increase the quality of the raw reads. In addition, variations on the traditional overlap-layout-consensus (OLC) graph have been developed in an effort to eliminate edges between reads sequenced from different chromosome copies. This allows large presence–absence variants between the chromosome copies to be taken into account. The development of these algorithms, along with improved sequencing technologies, has been crucial for finishing chromosome-level assemblies of complex genomes.
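
The idea of removing overlap-graph edges that join reads from different chromosome copies can be illustrated with a toy sketch. The Python code below is not the method of any assembler reviewed in the chapter. It assumes error-free reads, overlaps given as precomputed offsets with 0 <= offset < len(read a), and a simple mismatch threshold; the function names, the `Overlap` tuple layout, and the example sequences are all hypothetical. Under those assumptions, any mismatch inside an overlap is read as a heterozygous site, so the edge is dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical edge layout: (read a, read b, start of read b within read a).
Overlap = Tuple[str, str, int]


def overlap_mismatches(a: str, b: str, offset: int) -> int:
    """Count mismatches where b, placed at `offset` in a, overlaps a."""
    span = min(len(a) - offset, len(b))
    return sum(1 for i in range(span) if a[offset + i] != b[i])


def filter_cross_haplotype_edges(
    reads: Dict[str, str],
    overlaps: List[Overlap],
    max_mismatches: int = 0,
) -> List[Overlap]:
    """Keep only overlap edges whose aligned bases agree.

    Assumption: reads are accurate, so a mismatch in the overlap
    indicates that the two reads come from different haplotypes.
    """
    kept = []
    for name_a, name_b, offset in overlaps:
        mismatches = overlap_mismatches(reads[name_a], reads[name_b], offset)
        if mismatches <= max_mismatches:
            kept.append((name_a, name_b, offset))
    return kept


# Two made-up haplotypes that differ at a single site (index 8).
hap1 = "ACGTACGTACGGATTC"
hap2 = "ACGTACGTTCGGATTC"

# Two reads per haplotype, both pairs overlapping across the variant site.
reads = {
    "h1_a": hap1[:10], "h1_b": hap1[6:],
    "h2_a": hap2[:10], "h2_b": hap2[6:],
}

# Candidate edges as a haplotype-unaware overlapper might report them.
overlaps = [
    ("h1_a", "h1_b", 6), ("h1_a", "h2_b", 6),
    ("h2_a", "h1_b", 6), ("h2_a", "h2_b", 6),
]

kept = filter_cross_haplotype_edges(reads, overlaps)
print(kept)
```

In this example only the two within-haplotype edges survive. With real data the threshold would have to tolerate residual sequencing errors, which is one reason haplotype-aware error correction matters before graph construction.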

Author information

Authors and Affiliations

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
Jorge Duitama

Corresponding author

Correspondence to Jorge Duitama.

Editor information

Editors and Affiliations

Advanced Genomics Technology Laboratory, Complete Genomics/MGI, San Jose, CA, USA
Brock A. Peters
Advanced Genomics Technology Laboratory, Complete Genomics/MGI, San Jose, CA, USA
Radoje Drmanac

About this protocol

Cite this protocol

Duitama, J. (2023). Phased Genome Assemblies. In: Peters, B.A., Drmanac, R. (eds) Haplotyping. Methods in Molecular Biology, vol 2590. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2819-5_16

DOI: https://doi.org/10.1007/978-1-0716-2819-5_16
Published: 07 November 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2818-8
Online ISBN: 978-1-0716-2819-5
eBook Packages: Springer Protocols
