Phased Genome Assemblies

  • Protocol
  • First Online:
Haplotyping

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2590))

Abstract

The ultimate goal of de novo assembly of reads sequenced from a diploid individual is the separate reconstruction of the sequences corresponding to the two copies of each chromosome. Unfortunately, the allele linkage information needed to perform phased genome assemblies has been difficult to generate. Hence, most current genome assemblies are a haploid mixture of the two underlying chromosome copies present in the sequenced individual. Sequencing technologies providing long (20 kb) and accurate reads are the basis for generating phased genome assemblies. This chapter provides a brief overview of the main milestones in traditional genome assembly, focusing on the bioinformatic techniques developed to generate haplotype information from different specialized protocols. With these techniques as background, the chapter reviews the current algorithms for generating phased assemblies from long reads with low error rates. Current techniques perform haplotype-aware error correction steps to increase the quality of the raw reads. In addition, variations on the traditional overlap-layout-consensus (OLC) graph have been developed in an effort to eliminate edges between reads sequenced from different chromosome copies. This allows large presence–absence variants between the chromosome copies to be taken into account. The development of these algorithms, along with improved sequencing technologies, has been crucial for finishing chromosome-level assemblies of complex genomes.
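
The idea of removing overlap-graph edges between reads from different chromosome copies can be illustrated with a toy sketch. This is not the algorithm of any assembler reviewed in the chapter. It assumes, hypothetically, that each read has already been reduced to the alleles it carries at known heterozygous positions, and that overlaps are given as read pairs. The names `allele_mismatches`, `filter_cross_haplotype_edges`, `connected_components`, and the `max_mismatches` parameter are illustrative only.

```python
from collections import defaultdict


def allele_mismatches(a, b):
    """Count heterozygous sites covered by both reads where the alleles differ."""
    shared = a.keys() & b.keys()
    return sum(1 for pos in shared if a[pos] != b[pos])


def filter_cross_haplotype_edges(overlaps, read_alleles, max_mismatches=0):
    """Keep only overlaps whose reads agree at shared heterozygous sites.

    overlaps: iterable of (read_id, read_id) pairs (assumed precomputed).
    read_alleles: dict read_id -> {position: allele} (assumed precomputed).
    """
    kept = []
    for r1, r2 in overlaps:
        if allele_mismatches(read_alleles[r1], read_alleles[r2]) <= max_mismatches:
            kept.append((r1, r2))
    return kept


def connected_components(nodes, edges):
    """Group reads linked by the remaining edges (depth-first search)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            comp.add(x)
            stack.extend(adj[x] - seen)
        components.append(sorted(comp))
    return components


# Toy data: r1 and r2 carry one haplotype, r3 and r4 the other.
reads = {
    "r1": {100: "A", 250: "C"},
    "r2": {250: "C", 400: "G"},
    "r3": {100: "G", 250: "T"},
    "r4": {250: "T", 400: "T"},
}
overlaps = [("r1", "r2"), ("r1", "r3"), ("r2", "r4"), ("r3", "r4"), ("r2", "r3")]

kept = filter_cross_haplotype_edges(overlaps, reads)
groups = connected_components(reads, kept)
print(kept)
print(groups)
```

In this toy input only the overlaps `("r1", "r2")` and `("r3", "r4")` survive, so the reads fall into two groups, one per chromosome copy. Real reads contain sequencing errors, which is why a tolerance such as `max_mismatches` and a prior haplotype-aware error correction step would matter in practice.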

Author information

Corresponding author

Correspondence to Jorge Duitama.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Duitama, J. (2023). Phased Genome Assemblies. In: Peters, B.A., Drmanac, R. (eds) Haplotyping. Methods in Molecular Biology, vol 2590. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2819-5_16

  • DOI: https://doi.org/10.1007/978-1-0716-2819-5_16

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2818-8

  • Online ISBN: 978-1-0716-2819-5

  • eBook Packages: Springer Protocols
