
Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li¹, Jue Ruan², and Richard Durbin¹,³

  ¹ The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom;
  ² Beijing Genomics Institute, Chinese Academy of Science, Beijing 100029, China

Abstract

New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
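The abstract describes mapping quality as a confidence measure tied to the error probability of each read alignment, but it does not state the formula. The sketch below assumes the conventional Phred scaling, Q = -10 log10(P(read is wrongly mapped)), rounded to an integer. The function names and the cap of 255 are hypothetical choices made for this illustration and are not taken from the MAQ source code.

```python
import math


def error_prob_to_quality(p_error, cap=255):
    """Convert an alignment error probability to a Phred-scaled quality.

    Assumes Q = -10 * log10(p_error), rounded to the nearest integer.
    `cap` is an illustrative upper bound for p_error == 0, not a MAQ constant.
    """
    if not 0.0 <= p_error <= 1.0:
        raise ValueError("p_error must lie in [0, 1]")
    if p_error == 0.0:
        return cap
    return min(cap, round(-10.0 * math.log10(p_error)))


def quality_to_error_prob(quality):
    """Invert the Phred scaling: p_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    for p in (1.0, 0.1, 0.01, 0.001):
        print(p, error_prob_to_quality(p))
```

Under this scaling, a read with a 1 in 1000 chance of being placed at the wrong position has quality 30, and a read that maps equally well to many positions has a quality near 0.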

Footnotes

  • ³ Corresponding author. E-mail rd{at}sanger.ac.uk; fax 44-1223496802.

  • [Supplemental material is available online at www.genome.org. Short-read sequences have been deposited in the European Read Archive (ERA) under accession no. ERA000012 (ftp://ftp.era.ebi.ac.uk/ERA000012/).]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.078212.108.

    • Received March 7, 2008.
    • Accepted August 13, 2008.
  • Freely available online through the Genome Research Open Access option.
