Base-calling of automated sequencer traces using phred. II. Error probabilities

B Ewing; P Green

Genome Res. 1998 Mar;8(3):186-94.

Authors

B Ewing¹, P Green

Affiliation

¹ Department of Molecular Biotechnology, University of Washington, Seattle, Washington 98195-7730, USA.

PMID: 9521922

Abstract

Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
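
The abstract does not state how a per-base error probability is reported or how validity is checked. As a minimal sketch, assuming the conventional phred scaling Q = -10 log10(P_error), the conversion and a simple validity check (comparing predicted with observed error rates per quality value) might look like the following. The function names and the `(quality, is_correct)` input layout are illustrative assumptions, not phred's actual interface.

```python
import math


def error_prob_to_quality(p_error: float) -> float:
    """Convert an error probability to a phred-scaled quality value.

    Assumes the conventional scaling Q = -10 * log10(P_error).
    """
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def quality_to_error_prob(quality: float) -> float:
    """Invert the scaling: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


def observed_error_rates(calls):
    """Group base-calls by quality value and compute observed error rates.

    `calls` is a hypothetical iterable of (quality, is_correct) pairs,
    where is_correct says whether the call matched a trusted reference.
    Returns {quality: (number_of_calls, observed_error_rate)}.
    """
    counts = {}
    for quality, is_correct in calls:
        n, errors = counts.get(quality, (0, 0))
        counts[quality] = (n + 1, errors + (0 if is_correct else 1))
    return {q: (n, errors / n) for q, (n, errors) in counts.items()}


if __name__ == "__main__":
    # Toy data only, not results from the paper.
    toy_calls = [(20, True), (20, False), (30, True)]
    for q, (n, rate) in sorted(observed_error_rates(toy_calls).items()):
        predicted = quality_to_error_prob(q)
        print(f"Q={q}: n={n}, predicted={predicted:.4f}, observed={rate:.4f}")
```

Under this scaling, predictions would count as valid in the abstract's sense when the observed error rate in each quality bin is close to the predicted probability, given enough calls per bin.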

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Base Sequence
Chimera
Cloning, Molecular / methods
Data Interpretation, Statistical
Discriminant Analysis
Genetic Vectors
Human Genome Project
Humans
Probability
Quality Control
Reproducibility of Results
Sequence Analysis, DNA / methods*
Sequence Analysis, DNA / standards
Sequence Analysis, DNA / statistics & numerical data*
Software*