Scoring hidden Markov models

C Barrett; R Hughey; K Karplus

doi:10.1093/bioinformatics/13.2.191

Comput Appl Biosci. 1997 Apr;13(2):191-9. doi: 10.1093/bioinformatics/13.2.191.

Authors

C Barrett¹, R Hughey, K Karplus

Affiliation

¹ Department of Computer Engineering, University of California, Santa Cruz 95064, USA.

PMID: 9146967
DOI: 10.1093/bioinformatics/13.2.191

Abstract

Motivation: Statistical sequence comparison techniques, such as hidden Markov models and generalized profiles, calculate the probability that a sequence was generated by a given model. Log-odds scoring is a means of evaluating this probability by comparing it to a null hypothesis, usually a simpler statistical model intended to represent the universe of sequences as a whole, rather than the group of interest. Such scoring leads to two immediate questions: what should the null model be, and what threshold of log-odds score should be deemed a match to the model.
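As a rough illustration of the log-odds idea described above, the following minimal sketch scores a sequence against a simplified, ungapped column model rather than a full HMM. The function name `log_odds_score`, the toy four-letter alphabet, the uniform null model, and the threshold value are all assumptions made for illustration; none of them comes from the paper or from SAM.

```python
import math

def log_odds_score(seq, model_columns, null_probs):
    """Return log2( P(seq | model) / P(seq | null) ) in bits.

    Assumption: the "model" is a list of per-column character
    distributions with no insertions or deletions, which is far
    simpler than a real HMM. All probabilities must be positive.
    """
    if len(seq) != len(model_columns):
        raise ValueError("sequence length must equal the number of columns")
    score = 0.0
    for ch, column in zip(seq, model_columns):
        # Each position contributes the log ratio of model to null probability.
        score += math.log2(column[ch] / null_probs[ch])
    return score

# Hypothetical toy data: two columns that both favour 'A'.
columns = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
]
uniform_null = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

THRESHOLD_BITS = 1.0  # hypothetical cutoff, not a value from the paper

score = log_odds_score("AA", columns, uniform_null)
is_match = score >= THRESHOLD_BITS
print(score, is_match)
```

A positive score means the sequence is more probable under the model than under the null; how large the score must be before calling a match is exactly the threshold question the abstract raises.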

Results: This paper analyses these two issues experimentally. Within the context of the Sequence Alignment and Modeling software suite (SAM), we consider a variety of null models and suitable thresholds. Additionally, we consider HMMer's log-odds scoring and SAM's original Z-scoring method. Among the null model choices, a simple looping null model that emits characters according to the geometric mean of the character probabilities in the columns modeled by the hidden Markov model (HMM) performs well or best across all four discrimination experiments.
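One possible reading of the geometric-mean null model is sketched below: for each character, take the geometric mean of its probabilities over the model's columns, then rescale so the values sum to one. The renormalization step, the function name `geometric_mean_null`, and the two-letter toy alphabet are assumptions; the abstract does not spell out these details, so this should not be taken as SAM's actual implementation.

```python
import math

def geometric_mean_null(columns):
    """Build a null distribution from per-column character probabilities.

    Assumptions: every column is a dict over the same alphabet, every
    probability is strictly positive, and the geometric means are
    renormalized to sum to 1 (not stated in the abstract).
    """
    n = len(columns)
    alphabet = list(columns[0].keys())
    # Geometric mean computed as exp(mean of logs) for numerical stability.
    raw = {
        a: math.exp(sum(math.log(col[a]) for col in columns) / n)
        for a in alphabet
    }
    total = sum(raw.values())
    return {a: value / total for a, value in raw.items()}

# Hypothetical two-column model over a toy alphabet.
toy_columns = [
    {"A": 0.8, "B": 0.2},
    {"A": 0.2, "B": 0.8},
]
null = geometric_mean_null(toy_columns)
print(null)
```

Because the geometric mean is dominated by small values, a character that is rare in any column gets a low null probability, so a null built this way tends to mirror the model's overall composition more closely than a fixed background distribution would.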

Publication types

Comparative Study
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Animals
Calcium-Binding Proteins / genetics
Evaluation Studies as Topic
Ferredoxins / genetics
Globins / genetics
Humans
Linear Models
Markov Chains*
Models, Statistical
Odds Ratio
Sequence Alignment / methods*
Sequence Alignment / statistics & numerical data
Sequence Homology, Amino Acid
Software

Substances

Calcium-Binding Proteins
Ferredoxins
Globins