A statistical learning approach to the modeling of chromatographic retention of oligonucleotides incorporating sequence and secondary structure data

Nucleic Acids Res. 2007;35(12):4195-202. doi: 10.1093/nar/gkm338. Epub 2007 Jun 13.

Abstract

We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν-support vector regression (ν-SVR) using features derived from the base sequence and the predicted secondary structure of the oligonucleotides. Because it incorporates secondary structure information, the model is applicable even at relatively low temperatures, where secondary structure is not suppressed by thermal denaturation. This makes it possible to predict oligonucleotide retention times at arbitrary temperatures, provided that the target temperature lies within the temperature range of the training data. We describe different ways of calculating features from base sequence and secondary structure, present the results, and compare our model to existing models.
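
To make the idea of features derived from sequence and secondary structure concrete, here is a minimal sketch of one possible encoding. It is not the feature set used in the paper: the function names, the choice of base fractions, the paired-base fraction read from a dot-bracket string, and the inclusion of length and temperature as inputs are all illustrative assumptions. The dot-bracket string is assumed to come from an external structure prediction tool.

```python
from collections import Counter

BASES = "ACGT"  # fixed feature order; DNA alphabet is an assumption


def sequence_features(seq):
    """Return the fraction of each base (A, C, G, T) in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[b] / len(seq) for b in BASES]


def paired_fraction(dot_bracket):
    """Return the fraction of positions that are base-paired.

    The structure is given in dot-bracket notation: '(' and ')' mark
    paired positions and '.' marks unpaired ones.
    """
    if not dot_bracket:
        raise ValueError("structure must not be empty")
    depth = 0
    paired = 0
    for ch in dot_bracket:
        if ch == "(":
            depth += 1
            paired += 1
        elif ch == ")":
            depth -= 1
            paired += 1
            if depth < 0:
                raise ValueError("unbalanced structure")
        elif ch != ".":
            raise ValueError(f"unexpected character: {ch!r}")
    if depth != 0:
        raise ValueError("unbalanced structure")
    return paired / len(dot_bracket)


def feature_vector(seq, dot_bracket, temperature_c):
    """Hypothetical feature vector: base fractions, paired fraction,
    sequence length, and column temperature in degrees Celsius."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    return sequence_features(seq) + [
        paired_fraction(dot_bracket),
        float(len(seq)),
        float(temperature_c),
    ]


# Example: a 12-mer hairpin with a 4-base-pair stem, measured at 30 °C.
example = feature_vector("GCGCTTTTGCGC", "((((....))))", 30)
```

Vectors of this kind, paired with measured retention times, could then be passed to any ν-SVR implementation (for example, scikit-learn's `NuSVR`). The kernel and hyperparameters would need to be chosen by validation on the training data.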

Publication types

  • Validation Study

MeSH terms

  • Artificial Intelligence*
  • Base Sequence
  • Chromatography, High Pressure Liquid / methods*
  • Models, Statistical*
  • Nucleic Acid Conformation
  • Oligonucleotides / chemistry
  • Oligonucleotides / isolation & purification*
  • Reproducibility of Results
  • Sequence Homology, Nucleic Acid
  • Temperature

Substances

  • Oligonucleotides