We describe a computational method for the prediction of RNA secondary structure that combines free energy and comparative sequence analysis strategies. Using a homology-based sequence alignment as a starting point, all pairings that are favorable with respect to the Turner energy function are identified. Each potentially paired region within the multiple sequence alignment is scored using a function that combines predicted free energy and sequence covariation with optimized weightings. High-scoring regions are ranked and sequentially incorporated to define a growing secondary structure. Using a single set of optimized parameters, it is possible to accurately predict the foldings of several test RNAs defined previously by extensive phylogenetic and experimental data (including tRNA, 5S rRNA, SRP RNA, tmRNA, and 16S rRNA). The algorithm correctly predicts approximately 80% of the secondary structure. A range of parameters has been tested to define the minimal sequence information content required to accurately predict secondary structure and to assess the importance of individual terms in the prediction scheme. This analysis indicates that prediction accuracy depends most strongly on covariational information and only weakly on the energetic terms. However, relatively few sequences prove sufficient to provide the covariational information required for an accurate prediction. Secondary structures can be accurately defined by alignments with as few as five sequences, and predictions improve only moderately with the inclusion of additional sequences.
Copyright 1999 Academic Press.
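The rank-and-incorporate step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the published implementation: the `Helix` record, the linear form of `combined_score`, the weights `w_energy` and `w_cov`, and the rule that a region is rejected only when it reuses an already paired alignment column are all assumptions made for illustration. The abstract does not give the actual scoring function, the optimized weightings, or how conflicting regions are resolved.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pair = Tuple[int, int]  # (i, j) alignment columns of one base pair


@dataclass(frozen=True)
class Helix:
    """Hypothetical record for one candidate paired region."""
    pairs: Tuple[Pair, ...]
    energy: float       # predicted free energy in kcal/mol (negative = stable)
    covariation: float  # covariation support from the alignment (higher = better)


def combined_score(h: Helix, w_energy: float = 1.0, w_cov: float = 1.0) -> float:
    # Assumed linear combination: reward covariation and favorable
    # (negative) free energy. The real weightings are not in the abstract.
    return w_cov * h.covariation - w_energy * h.energy


def greedy_fold(helices: List[Helix], w_energy: float = 1.0,
                w_cov: float = 1.0) -> List[Pair]:
    """Rank candidate regions by score and add them one at a time,
    skipping any region that reuses an already paired column."""
    ranked = sorted(helices,
                    key=lambda h: combined_score(h, w_energy, w_cov),
                    reverse=True)
    used = set()            # alignment columns already paired
    structure: List[Pair] = []
    for h in ranked:
        columns = {c for pair in h.pairs for c in pair}
        if columns & used:
            continue        # conflicts with the growing structure
        used |= columns
        structure.extend(h.pairs)
    return sorted(structure)


# Toy candidates (made-up numbers, not data from the paper).
candidates = [
    Helix(pairs=((0, 10), (1, 9)), energy=-3.0, covariation=2.0),
    Helix(pairs=((1, 8), (2, 7)), energy=-5.0, covariation=0.5),
    Helix(pairs=((3, 6),), energy=-1.0, covariation=0.0),
]

energy_led = greedy_fold(candidates, w_energy=1.0, w_cov=1.0)
covariation_led = greedy_fold(candidates, w_energy=1.0, w_cov=3.0)
print(energy_led)
print(covariation_led)
```

In this toy input the first two candidates compete for column 1, so raising `w_cov` switches the selected region from the energetically favored one to the one with stronger covariation support. That is the kind of trade-off the weightings control, although the values used here are arbitrary.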