Improved prediction for N-termini of alpha-helices using empirical information

Claire L Wilson; Paul E Boardman; Andrew J Doig; Simon J Hubbard

doi:10.1002/prot.20218

Proteins. 2004 Nov 1;57(2):322-30. doi: 10.1002/prot.20218.

Authors

Claire L Wilson¹, Paul E Boardman, Andrew J Doig, Simon J Hubbard

Affiliation

¹ Department of Biomolecular Sciences, University of Manchester Institute of Science and Technology, Manchester, United Kingdom.

PMID: 15340919

Abstract

The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method that exploits these data to improve protein secondary structure prediction by identifying the correct N-terminal sequences in alpha-helices, building on existing popular methods for secondary structure prediction. With this algorithm, the proportion of correctly predicted alpha-helix start positions improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it also improved the start-location predictions of methods that use single sequences. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool at http://rocky.bms.umist.ac.uk/elephant.
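The two measures quoted in the abstract, Q3 and the fraction of correctly predicted helix start positions, can move independently. The sketch below is illustrative only and is not the authors' code. It assumes three-state strings (H = helix, E = strand, C = coil) and assumes a start counts as correct only when the predicted helix begins at exactly the observed residue; the paper's own scoring criterion may differ. The function names and the toy strings are hypothetical.

```python
from typing import Set


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state label (H/E/C) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def helix_starts(states: str) -> Set[int]:
    """Indices of the first residue of every run of 'H'."""
    return {
        i
        for i, s in enumerate(states)
        if s == "H" and (i == 0 or states[i - 1] != "H")
    }


def start_accuracy(observed: str, predicted: str) -> float:
    """Fraction of observed helix starts matched exactly by a predicted helix start.

    Exact-position matching is an assumption made for this sketch.
    """
    true_starts = helix_starts(observed)
    if not true_starts:
        return 0.0
    return len(true_starts & helix_starts(predicted)) / len(true_starts)


# Toy data (made up for illustration, not taken from the paper).
observed = "CCCHHHHHHHHCCCEEEECC"
before = "CCCCHHHHHHHCCCEEEECC"  # helix predicted one residue too late
after = "CCCHHHHHHHHCCCCEEECC"   # helix start correct, one strand residue missed

print(q3(observed, before), start_accuracy(observed, before))
print(q3(observed, after), start_accuracy(observed, after))
```

In this toy case both predictions get 19 of 20 residues right, so Q3 is identical, yet only the second places the helix start correctly. This is the kind of change the abstract describes: better start positions at an unchanged Q3.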

MeSH terms

Databases, Protein
Empirical Research*
Peptide Fragments / chemistry
Peptides / chemistry*
Predictive Value of Tests
Protein Structure, Tertiary
Research Design
Sequence Alignment / methods
Software
Software Validation

Substances

Peptide Fragments
Peptides