An improved multiple linear regression method is proposed to predict the alpha-helix and beta-strand content of a globular protein from its primary sequence and structural class. The regression uses the amino acid composition and the auto-correlation functions derived from the hydrophobicity profile of the primary sequence. Not all of these quantities are used: only the compositions of a subset of the amino acids and a subset of the auto-correlation functions are selected as regression terms, namely the combination that yields the smallest prediction error. In the resubstitution test, the average absolute errors for predicting helix and strand content are 0.052 and 0.047, with standard deviations of 0.050 and 0.047, respectively. In the jackknife test, a rigorous cross-validation test, the average absolute errors for helix and strand content are 0.058 and 0.053, with standard deviations of 0.057 and 0.053, respectively. Both tests demonstrate the self-consistency and the extrapolation ability of the new method. The high prediction accuracy suggests that the method is suitable for practical applications.
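A minimal sketch of the kind of pipeline described above, in Python using only the standard library, is shown here. Several details are assumptions not stated in the abstract: the Kyte-Doolittle hydropathy scale stands in for whatever hydrophobicity scale the authors used; the auto-correlation function is taken as r_k = (1/(N-k)) * sum of h_i * h_(i+k) for i = 1..N-k; the residue subset and lags passed to the hypothetical features() helper are placeholders; and the reported spread is the population standard deviation of the absolute errors. The search for the error-minimizing subset of terms, and fitting a separate model for each structural class, are not shown.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values (assumption: the paper's scale may differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq, residues):
    """Fraction of the sequence made up of each residue in `residues`."""
    return [seq.count(aa) / len(seq) for aa in residues]


def autocorrelation(seq, lag, scale=KD):
    """Assumed definition: mean of h_i * h_(i+lag); requires lag < len(seq)."""
    h = [scale[aa] for aa in seq]
    n = len(h) - lag
    return sum(h[i] * h[i + lag] for i in range(n)) / n


def features(seq, residues, lags):
    """Hypothetical feature vector: selected compositions plus selected lags."""
    return composition(seq, residues) + [autocorrelation(seq, k) for k in lags]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(xs, ys):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def resubstitution(xs, ys):
    """Fit on all proteins, then score on the same proteins."""
    coef = fit(xs, ys)
    errs = [abs(y - predict(coef, x)) for x, y in zip(xs, ys)]
    return mean(errs), pstdev(errs)


def jackknife(xs, ys):
    """Leave-one-out: refit without protein i, then predict protein i."""
    errs = []
    for i in range(len(xs)):
        coef = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errs.append(abs(ys[i] - predict(coef, xs[i])))
    return mean(errs), pstdev(errs)
```

Usage: given feature vectors X and measured helix (or strand) fractions y for the proteins of one structural class, jackknife(X, y) returns the mean and standard deviation of the absolute prediction errors, the two figures the abstract reports, and resubstitution(X, y) gives the corresponding self-consistency figures. The jackknife refits the model once per protein, so every prediction is made by a model that never saw that protein.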
Copyright 2001 Academic Press.