Protein burying depth (BD) is a structural descriptor that can be used not only to determine whether a residue is exposed or buried, but also to quantify how deeply a buried residue lies. The widely used solvent accessible surface area focuses mainly on surface residues, whereas BD provides more detailed information about the arrangement of buried residues, which may help in studying the deep-level structure of proteins and the formation of the protein folding nucleus. In this work, we analyse the relationship between residue BD and protein sequence, and model it with nonlinear functions estimated by support vector machines. Cross-validation tests of these functions reveal a strong correlation between residue BD and the local sequence environment. When the size of the molecule containing the residue is also taken into account, the correlation coefficient between predicted and observed depths improves from 0.60 to 0.65. Moreover, nearly half of the 10% most deeply buried residues in a protein sequence can be correctly predicted. Our study suggests that the extent to which a residue is buried can be predicted, to some degree, from the residue itself and its local neighbouring residues. The methods used to estimate the sequence-depth functions are expected to prove useful in investigating protein structures and folding mechanisms.
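The sequence-to-depth setup described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the encoding used in this work. Each residue is represented by a one-hot window of its neighbours (the half-width of 7 is a hypothetical choice), molecule size is appended as log(chain length) (also a hypothetical choice), and the two evaluation measures quoted above are computed directly: the correlation between predicted and observed depths, and the recovery of the deepest 10% of residues. All function names are illustrative, and the support vector regression step itself is omitted.

```python
import math
from statistics import mean

# The 20 standard amino acids; each window slot gets one indicator per residue
# type plus one extra flag marking positions that fall outside the chain.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, position, half_window=7, include_length=True):
    """Encode the local sequence environment of one residue.

    Hypothetical encoding: one-hot residues in a window of 2*half_window + 1
    positions centred on `position`, optionally followed by log(chain length)
    as a molecule-size feature.
    """
    features = []
    for offset in range(-half_window, half_window + 1):
        i = position + offset
        slot = [0.0] * (len(AMINO_ACIDS) + 1)
        if 0 <= i < len(sequence):
            idx = AMINO_ACIDS.find(sequence[i])
            if idx >= 0:  # nonstandard residues (e.g. 'X') leave the slot all zeros
                slot[idx] = 1.0
        else:
            slot[-1] = 1.0  # window extends past a chain terminus
        features.extend(slot)
    if include_length:
        features.append(math.log(len(sequence)))  # assumed size feature
    return features


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted depths."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    if so == 0 or sp == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (so * sp)


def deepest_recovered(observed, predicted, top_percent=10):
    """Fraction of the observed deepest `top_percent`% residues of one protein
    that are also among its predicted deepest `top_percent`%."""
    n = max(1, len(observed) * top_percent // 100)
    by_obs = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)
    by_pred = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return len(set(by_obs[:n]) & set(by_pred[:n])) / n


if __name__ == "__main__":
    seq = "MKVLAAGIVGLLLA"
    rows = [window_features(seq, i) for i in range(len(seq))]
    print(len(rows), len(rows[0]))  # 14 rows of 15 * 21 + 1 = 316 features
```

To fit the sequence-depth function itself, these feature rows could be passed to an SVM regressor such as scikit-learn's `sklearn.svm.SVR`. Cross-validation folds are best split by protein rather than by residue (for example with scikit-learn's `GroupKFold`), so that overlapping windows from one chain never appear in both the training and test sets.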
(c) 2007 Wiley-Liss, Inc.