Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences

Yaoxin Wang; Yingjie Xu; Zhenyu Yang; Xiaoqing Liu; Qi Dai

doi:10.1155/2021/5529389

Comput Math Methods Med. 2021 May 7:2021:5529389. doi: 10.1155/2021/5529389. eCollection 2021.

Authors

Yaoxin Wang¹, Yingjie Xu², Zhenyu Yang¹, Xiaoqing Liu³, Qi Dai¹

Affiliations

¹ College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China.
² Qixin School, Zhejiang Sci-Tech University, Hangzhou 310018, China.
³ College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China.

Abstract

Many combinations of protein features have been used to improve protein structural class prediction, but the information redundancy among them is often ignored. To select the important features with strong classification ability, we propose a recursive feature selection method based on random forest. We evaluated the proposed method in four experiments and compared it with available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: fewer than 5% of the features are used, yet prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for protein structural class.
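
The general idea of recursive feature selection driven by random forest importances can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's generic `RFE` wrapper, a synthetic feature matrix standing in for real protein features, and arbitrary settings (100 candidate features, 5 retained, 10% of features dropped per round) chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a protein feature matrix (assumption: NOT the paper's data).
# Four classes loosely mirror a four-way structural class problem.
X, y = make_classification(
    n_samples=200,
    n_features=100,
    n_informative=5,
    n_redundant=20,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# A random forest supplies the feature importances used for ranking.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Recursive elimination: refit the forest, discard the least important
# features (10% of the original count per round), and repeat until
# 5 features remain. Both numbers are illustrative assumptions.
selector = RFE(estimator=forest, n_features_to_select=5, step=0.1)
selector.fit(X, y)

# Keep only the columns that survived the elimination.
X_selected = selector.transform(X)
selected_indices = [i for i, keep in enumerate(selector.support_) if keep]

print("retained feature indices:", selected_indices)
print("reduced matrix shape:", X_selected.shape)
```

When estimating accuracy on the reduced feature set, the selection step would normally be refit inside each cross-validation fold so that held-out samples do not influence which features are retained.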

MeSH terms

Algorithms
Amino Acid Sequence
Amino Acids / chemistry
Computational Biology
Databases, Protein / statistics & numerical data
Hydrophobic and Hydrophilic Interactions
Protein Conformation
Protein Structural Elements
Protein Structure, Secondary
Proteins / chemistry*
Proteins / classification*
Sequence Homology, Amino Acid
Support Vector Machine

Substances

Amino Acids
Proteins