iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences

Zhen Chen; Pei Zhao; Fuyi Li; André Leier; Tatiana T Marquez-Lago; Yanan Wang; Geoffrey I Webb; A Ian Smith; Roger J Daly; Kuo-Chen Chou; Jiangning Song

doi:10.1093/bioinformatics/bty140


Bioinformatics. 2018 Jul 15;34(14):2499-2502. doi: 10.1093/bioinformatics/bty140.

Authors

Zhen Chen¹, Pei Zhao², Fuyi Li³, André Leier⁴,⁵, Tatiana T Marquez-Lago⁴,⁵, Yanan Wang⁶, Geoffrey I Webb⁷, A Ian Smith³, Roger J Daly³, Kuo-Chen Chou⁸,⁹, Jiangning Song³,⁷

Affiliations

¹ School of Basic Medical Science, Qingdao University, 38 Dengzhou Road, Qingdao, China.
² State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, China.
³ Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, Australia.
⁴ Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA.
⁵ Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA.
⁶ Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China.
⁷ Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia.
⁸ Gordon Life Science Institute, Boston, MA, USA.
⁹ Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.

Abstract

Summary: Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection and dimensionality reduction algorithms, greatly facilitating training, analysis and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit.
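To illustrate what "numerical feature representation" means, here is a minimal sketch of two simple composition-type encodings, amino acid composition (20 values) and dipeptide composition (400 values). It is written independently for illustration and is not iFeature's own code; the function names `aac` and `dpc`, the alphabetical residue ordering, and the handling of non-standard residues (they count toward sequence length but map to no feature) are assumptions.

```python
from itertools import product
from typing import List

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue (20 values).

    Hypothetical helper for illustration; not the iFeature API.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    # Non-standard residues still count toward n but match no feature.
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def dpc(sequence: str) -> List[float]:
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs.

    Hypothetical helper for illustration; not the iFeature API.
    """
    seq = sequence.upper()
    # Pairs ordered "AA", "AC", ..., "YY" (first residue varies slowest).
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    index = {pair: i for i, pair in enumerate(pairs)}
    total = len(seq) - 1  # number of overlapping adjacent pairs
    if total < 1:
        raise ValueError("need at least two residues")
    counts = [0] * len(pairs)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in index:  # skip pairs containing non-standard residues
            counts[index[pair]] += 1
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = aac("ACDA")
    print(len(vec), vec[:4])
    print(len(dpc("ACDA")))
```

Each sequence becomes a fixed-length vector regardless of its length, which is what allows downstream clustering, feature selection and machine-learning models to treat sequences of different lengths uniformly.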

Availability and implementation: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Machine Learning
Molecular Sequence Annotation*
Peptides / chemistry
Peptides / metabolism*
Peptides / physiology
Protein Conformation
Proteins / chemistry
Proteins / metabolism*
Proteins / physiology
Sequence Analysis, Protein / methods*
Software*

Substances

Peptides
Proteins

Grants and funding

R01 AI111965/AI/NIAID NIH HHS/United States