AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding

Lingyan Zheng; Shuiyang Shi; Mingkun Lu; Pan Fang; Ziqi Pan; Hongning Zhang; Zhimeng Zhou; Hanyu Zhang; Minjie Mou; Shijie Huang; Lin Tao; Weiqi Xia; Honglin Li; Zhenyu Zeng; Shun Zhang; Yuzong Chen; Zhaorong Li; Feng Zhu

doi:10.1186/s13059-024-03166-1

Genome Biol. 2024 Feb 1;25(1):41. doi: 10.1186/s13059-024-03166-1.

Authors

Lingyan Zheng^#¹ ², Shuiyang Shi^#¹, Mingkun Lu^#¹, Pan Fang² ³, Ziqi Pan¹, Hongning Zhang¹, Zhimeng Zhou¹, Hanyu Zhang¹, Minjie Mou¹, Shijie Huang¹, Lin Tao⁴, Weiqi Xia⁵, Honglin Li⁶, Zhenyu Zeng² ³, Shun Zhang² ³, Yuzong Chen⁷, Zhaorong Li⁸ ⁹, Feng Zhu¹⁰ ¹¹ ¹²

Affiliations

¹ College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
² Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
³ Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
⁴ Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
⁵ Pharmaceutical Department, Zhejiang Provincial People's Hospital, Hangzhou, 310014, China.
⁶ School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
⁷ State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China.
⁸ Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
⁹ Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
¹⁰ College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China.
¹¹ Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China.
¹² Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.

^# Contributed equally.

Abstract

Protein function annotation is a longstanding problem in the biological sciences, and various computational methods have been developed to address it. However, existing methods suffer from a serious long-tail problem: a large number of GO families contain few annotated proteins. To address this, a strategy named AnnoPRO was constructed by combining sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory-based decoding. Case studies on different benchmarks confirmed the superior performance of AnnoPRO among available methods. Source code and models are freely available at https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272.

Keywords: LSTM; Long-tail problem; Pre-training; Protein function annotation; Protein representation.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods
Deep Learning*
Humans
Molecular Sequence Annotation
Proteins / metabolism
Software

Substances

Proteins
