Abstract
Analyzing microbial samples remains computationally challenging because of their diversity and complexity. The lack of robust de novo protein function prediction methods makes it even harder to derive functional insights from these samples. Traditional prediction methods, which depend on homology and sequence similarity, often fail to predict functions for novel proteins and for proteins without known homologs. Moreover, most of these methods have been trained on largely eukaryotic data and have not been evaluated on or applied to microbial datasets. This research introduces DeepGOMeta, a deep learning model that predicts protein functions as Gene Ontology (GO) terms and is trained on a dataset relevant to microbes. The model is applied to diverse microbial datasets to demonstrate its use for gaining biological insights. Data and code are available at https://github.com/bio-ontology-research-group/deepgometa.
Keywords:
Metagenomes; Microbial samples; Protein function.
© 2024. The Author(s).
MeSH terms
- Bacteria / genetics
- Bacteria / metabolism
- Bacterial Proteins / genetics
- Bacterial Proteins / metabolism
- Computational Biology / methods
- Deep Learning*
- Gene Ontology
- Microbiota*
- Proteins / metabolism
- Software
Substances
- Proteins
- Bacterial Proteins
Grants and funding
- URF/1/4675-01-01, URF/1/4355-01-01, URF/1/4697-01-01, URF/1/5041-01-01, REI/1/5334-01-01, FCC/1/1976-46-01, and FCC/1/1976-34-01/King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR)