Genome mining of metabolic gene clusters in the Rubiaceae family

Comput Struct Biotechnol J. 2023 Nov 20;23:22-33. doi: 10.1016/j.csbj.2023.11.034. eCollection 2024 Dec.

Abstract

The Rubiaceae plant family, comprising 3 subfamilies and over 13,000 species, is known for producing important bioactive compounds such as caffeine and monoterpene indole alkaloids. Although the number of available Rubiaceae genomes has grown over the past decade, the metabolic gene clusters (MGCs) encoded by these genomes have not been systematically analyzed. In this study, we identify and analyze MGCs in complete Rubiaceae genomes through a comparative analysis of eight species. Using two bioinformatics pipelines, we identified 2372 candidate MGCs, organized into 549 gene cluster families (GCFs). To strengthen the reliability of these predictions, we built coexpression networks and conducted orthology analyses. Using genomic data from Solanum lycopersicum (Solanaceae) for comparison, we provide a detailed view of the predicted metabolic enzymes, pathways, and coexpression networks. We also present examples of MGCs and GCFs involved in terpene, saccharide, and alkaloid pathways. These insights lay the groundwork for discovering new compounds and their associated MGCs within the Rubiaceae family, with potential implications for developing more robust crop species and for expanding our understanding of plant metabolism. This large-scale exploration also offers a new perspective on the evolution and structure-function relationships of these clusters, creating opportunities for the highly efficient utilization of these unique metabolites. Overall, the results contribute to a broader understanding of biosynthetic pathways, elucidate multiple aspects of specialized metabolism, and open new avenues for biotechnological applications.

Keywords: Comparative genomics; Metabolic gene cluster; Rubiaceae.