GenFam: A web application and database for gene family-based classification and functional enrichment analysis

Plant Direct. 2019 Dec 4;3(12):e00191. doi: 10.1002/pld3.191. eCollection 2019 Dec.

Abstract

Genome-scale studies using high-throughput sequencing (HTS) technologies generate long lists of differentially expressed genes under different experimental conditions. These gene lists must be mined further to narrow down biologically relevant genes and their associated functions, which in turn guide downstream functional genetic analyses. A popular approach is enrichment analysis, which identifies functional categories that are statistically overrepresented among the genes in a user-defined list; existing enrichment tools rely on functional annotations of genes based on Gene Ontology (GO) terms. Here, we propose a new computational approach, GenFam, which annotates, classifies, and tests genes for enrichment based on their gene family, thereby simplifying the identification of candidate gene families and associated genes that may be relevant to the query. GenFam and its integrated database comprise 384 unique gene families and support gene family analyses for 60 plant genomes. Four comparative case studies, using plant species from different clades and families, demonstrated GenFam's robustness and comprehensiveness relative to preexisting functional enrichment tools. To make it readily accessible to plant biologists, GenFam is available as a web-based application where users can input gene IDs and export enrichment results in both tabular and graphical formats. Users can also customize analysis parameters by choosing from various statistical enrichment tests and multiple testing correction methods. Additionally, the web-based application, source code, and database are freely available to use and download. Website: http://mandadilab.webfactional.com/home/. Source code and database: http://mandadilab.webfactional.com/home/dload/.
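To make the idea of gene family enrichment concrete, the following is a minimal sketch, not GenFam's actual implementation. It assumes one enrichment test (a one-sided hypergeometric test, equivalent to a one-sided Fisher's exact test) and one multiple testing correction (Benjamini-Hochberg FDR) as examples of the kinds of options described above. It also assumes, for simplicity, that every background gene maps to exactly one family; the gene IDs and family assignments in the usage example are hypothetical toy data.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K family members, n draws)."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def family_enrichment(
    query: Set[str], gene_to_family: Dict[str, str]
) -> Dict[str, Tuple[float, float]]:
    """Return {family: (raw p-value, BH-adjusted p-value)} for each family.

    Assumption: gene_to_family is the background annotation, one family per gene.
    """
    background = set(gene_to_family)
    query = query & background  # ignore query IDs with no family annotation
    N, n = len(background), len(query)
    families = sorted(set(gene_to_family.values()))
    pvals = []
    for fam in families:
        members = {g for g, f in gene_to_family.items() if f == fam}
        K = len(members)            # family size in the background
        k = len(query & members)    # family members in the query list
        pvals.append(hypergeom_upper_tail(k, n, K, N))
    adjusted = benjamini_hochberg(pvals)
    return {fam: (p, q) for fam, p, q in zip(families, pvals, adjusted)}


if __name__ == "__main__":
    # Hypothetical toy background: 10 genes in three families.
    toy_families = {
        "g1": "WRKY", "g2": "WRKY", "g3": "WRKY", "g4": "WRKY",
        "g5": "NAC", "g6": "NAC", "g7": "NAC",
        "g8": "MYB", "g9": "MYB", "g10": "MYB",
    }
    toy_query = {"g1", "g2", "g3", "g5"}
    for family, (p, q) in family_enrichment(toy_query, toy_families).items():
        # WRKY: p = 25/210 (about 0.119), BH-adjusted about 0.357
        print(f"{family}\tp={p:.3f}\tBH={q:.3f}")
```

Each family is tested independently against the whole annotated background, which is why a correction across families is needed; a real tool would also have to handle genes that belong to several families or to none.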

Keywords: data integration; database; gene family enrichment analysis; gene ontologies; software; statistics.