Gene Expression Variation Analysis (GEVA): A new R package to evaluate variations in differential expression in multiple biological conditions

J Biomed Inform. 2022 May;129:104053. doi: 10.1016/j.jbi.2022.104053. Epub 2022 Mar 19.

Abstract

Thousands of gene expression datasets are now publicly available and can be analyzed in silico with specialized software or the R programming language. However, transcriptomic studies typically consider experimental conditions individually, yielding one independent result per comparison. Here we describe Gene Expression Variation Analysis (GEVA), a new R package that takes multiple differential expression analysis results as input and applies several statistical steps, such as weighted summarization, quantile partitioning, and clustering, to find genes whose differential expression varies least across all experiments. The experimental conditions can be divided into groups, which we call factors. Additional ANOVA-based tests (Fisher's and Levene's) are then applied to identify genes that are differentially expressed either specifically in response to one factor or dependently across all factors. The final results assign relevant genes to one of three classifications: similar, factor-dependent, or factor-specific. After GEVA's development, we validated these results by testing 28 transcriptomic datasets with 11 different combinations of the available parameters, including several clustering, quantile, and summarization methods. The final classifications were validated against knockout studies from different organisms, because the absence of the knocked-out genes means their differential expression is expected. Some final classifications differed depending on the choice of parameters. However, the results obtained with the default parameters agreed with the published experimental studies for the selected datasets. We therefore conclude that GEVA can effectively find similarities between groups of biological conditions and could be a robust alternative for multiple comparison analyses.
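The abstract names several statistical steps but does not show them. The sketch below illustrates the general kind of computation involved. It is written in Python and is neither GEVA's API nor its implementation; GEVA itself is an R package. It makes the following assumptions:

- Each gene has one log fold change (logFC) per experiment.
- There is one hypothetical weight per experiment.
- A hypothetical two-level factor labels the experiments.

Under those assumptions, the sketch computes a weighted summary and a weighted standard deviation ("variation") per gene. It then applies a quantile cut to keep the least-varying genes. Finally, it computes Fisher's one-way ANOVA F and a mean-centred Levene's W across factor groups. All function names and data are made up for illustration. Clustering and p-values are omitted; p-values would require the F distribution, for example `scipy.stats.f.sf`.

```python
from statistics import mean, quantiles


def weighted_summary(logfcs, weights):
    """Weighted mean (summary) and weighted population SD (variation) of one gene's logFCs."""
    total = sum(weights)
    summary = sum(w * x for w, x in zip(weights, logfcs)) / total
    variance = sum(w * (x - summary) ** 2 for w, x in zip(weights, logfcs)) / total
    return summary, variance ** 0.5


def anova_f(groups):
    """Fisher's one-way ANOVA F statistic for a list of groups of values."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        # Degenerate case: no within-group spread at all.
        return float("inf") if ss_between > 0 else float("nan")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def levene_w(groups):
    """Levene's W (mean-centred): the ANOVA F computed on absolute deviations from group means."""
    return anova_f([[abs(x - mean(g)) for x in g] for g in groups])


def low_variation_genes(data, weights, n_quantiles=2):
    """Hypothetical rule: keep genes whose variation is <= the first quantile cut point."""
    variation = {gene: weighted_summary(v, weights)[1] for gene, v in data.items()}
    cut = quantiles(list(variation.values()), n=n_quantiles)[0]
    return sorted(gene for gene, sd in variation.items() if sd <= cut)


def split_by_factor(values, factor):
    """Group one gene's logFCs by the (hypothetical) factor level of each experiment."""
    groups = {}
    for value, level in zip(values, factor):
        groups.setdefault(level, []).append(value)
    return list(groups.values())


# Toy logFCs for 6 experiments; all values and labels are invented for illustration.
data = {
    "geneA": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
    "geneB": [0.5, 3.0, -1.0, 2.5, 0.0, 4.0],
    "geneC": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    "geneD": [-1.0, -1.1, -0.9, -1.0, -1.1, -0.9],
}
weights = [1.0] * 6                    # hypothetical equal weight per experiment
factor = ["ctrl"] * 3 + ["treat"] * 3  # hypothetical two-level factor

print(low_variation_genes(data, weights))  # ['geneA', 'geneD']
groups_c = split_by_factor(data["geneC"], factor)
print(round(anova_f(groups_c), 6), round(levene_w(groups_c), 6))  # 600.0 0.0
```

In this toy example, geneC has a large F because its mean shifts between factor levels, while its spread is the same in both levels, so W is near zero. This uses the original mean-centred form of Levene's test. Note that `scipy.stats.levene` centres on the median by default.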

Keywords: Clustering; Differentially expressed genes; Gene expression; R Package; Statistical analysis; Variation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Gene Expression Profiling* / methods
  • Programming Languages
  • Software*
  • Transcriptome