DeClUt: Decluttering differentially expressed genes through clustering of their expression profiles

Comput Methods Programs Biomed. 2024 Sep:254:108258. doi: 10.1016/j.cmpb.2024.108258. Epub 2024 May 31.

Abstract

Background and objective: differential expression analysis is one of the most common tasks in transcriptomic studies based on next-generation sequencing technologies. Indeed, differentially expressed genes (DEGs) between two conditions are ideal candidate prognostic and diagnostic biomarkers for many pathologies. As a result, several algorithms, such as DESeq2 and edgeR, have been developed to identify DEGs. Despite their widespread use, there is no consensus on which model performs best for different types of data, and many existing methods suffer from high False Discovery Rates (FDR).

Methods: we present a new algorithm, DeClUt, based on the intuition that the expression values of a differentially expressed gene across samples should form two reasonably compact and well-separated clusters. This, in turn, implies that the bipartition of samples induced by the two conditions being compared should largely agree with the clustering. The clustering algorithm underlying DeClUt was designed to be robust to the outliers typical of RNA-seq data. In particular, we used the average silhouette function to guide the assignment of samples to the most appropriate condition.
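The clustering intuition above can be illustrated for a single gene with a minimal sketch. It is not DeClUt's algorithm: it assumes log2(count + 1) transformed values, swaps in an exact 1-D 2-center clustering (in the spirit of the k-center keyword, without DeClUt's outlier handling), and computes a standard average silhouette on the condition bipartition. The function `is_candidate_deg` and its thresholds `min_agreement` and `min_silhouette` are hypothetical.

```python
import math
from typing import List, Sequence


def two_center_1d(values: Sequence[float]) -> List[int]:
    """Exact 2-center clustering of 1-D values (minimise the larger cluster radius).

    In 1-D an optimal solution can be taken as a contiguous split of the sorted
    values, and a contiguous cluster's best radius is (max - min) / 2, so trying
    every split point is exact. Returns a 0/1 label per value, in input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    best_k, best_r = 1, math.inf
    for k in range(1, len(xs)):
        r = max((xs[k - 1] - xs[0]) / 2, (xs[-1] - xs[k]) / 2)
        if r < best_r:
            best_k, best_r = k, r
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = 0 if rank < best_k else 1
    return labels


def agreement(clusters: Sequence[int], conditions: Sequence[int]) -> float:
    """Fraction of samples on which two 0/1 bipartitions agree, up to a label swap."""
    n = len(conditions)
    same = sum(c == d for c, d in zip(clusters, conditions))
    return max(same, n - same) / n


def average_silhouette(values: Sequence[float], labels: Sequence[int]) -> float:
    """Mean silhouette of a 1-D partition, using absolute differences as distances.

    s(i) = (b - a) / max(a, b), where a is the mean distance to the other members
    of sample i's group and b the mean distance to the other group; samples in a
    singleton group (or with no other group) get s(i) = 0.
    """
    n = len(values)
    total = 0.0
    for i in range(n):
        same = [abs(values[i] - values[j]) for j in range(n) if j != i and labels[j] == labels[i]]
        other = [abs(values[i] - values[j]) for j in range(n) if labels[j] != labels[i]]
        if not same or not other:
            continue  # contributes s(i) = 0
        a = sum(same) / len(same)
        b = sum(other) / len(other)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def is_candidate_deg(counts: Sequence[int], conditions: Sequence[int],
                     min_agreement: float = 1.0, min_silhouette: float = 0.5) -> bool:
    """Hypothetical gene-level check: the unsupervised 2-center split should match
    the two conditions, and the condition bipartition should be well separated."""
    x = [math.log2(c + 1) for c in counts]  # assumed log2(count + 1) transform
    clusters = two_center_1d(x)
    return (agreement(clusters, conditions) >= min_agreement
            and average_silhouette(x, conditions) >= min_silhouette)


conditions = [0, 0, 0, 0, 1, 1, 1, 1]
print(is_candidate_deg([10, 12, 9, 11, 80, 95, 70, 88], conditions))  # True
print(is_candidate_deg([10, 50, 12, 45, 11, 48, 9, 52], conditions))  # False
```

In the first toy gene the unsupervised low/high split coincides with the two conditions and the silhouette is high, so it is flagged; in the second, the low and high clusters cut across both conditions (agreement 0.5), so it is not.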

Results: DeClUt was tested on real RNA-seq datasets and benchmarked against four of the most widely used methods (edgeR, DESeq2, NOISeq, and SAMseq). Experiments showed higher self-consistency of results than the competitors, as well as a significantly lower False Positive Rate (FPR). Moreover, on a real prostate cancer RNA-seq dataset, DeClUt highlighted 8 DE genes, linked to neoplastic processes according to the DisGeNET database, that none of the other methods identified.
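The Background motivates the work with FDR, while the Results report FPR; the two rates share a numerator but use different denominators. A minimal sketch, assuming a known set of truly differentially expressed genes (the abstract does not say how ground truth was established in the benchmarks) and hypothetical gene names:

```python
from typing import Set, Tuple


def error_rates(called: Set[str], truly_de: Set[str], all_genes: Set[str]) -> Tuple[float, float]:
    """Return (FPR, FDR) for a set of genes called DE against a known truth set.

    FPR = FP / (FP + TN): share of truly non-DE genes that were wrongly called.
    FDR = FP / (FP + TP): share of called genes that are not truly DE.
    """
    fp = len(called - truly_de)
    tp = len(called & truly_de)
    tn = len(all_genes - called - truly_de)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fdr = fp / (fp + tp) if fp + tp else 0.0
    return fpr, fdr


genes = {f"g{i}" for i in range(1, 11)}  # 10 hypothetical genes
print(error_rates({"g1", "g3"}, {"g1", "g2"}, genes))  # (0.125, 0.5)
```

Here a single false call among eight truly non-DE genes gives a low FPR of 0.125, yet it is half of the two calls made, so the FDR is 0.5.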

Conclusions: our work presents a novel algorithm that builds upon basic concepts of data clustering and exhibits greater consistency and a significantly lower False Positive Rate than state-of-the-art methods. Additionally, DeClUt can highlight relevant differentially expressed genes not identified by other tools, thereby helping to improve the efficacy of differential expression analyses in various biological applications.

Keywords: Biomarkers; Clustering; Differentially expressed genes; RNA-seq; k-center.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology / methods
  • Gene Expression Profiling* / methods
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Male
  • Prostatic Neoplasms / genetics
  • Software
  • Transcriptome