GFS: fuzzy preprocessing for effective gene expression analysis

BMC Bioinformatics. 2016 Dec 23;17(Suppl 17):540. doi: 10.1186/s12859-016-1327-8.

Abstract

Background: Gene expression data produced on high-throughput platforms such as microarrays are susceptible to considerable variation that obscures useful biological information. Preprocessing the data with a suitable normalization method is therefore necessary and has a direct, substantial impact on the quality of downstream analysis. However, standard normalization methods are known to perform poorly, especially in the presence of substantial batch effects and heterogeneity in gene expression data.

Results: We present Gene Fuzzy Score (GFS), a simple preprocessing technique that largely reduces obscuring variation while retaining useful biological information. Using four sets of publicly available datasets containing batch effects and heterogeneity, we compare GFS with three standard normalization techniques as well as raw gene expression. Each method is evaluated with respect to the quality, consistency, and biological coherence of its processed output. GFS outperforms the other transformation techniques in all three aspects.
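The abstract does not spell out the scoring rule, so the following is only a minimal sketch of what a rank-based fuzzy score could look like, not the authors' implementation. It assumes that genes within a single sample are ranked by expression, that genes in the top fraction `theta1` receive a score of 1, that genes ranked below the fraction `theta2` receive 0, and that scores are linearly interpolated in between. The function name `gene_fuzzy_score` and the default thresholds (0.05 and 0.15) are illustrative assumptions.

```python
import numpy as np

def gene_fuzzy_score(expr, theta1=0.05, theta2=0.15):
    """Illustrative rank-based fuzzy score for ONE sample.

    expr   : 1-D array of expression values, one per gene.
    theta1 : assumed top fraction of genes that score 1.
    theta2 : assumed fraction beyond which genes score 0.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    # Rank genes so that the most highly expressed gene gets rank 0.
    order = np.argsort(-expr, kind="stable")
    ranks = np.empty(n, dtype=float)
    ranks[order] = np.arange(n)
    # Fractional position from the top of the ranking, in [0, 1).
    frac = ranks / n
    # 1 above theta1, 0 below theta2, linear ramp in between.
    return np.clip((theta2 - frac) / (theta2 - theta1), 0.0, 1.0)

# Usage on a toy sample of 100 genes with distinct expression values.
scores = gene_fuzzy_score(np.arange(100))
```

Because the score depends only on within-sample ranks, it is unaffected by any monotone rescaling of a sample's raw values, which is one plausible reason a transformation of this kind could make samples from different batches or platforms more comparable.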

Conclusion: Our approach to preprocessing is a stronger alternative to popular normalization techniques. We demonstrate that it achieves the essential goal of preprocessing: it makes expression values from multiple samples comparable, even when they come from separate platforms, from independent batches, or from a heterogeneous phenotype.

Keywords: Fuzzy scoring; Gene expression analysis; Normalization; Preprocessing.

MeSH terms

  • Algorithms*
  • Gene Expression Profiling / methods*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*