The Impact of Normalization Methods on RNA-Seq Data Analysis

Biomed Res Int. 2015:2015:621690. doi: 10.1155/2015/621690. Epub 2015 Jun 15.

Abstract

High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Gene Expression Profiling / methods
  • Gene Expression Profiling / standards
  • High-Throughput Nucleotide Sequencing / methods
  • High-Throughput Nucleotide Sequencing / standards
  • Humans
  • Models, Genetic
  • Normal Distribution
  • Sensitivity and Specificity
  • Sequence Analysis, RNA / methods*
  • Sequence Analysis, RNA / standards*