Metabolomics provides new insights into disease pathogenesis and biomarker discovery. Samples from large-scale untargeted metabolomics studies are typically analyzed in several batches on a liquid chromatography-mass spectrometry (LC-MS) platform. Batch effects, which are caused by non-biological systematic biases, are unavoidable in large-scale metabolomics studies, even with properly designed experiments. Statistical analysis of large-scale metabolomics data without correcting for batch effects will yield misleading results. In this study, we propose a novel algorithm, WaveICA, which combines the wavelet transform with independent component analysis (ICA), using ICA as the threshold processing method, to capture and remove batch effects from large-scale metabolomics data. WaveICA uses the time trend of the samples over the injection order: it decomposes the original data into multi-scale data with different features, extracts and removes the batch-effect information at each scale, and thereby obtains clean data. WaveICA was tested on real metabolomics data. In the PCA score plot of the original data, the quality control samples (QCS) and the subject samples were scattered; after WaveICA was applied, the QCS and the subject samples each formed a tight cluster. The average Pearson correlation coefficient of the QCS, computed over all peaks, increased from 0.872 to 0.972. WaveICA also significantly improved classification accuracy on the metabolomics data. When compared with three representative methods, WaveICA outperformed all of them. In conclusion, WaveICA can efficiently remove batch effects while revealing more biological information, and it can be used to preprocess raw data in large-scale untargeted metabolomics studies.
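The multi-scale idea described above can be illustrated with a toy Python sketch (standard library only). It is not the WaveICA implementation. It assumes a Haar wavelet, an injection-order length divisible by 2**levels, and, in place of WaveICA's ICA-based threshold step, it simply flattens each feature's coarsest approximation coefficients to their mean. The helper names (`haar_decompose`, `haar_reconstruct`, `remove_coarse_trend`, `mean_qc_correlation`) are hypothetical. The QC metric also rests on an assumption: "average Pearson correlation" is read here as the mean over all pairs of QC samples, with each sample treated as a vector of peak intensities.

```python
import math
from itertools import combinations

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Multi-level Haar DWT of one feature's intensities in injection order.

    Returns (approximation, details), where details[0] is the finest scale.
    Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        evens, odds = approx[0::2], approx[1::2]
        details.append([(a - b) / SQRT2 for a, b in zip(evens, odds)])
        approx = [(a + b) / SQRT2 for a, b in zip(evens, odds)]
    return approx, details


def haar_reconstruct(approx, details):
    """Exact inverse of haar_decompose."""
    current = list(approx)
    for detail in reversed(details):
        nxt = []
        for s, d in zip(current, detail):
            nxt.extend(((s + d) / SQRT2, (s - d) / SQRT2))
        current = nxt
    return current


def remove_coarse_trend(data, levels):
    """Hypothetical stand-in for WaveICA's batch-removal step.

    data: rows = samples in injection order, columns = features (peaks).
    For each feature, the coarsest approximation coefficients (slow,
    batch-scale drift) are replaced by their mean, which preserves the
    feature's overall mean; all detail scales are kept unchanged.
    WaveICA instead applies ICA to the multi-scale data; that is NOT done here.
    """
    corrected_columns = []
    for column in zip(*data):
        approx, details = haar_decompose(column, levels)
        level = sum(approx) / len(approx)
        corrected_columns.append(haar_reconstruct([level] * len(approx), details))
    return [list(row) for row in zip(*corrected_columns)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_qc_correlation(qc_samples):
    """Mean pairwise Pearson correlation between QC samples (assumed metric)."""
    pairs = list(combinations(qc_samples, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy QC data: 8 injections x 3 peaks; the second batch (injections 4-7)
# carries a peak-specific offset of (+20, 0, -10).
qc_data = [
    [10, 20, 30], [11, 21, 29], [10, 20, 30], [11, 21, 29],
    [30, 20, 20], [31, 21, 19], [30, 20, 20], [31, 21, 19],
]
corrected = remove_coarse_trend(qc_data, levels=2)
print(round(mean_qc_correlation(qc_data), 3), round(mean_qc_correlation(corrected), 3))
```

Choosing `levels=2` makes each block of four injections coincide with one batch, so in this toy example the offset is removed and every corrected QC pair becomes perfectly correlated. That alignment is a property of the made-up data; real batch boundaries generally will not fall on dyadic blocks.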
Keywords: Batch effect; Data normalization; Independent component analysis; Metabolomics; Wavelet transform.
Copyright © 2019 Elsevier B.V. All rights reserved.