Closing the Knowledge Gap of Post-Acquisition Sample Normalization in Untargeted Metabolomics

Brian Low; Yukai Wang; Tingting Zhao; Huaxu Yu; Tao Huan

doi:10.1021/acsmeasuresciau.4c00047

ACS Meas Sci Au. 2024 Oct 14;4(6):702-711. doi: 10.1021/acsmeasuresciau.4c00047. eCollection 2024 Dec 18.

Authors

Brian Low¹, Yukai Wang¹, Tingting Zhao¹, Huaxu Yu¹, Tao Huan¹

Affiliation

¹ Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada.

Abstract

Sample normalization is a crucial step in metabolomics for fair quantitative comparisons. It aims to minimize sample-to-sample variations due to differences in the total metabolite amount. When samples lack a specific metabolic quantity that accurately represents their total metabolite amounts, post-acquisition sample normalization becomes essential. Although many normalization algorithms have been proposed, their differences remain poorly understood, which hinders the selection of the most suitable one for a given metabolomics study. This study bridges this knowledge gap by employing data simulation, experimental simulation, and real experiments to elucidate the differences in mechanism and performance among common post-acquisition sample normalization methods. Using public datasets, we first demonstrated the dramatic discrepancies between the outcomes of different sample normalization methods. Then, we benchmarked six normalization methods: sum, median, probabilistic quotient normalization (PQN), maximal density fold change (MDFC), quantile, and class-specific quantile. Our results show that most normalization methods are biased when the data are unbalanced, that is, when the percentages of up- and downregulated metabolites are unequal. Notably, unbalanced data can arise from underlying biological differences, experimental perturbations, and metabolic interference. Beyond normalization algorithms and data structure, our study also emphasizes the importance of considering data-quality factors such as background noise, signal saturation, and missingness. Based on these findings, we propose an evidence-based normalization strategy to maximize sample normalization outcomes, providing a robust bioinformatic solution for advancing metabolomics research through fair quantitative comparison.
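To make the idea of post-acquisition normalization concrete, the following is a minimal sketch of three of the benchmarked methods (sum, median, and PQN) written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of rescaling target, the use of the feature-wise median as the default PQN reference, and the toy data are all hypothetical. The second toy case mimics unbalanced data by upregulating 30% of features in one sample only.

```python
import numpy as np

def sum_normalize(X):
    """Divide each sample (row) by its total intensity, then rescale
    to the mean total so values stay on the original scale."""
    totals = X.sum(axis=1, keepdims=True)
    return X / totals * totals.mean()

def median_normalize(X):
    """Divide each sample (row) by its median intensity, then rescale
    to the mean of the sample medians."""
    medians = np.median(X, axis=1, keepdims=True)
    return X / medians * medians.mean()

def pqn_normalize(X, reference=None):
    """Probabilistic quotient normalization (sketch).
    Assumption: the reference spectrum defaults to the feature-wise
    median across samples. Each sample is divided by the median of
    its feature-wise quotients against that reference."""
    if reference is None:
        reference = np.median(X, axis=0)
    quotients = X / reference
    factors = np.median(quotients, axis=1, keepdims=True)
    return X / factors

# Toy case 1 (hypothetical data): four samples that differ only in
# total metabolite amount (a pure dilution effect).
rng = np.random.default_rng(0)
base = rng.lognormal(mean=10, sigma=1, size=200)
dilution = np.array([1.0, 0.5, 2.0, 1.5])
X = dilution[:, None] * base[None, :]

# Toy case 2 (hypothetical data): two samples with the same amount,
# but 30% of features are upregulated 4-fold in the second sample,
# with no downregulation (an unbalanced change).
perturbed = base.copy()
perturbed[:60] *= 4.0
X_unbalanced = np.vstack([base, perturbed])
unchanged = slice(60, None)  # features that did not truly change

sum_out = sum_normalize(X_unbalanced)
pqn_out = pqn_normalize(X_unbalanced, reference=base)

# Ratio of sample 2 to sample 1 on the unchanged features; a value
# of 1 means the normalization did not distort them.
sum_ratio = float(np.mean(sum_out[1, unchanged] / sum_out[0, unchanged]))
pqn_ratio = float(np.mean(pqn_out[1, unchanged] / pqn_out[0, unchanged]))
print("sum ratio on unchanged features:", sum_ratio)
print("PQN ratio on unchanged features:", pqn_ratio)
```

In the pure-dilution case all three functions map the four samples onto the same profile. In the unbalanced case, sum normalization shrinks the unchanged features of the perturbed sample because its total intensity is inflated, while PQN leaves them untouched here only because fewer than half of the features changed. This toy example does not reproduce the benchmark reported in the study.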