Sample normalization is a crucial step in metabolomics for fair quantitative comparisons. It aims to minimize sample-to-sample variation caused by differences in the total metabolite amount. When samples lack a specific metabolic quantity that accurately represents their total metabolite amounts, post-acquisition sample normalization becomes essential. Despite the many normalization algorithms proposed, understanding of their differences remains limited, hindering the selection of the most suitable one for a given metabolomics study. This study bridges this knowledge gap by employing data simulation, experimental simulation, and real experiments to elucidate the differences in mechanism and performance among common post-acquisition sample normalization methods. Using public datasets, we first demonstrated the dramatic discrepancies between the outcomes of different sample normalization methods. We then benchmarked six normalization methods: sum, median, probabilistic quotient normalization (PQN), maximal density fold change (MDFC), quantile, and class-specific quantile. Our results show that most normalization methods are biased when the data are unbalanced, i.e., when the percentages of up- and downregulated metabolites are unequal. Notably, unbalanced data can arise from underlying biological differences, experimental perturbations, and metabolic interference. Beyond normalization algorithms and data structure, our study also emphasizes the importance of considering additional factors related to data quality, such as background noise, signal saturation, and missingness. Based on these findings, we propose an evidence-based normalization strategy to optimize sample normalization outcomes, providing a robust bioinformatic solution for advancing metabolomics research with fair quantitative comparison.
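For readers unfamiliar with the benchmarked methods, the following is a minimal sketch of how two of them, sum normalization and PQN, operate on a sample-by-metabolite intensity matrix. The NumPy-based functions, their names, and the choice of the median spectrum as the PQN reference are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): sum and PQN normalization
# applied to an intensity matrix with samples as rows and metabolite
# features as columns.
import numpy as np

def sum_normalize(X):
    """Scale each sample so its total intensity equals the mean total intensity."""
    totals = X.sum(axis=1, keepdims=True)   # per-sample total signal
    return X / totals * totals.mean()

def pqn_normalize(X):
    """Probabilistic quotient normalization (PQN).

    Each sample is divided by the median fold change of its features
    relative to a reference spectrum (here, the median across samples).
    """
    X = sum_normalize(X)                      # common pre-step before PQN
    reference = np.median(X, axis=0)          # reference spectrum
    quotients = X / reference                 # feature-wise fold changes
    dilution = np.median(quotients, axis=1, keepdims=True)
    return X / dilution

# Toy example: 3 samples x 4 metabolites; sample 2 is a 2x "dilution" of sample 1
X = np.array([[10., 20., 30., 40.],
              [20., 40., 60., 80.],
              [12., 18., 33., 41.]])
print(pqn_normalize(X))
```

In this sketch, sum normalization assumes the total signal is comparable across samples, whereas PQN assumes that most metabolites are unchanged between samples, which is exactly the assumption that unbalanced data (unequal fractions of up- and downregulated metabolites) can violate.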