The application of high-throughput technologies in metabolomics studies increases the quantity of data obtained, which in turn poses several problems during data analysis. A correctly and clearly formulated biological question and comprehensive knowledge of the data structure and properties are necessary to select the proper chemometric tools. However, the broad range of chemometric tools available for metabolomics data makes this choice challenging. Precisely performed data treatment enables the extraction of valuable information and its proper interpretation. An error made at an early stage is amplified through the later stages and, combined with errors made at subsequent steps, can accumulate and significantly affect data interpretation. Moreover, adequate application of these tools may help not only to detect, but sometimes also to correct, biological, analytical, or methodological errors that may affect the validity of the results obtained. This report presents the steps and tools used for LC-MS-based metabolomics data extraction, reduction, and visualization. Following the steps of data reprocessing, data pretreatment, data treatment, and data revision, the authors show how to extract valuable information and how to avoid misinterpreting the results obtained. The purpose of this work was to emphasize the problematic characteristics of metabolomics data and the need for their attentive and precise treatment. The dataset used to illustrate the properties of metabolomics data and the major data treatment challenges was obtained from an animal model of control and diabetic rats, both with and without rosemary treatment. Urine samples were fingerprinted by LC-QTOF-MS.
Keywords: Chemometrics; Data re-processing; Data treatment; Outlier detection; Validation.
© 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.