Validation of chemometric models - a tutorial

Frank Westad; Federico Marini

doi:10.1016/j.aca.2015.06.056

Anal Chim Acta. 2015 Sep 17;893:14-24. doi: 10.1016/j.aca.2015.06.056. Epub 2015 Aug 10.

Authors

Frank Westad¹, Federico Marini²

Affiliations

¹ CAMO Software AS, Nedre Vollgate 8, N-0158 Oslo, Norway.
² Dept. of Chemistry, University of Rome "La Sapienza", I-00185 Rome, Italy.

PMID: 26398418
DOI: 10.1016/j.aca.2015.06.056

Abstract

In this tutorial, we focus on validation from both a numerical and a conceptual point of view. The procedure often reported in the literature of (repeatedly) dividing a dataset randomly into a calibration set and a test set must be applied with care. It can only be justified when there is no systematic stratification of the objects that will affect the validated estimates or figures of merit such as RMSE or R². The various levels of validation may typically be repeatability, reproducibility, and instrument and raw material variation. Examples of how one data set can be validated across this background information illustrate that the choice of validation level affects the figures of merit as well as the dimensionality of the models. Even more important is the robustness of the models for predicting future samples. Another aspect brought to attention is validation in terms of the overall conclusions when observing a specific system. One example is to apply several methods for finding the significant variables and see whether there is a consensus subset that also matches what is reported in the literature or what is expected from the underlying chemistry.
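
The contrast between a random calibration/test split and a split that respects a systematic stratification can be sketched in a few lines. The following is an illustration only, not code from the tutorial: the function names `random_split` and `leave_one_group_out`, the batch labels, and the design of three raw-material batches with four replicates each are all hypothetical.

```python
import random
from collections import defaultdict


def random_split(n_samples, test_fraction=0.3, seed=0):
    """Random calibration/test split that ignores any grouping of the samples."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(test_fraction * n_samples))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def leave_one_group_out(groups):
    """Yield (held_out_label, calibration_idx, test_idx), one whole group at a time."""
    members = defaultdict(list)
    for i, label in enumerate(groups):
        members[label].append(i)
    for label in sorted(members):
        test = members[label]
        held_out = set(test)
        calibration = [i for i in range(len(groups)) if i not in held_out]
        yield label, calibration, test


# Hypothetical design: 3 raw-material batches, 4 replicate measurements each.
groups = ["batch_A"] * 4 + ["batch_B"] * 4 + ["batch_C"] * 4

# Random split: replicates of the same batch can land on both sides,
# so the test set may only probe repeatability, not batch-to-batch variation.
cal, test = random_split(len(groups))
shared = {groups[i] for i in cal} & {groups[i] for i in test}
print("random split, batches on both sides:", sorted(shared))

# Group-wise split: every sample of the held-out batch is unseen in calibration.
for label, cal, test in leave_one_group_out(groups):
    assert not {groups[i] for i in cal} & {groups[i] for i in test}
    print(label, "held out, test indices:", test)
```

When replicates of one batch appear in both sets, figures of merit such as RMSE mostly reflect repeatability. Holding out whole batches instead validates at the raw-material level, which is the level that matters if future samples come from new batches.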

Keywords: Chemometrics; Cross-validation; Resampling; Test set; Validation.

Publication types

Review