In this tutorial, we focus on validation both from a numerical and conceptual point of view. The often applied reported procedure in the literature of (repeatedly) dividing a dataset randomly into a calibration and test set must be applied with care. It can only be justified when there is no systematic stratification of the objects that will affect the validated estimates or figures of merits such as RMSE or R(2). The various levels of validation may, typically, be repeatability, reproducibility, and instrument and raw material variation. Examples of how one data set can be validated across this background information illustrate that it will affect the figures of merits as well as the dimensionality of the models. Even more important is the robustness of the models for predicting future samples. Another aspect that is brought to attention is validation in terms of the overall conclusions when observing a specific system. One example is to apply several methods for finding the significant variables and see if there is a consensus subset that also matches what is reported in the literature or based on the underlying chemistry.
Keywords: Chemometrics; Cross-validation; Resampling; Test set; Validation.
Copyright © 2015 Elsevier B.V. All rights reserved.