Before introducing a new measurement tool it is necessary to evaluate its performance. Several statistical methods have been developed, or used, to evaluate the reliability and validity of a new assessment method in such circumstances. In this paper we review some commonly used methods. Data from a study conducted to evaluate the usefulness of a specific measurement tool (the WHO Colour Scale) are then used to illustrate the application of these methods. The WHO Colour Scale was developed under the auspices of the WHO to provide a simple, portable and reliable method of detecting anaemia. This Colour Scale is a discrete interval scale, whereas the actual haemoglobin values it is used to estimate are on a continuous interval scale and can be measured accurately using electrical laboratory equipment. The methods we consider are: linear regression; correlation coefficients; paired t-tests; plotting differences against mean values and deriving limits of agreement; kappa and weighted kappa statistics; sensitivity and specificity; an intraclass correlation coefficient; and the repeatability coefficient. We note that although the definition and properties of each of these methods are well established, inappropriate methods continue to be used in the medical literature for assessing reliability and validity, as evidenced in the context of the evaluation of the WHO Colour Scale.
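The following is a minimal sketch of four of the agreement measures listed above, written with the Python standard library only. It is not the authors' code. The function names are hypothetical. The 1.96 multiplier in the limits of agreement assumes approximately normally distributed differences. The repeatability coefficient uses the common form 1.96 × √2 × s_w, with the within-subject SD s_w estimated from duplicate readings. The linear and quadratic kappa weights are two conventional choices, not necessarily the ones used in the study. The usage values are toy numbers, not data from the WHO Colour Scale study.

```python
import math
import statistics


def limits_of_agreement(method_a, method_b):
    """Bland-Altman bias and 95% limits of agreement for paired readings."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def repeatability_coefficient(replicate_pairs):
    """1.96 * sqrt(2) * s_w, with s_w^2 = sum(d^2) / (2n) from duplicates."""
    n = len(replicate_pairs)
    s_w = math.sqrt(sum((x - y) ** 2 for x, y in replicate_pairs) / (2 * n))
    return 1.96 * math.sqrt(2) * s_w


def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Cohen's kappa for ordered categories.

    weights: "linear", "quadratic", or None (unweighted kappa).
    Computed as 1 - observed weighted disagreement / chance-expected one.
    """
    k = len(categories)
    pos = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    p = [[0.0] * k for _ in range(k)]  # joint proportions of ratings
    for a, b in zip(rater_a, rater_b):
        p[pos[a]][pos[b]] += 1 / n
    row = [sum(p[i]) for i in range(k)]                       # rater A marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # rater B marginals

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, growing with distance.
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    observed = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected


def sensitivity_specificity(test_positive, truly_positive):
    """Sensitivity and specificity of a binary test against a reference."""
    pairs = list(zip(test_positive, truly_positive))
    tp = sum(1 for t, d in pairs if t and d)
    fn = sum(1 for t, d in pairs if not t and d)
    tn = sum(1 for t, d in pairs if not t and not d)
    fp = sum(1 for t, d in pairs if t and not d)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy readings for illustration only (hypothetical, not study data).
    scale = [10.0, 12.0, 8.0, 14.0]
    lab = [10.4, 11.6, 8.9, 13.1]
    print(limits_of_agreement(scale, lab))
    print(weighted_kappa([1, 2, 3, 2], [1, 3, 3, 2], [1, 2, 3], "quadratic"))
```

Reporting the bias together with the limits of agreement shows how far an individual reading on one method may differ from the other. A correlation coefficient cannot show this, because it measures association rather than agreement.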
Copyright 2004 John Wiley & Sons, Ltd.