Values below detection limit in compositional chemical data

Anal Chim Acta. 2013 Feb 18:764:32-43. doi: 10.1016/j.aca.2012.12.029. Epub 2012 Dec 28.

Abstract

Samples representing parts of a whole, usually called compositional data in statistics, are commonplace in analytical chemistry, for example chemical data expressed in percentages, ppm, or μg g⁻¹. Their distinctive feature is that all the analytes constituting a chemical sample are inherently related, because they convey only relative information. Some principles of compositional data analysis and the log-ratio methodology are outlined here in practical terms. In addition, some analytes are often not present in a sample at a concentration high enough for the measuring instruments to detect them. These non-detects are usually labelled "<DL" (less-thans) in the data set, indicating that the values lie below known detection limits. Many data analysis techniques require complete data sets, so sensible replacement strategies for less-thans are needed. The peculiar nature of compositional data constrains any data analysis and demands a specialised treatment of less-thans that, unfortunately, is not usually covered in chemometrics. Some well-founded statistical methods are revisited in this paper to steer practitioners away from popular but untrustworthy approaches. A new proposal for estimating less-thans, which combines a log-normal probability model with a multiplicative modification of the samples, is also introduced. The performance of these methods is illustrated and compared on a real data set, and guidelines are provided for practitioners. Matlab and R code implementing the methods is made available to the reader.
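
As a rough illustration of the two ideas named in the abstract, the following Python sketch applies a simple multiplicative replacement to a single sample and then computes its centred log-ratio (clr) coordinates. It is not the Matlab or R code distributed with the paper, and it does not implement the new log-normal proposal. The function names, the example values, and the choice of imputing each less-than as 0.65 times its detection limit are assumptions made for illustration only.

```python
import math

def multiplicative_replacement(x, dl, frac=0.65, total=1.0):
    """Replace non-detects (None) by frac * DL and rescale the observed parts.

    Assumes the observed parts of x sum to `total` (non-detects counted as 0).
    frac=0.65 is an illustrative choice, not a value taken from the paper.
    """
    # Imputed value for each less-than; 0 for parts that were detected.
    delta = [frac * d if v is None else 0.0 for v, d in zip(x, dl)]
    # Shrink the observed parts so the sample still sums to `total`
    # while the ratios between observed parts stay unchanged.
    shrink = 1.0 - sum(delta) / total
    return [dlt if v is None else v * shrink for v, dlt in zip(x, delta)]

def clr(x):
    """Centred log-ratio coordinates: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical four-part sample (proportions); the third analyte is "<DL".
sample = [0.70, 0.29, None, 0.01]
detection_limits = [0.001, 0.001, 0.001, 0.001]

replaced = multiplicative_replacement(sample, detection_limits)
coords = clr(replaced)
print(replaced)
print(coords)
```

The multiplicative rescaling is what distinguishes this from naively substituting a constant: the sample keeps its total, and the ratios among the detected analytes, which carry the relative information, are left intact. The log-ratio step requires strictly positive parts, which is why less-thans have to be replaced first.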