Truncated outlier filtering

J Biopharm Stat. 2014;24(5):1115-29. doi: 10.1080/10543406.2014.926366.

Abstract

The statistical analysis of data can be heavily influenced by measurements of extreme value. If such measurements lie in the remote tails of the true population distribution from which they are drawn, they are referred to as outliers. Failing to filter outliers from a sample can distort statistical computations and lead to faulty conclusions. Conventional techniques classify as outliers those measurements whose distance from the mean exceeds a selected multiple of the sample standard deviation. Such approaches, however, can fail to classify distant measurements as outliers, because those same measurements inflate the sample standard deviation and thereby shrink their own normalized distances. The truncated outlier filtering method first replaces the minimum and maximum of the population before computing the exclusion criterion. This mitigates the influence of abnormally large (or small) measurements on the normalized distance and hence yields a tighter criterion for identifying outliers. Moreover, the method generalizes to two or more dimensions. Simulated one-dimensional and multidimensional data are analyzed, and the results are discussed.
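A minimal Python sketch contrasting the conventional k-sigma rule with one possible reading of the truncated rule. The abstract does not say what the minimum and maximum are replaced by; this sketch assumes each is replaced by its neighbouring order statistic (second smallest and second largest), i.e. a one-step winsorization, and that the resulting mean and standard deviation are then applied to the original measurements. The function names `sigma_filter` and `truncated_filter`, the threshold k = 3 and the data are hypothetical. The example also shows the masking effect described above: with the sample standard deviation, no point of a sample of size n can lie more than (n - 1)/√n standard deviations from the mean, about 2.27 for n = 7, so a 3-sigma rule can never fire on such a sample.

```python
import statistics
from typing import List, Sequence


def sigma_filter(xs: Sequence[float], k: float = 3.0) -> List[int]:
    """Conventional rule: flag x_i with |x_i - mean| > k * s, where the mean and
    the sample standard deviation s are computed from the full sample."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [i for i, x in enumerate(xs) if abs(x - m) > k * s]


def truncated_filter(xs: Sequence[float], k: float = 3.0) -> List[int]:
    """Hypothetical sketch of a truncated criterion.

    ASSUMPTION: the minimum and maximum are replaced by their neighbouring
    order statistics before the mean and standard deviation are estimated.
    """
    if len(xs) < 4:
        raise ValueError("need at least 4 measurements")
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # indices by value
    ys = list(xs)
    ys[order[0]] = xs[order[1]]    # minimum -> second smallest
    ys[order[-1]] = xs[order[-2]]  # maximum -> second largest
    m = statistics.mean(ys)
    s = statistics.stdev(ys)
    # The exclusion criterion is applied to the ORIGINAL measurements.
    return [i for i, x in enumerate(xs) if abs(x - m) > k * s]


# Hypothetical data: six values near 10 and one gross outlier.
data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 25.0]
print(sigma_filter(data, k=3.0))      # []
print(truncated_filter(data, k=3.0))  # [6]
```

Here the outlier 25.0 inflates the full-sample standard deviation to roughly 5.7, so its normalized distance is only about 2.27. After the one-step replacement the standard deviation drops to roughly 0.13, and 25.0 lies far outside the 3-sigma band.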

Keywords: Filtering; Mahalanobis distance; Order statistics; Outliers.
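The keywords name the Mahalanobis distance, d = sqrt((x - μ)ᵀ S⁻¹ (x - μ)), as the multidimensional counterpart of the normalized distance. The sketch below is an assumption, not the paper's stated procedure: it replaces, in each coordinate separately, the minimum and maximum by the neighbouring order statistics, estimates the mean vector μ and sample covariance S from the result, and flags original points with d > k. All names, the threshold and the data are hypothetical, and a small Gauss-Jordan solver stands in for a linear-algebra library. The same masking bound holds in several dimensions: with the sample covariance, d² ≤ (n - 1)²/n, so for n = 9 no point can exceed d = 8/3 under the conventional rule.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_cov(points: Sequence[Sequence[float]]):
    """Sample mean vector and covariance matrix (divisor n - 1)."""
    n, p = len(points), len(points[0])
    mu = [sum(pt[j] for pt in points) / n for j in range(p)]
    cov = [[sum((pt[a] - mu[a]) * (pt[b] - mu[b]) for pt in points) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][p] / m[i][i] for i in range(p)]


def mahalanobis(x, mu, cov) -> float:
    """d = sqrt(diff^T S^{-1} diff), computed via a linear solve."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)
    return math.sqrt(sum(di * zi for di, zi in zip(diff, z)))


def winsorize_columns(points):
    """ASSUMED analogue of the truncation step: in each coordinate, replace the
    minimum and maximum by the second smallest and second largest values."""
    ys = [list(pt) for pt in points]
    for j in range(len(points[0])):
        order = sorted(range(len(points)), key=lambda i: points[i][j])
        ys[order[0]][j] = points[order[1]][j]
        ys[order[-1]][j] = points[order[-2]][j]
    return ys


def mahalanobis_filter(points, k: float = 3.0, truncated: bool = False) -> List[int]:
    """Flag points whose Mahalanobis distance exceeds k."""
    ref = winsorize_columns(points) if truncated else points
    mu, cov = mean_cov(ref)
    return [i for i, pt in enumerate(points) if mahalanobis(pt, mu, cov) > k]


# Hypothetical 2-D data: eight points around the origin plus one outlier.
pts = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1),
       (10, 10)]
print(mahalanobis_filter(pts))                  # []
print(mahalanobis_filter(pts, truncated=True))  # [8]
```

Coordinate-wise replacement is only one way to carry the one-dimensional idea into several dimensions; a robust covariance estimator would be another, and the paper may use neither.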

MeSH terms

  • Algorithms
  • Clinical Trials as Topic / statistics & numerical data*
  • Data Interpretation, Statistical*
  • Humans
  • Models, Statistical*
  • Multivariate Analysis
  • Normal Distribution
  • Sample Size
  • Signal-To-Noise Ratio