Censoring Trace-Level Environmental Data: Statistical Analysis Considerations to Limit Bias

Environ Sci Technol. 2021 Mar 16;55(6):3786-3795. doi: 10.1021/acs.est.0c02256. Epub 2021 Feb 24.

Abstract

Trace-level environmental data typically include values near or below detection and quantitation thresholds where health effects may result from low-concentration exposures to one chemical over time or to multiple chemicals. In a cook stove case study, bias in dibenzo[a,h]anthracene concentration means and standard deviations (SDs) was assessed following censoring at thresholds for selected analysis approaches: substituting threshold/2, maximum likelihood estimation, robust regression on order statistics, Kaplan-Meier, and omitting censored observations. Means and SDs for gas chromatography-mass spectrometry-determined concentrations were calculated after censoring at detection and calibration thresholds, which censored 17% and 55% of the data, respectively. Threshold/2 substitution was the least biased. Measurement values were subsequently simulated from two log-normal distributions at two sample sizes. Means and SDs were calculated for 30%, 50%, and 80% censoring levels and compared to known distribution counterparts. Simulation results illustrated (1) threshold/2 substitution to be inferior to modern after-censoring statistical approaches and (2) all after-censoring approaches to be inferior to including all measurement data in analysis. Additionally, differences in stove-specific group means were tested for uncensored samples and after censoring. Results of tests for differences in group means varied depending on censoring and distributional decisions. Investigators should guard against censoring-related bias from (explicit or implicit) distributional and analysis approach decisions.
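As an illustration of the simulation idea described above, the following minimal sketch draws values from a log-normal distribution, censors them at 30%, 50%, and 80%, and compares the mean from all data with the means after threshold/2 substitution and after omitting censored observations. It is not the authors' code: the function names (simulate_lognormal, empirical_threshold, censored_means), the distribution parameters (mu = 0, sigma = 1), the sample size (200), and the choice of an empirical quantile as the threshold are all assumptions made for illustration. Maximum likelihood estimation, robust regression on order statistics, and Kaplan-Meier are not shown, and only means (not SDs) are computed.

```python
import random
from statistics import mean


def simulate_lognormal(n, mu, sigma, seed=0):
    """Draw n values from a log-normal distribution (hypothetical parameters)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def empirical_threshold(values, censor_fraction):
    """Return a threshold such that about censor_fraction of the values
    fall strictly below it. Assumes 0 <= censor_fraction < 1."""
    ordered = sorted(values)
    k = int(round(censor_fraction * len(ordered)))
    return ordered[k]


def censored_means(values, threshold):
    """Compare the mean of all data with two simple after-censoring approaches.

    Values strictly below the threshold are treated as censored.
    """
    detected = [v for v in values if v >= threshold]
    substituted = [v if v >= threshold else threshold / 2 for v in values]
    return {
        "all_data": mean(values),             # no censoring applied
        "half_threshold": mean(substituted),  # censored values -> threshold/2
        "omit_censored": mean(detected),      # censored values dropped
    }


if __name__ == "__main__":
    # Illustrative settings only; these are not the distributions in the study.
    data = simulate_lognormal(n=200, mu=0.0, sigma=1.0, seed=1)
    for fraction in (0.3, 0.5, 0.8):
        limit = empirical_threshold(data, fraction)
        result = censored_means(data, limit)
        print(f"censoring {fraction:.0%}, threshold {limit:.3f}")
        for name, value in result.items():
            print(f"  {name:>15}: {value:.3f}")
```

Because omitting censored observations discards only the lowest values, its mean can never fall below the all-data mean. The direction of the threshold/2 bias depends on the distribution and on the censoring level.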

Keywords: PAH; detection limit; estimating the mean; maximum likelihood estimation; nondetect; regression on order statistics; reporting level; simulation study.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bias
  • Computer Simulation
  • Models, Statistical*
  • Research Design*