Interpretation and Implications of Lognormal Linear Regression Used for Bacterial Enumeration

J AOAC Int. 2020 Jul 1;103(4):1105-1111. doi: 10.1093/jaoacint/qsaa005.

Abstract

Background: Bacterial enumeration data are typically log-transformed to achieve a more nearly normal distribution and to stabilize the variance. Unfortunately, statistical results from log-transformed data are often misinterpreted as if they were results in the arithmetic domain.

Objective: To explore the implications of the slope and intercept from an unweighted linear regression and compare them with the results of the regression of log-transformed data.

Method: Mathematical derivation of the formulae, illustrated using a real dataset.

Results: For y = Ax + B + ε, where y is the recovery (CFU/g), x is the target concentration (CFU/g), and the error ε is homogeneous across x, the slope A estimates the percent recovery R when B = 0. In the regression of log-transformed data, log y = α log x + β + ε_z (equivalent to y = A·x^α·ω), it is the intercept β = log(y/x) = log A that estimates the percent recovery on the logarithmic scale when the slope α = 1, which means that R does not vary over x. The error term ω is multiplicative with respect to x, whereas ε_z = log(ω) is additive with respect to log(x). Whether to transform the data is not a matter of preference but a decision based on the distribution of the data. No significant difference was found among the five models (the linear regression of log-transformed data, three generalized linear models, and a nonlinear model) in their predicted percent recovery when applied to our data. An acceptable regression model should yield residuals that are as close to normally distributed as possible.
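
The relationship between the two regressions can be sketched numerically. The following is a minimal illustration, not the authors' analysis: the data are synthetic, the true recovery of 0.8 and the log10 error standard deviation of 0.1 are assumed values, base-10 logarithms are an assumption, and the function names fit_arithmetic and fit_loglog are hypothetical. It shows that with multiplicative error and a log-log slope near 1, the back-transformed intercept 10^β estimates the recovery ratio y/x, whereas in the arithmetic regression it is the slope A that plays this role.

```python
import numpy as np

def fit_arithmetic(x, y):
    """Unweighted least squares for y = A*x + B; returns (A, B)."""
    A, B = np.polyfit(x, y, 1)
    return A, B

def fit_loglog(x, y):
    """Least squares for log10(y) = alpha*log10(x) + beta; returns (alpha, beta)."""
    alpha, beta = np.polyfit(np.log10(x), np.log10(y), 1)
    return alpha, beta

# Synthetic (assumed) data: target concentrations in CFU/g, six replicates each.
rng = np.random.default_rng(0)
x = np.repeat([1e2, 1e3, 1e4, 1e5, 1e6], 6)
true_recovery = 0.8                               # assumed recovery ratio y/x
omega = 10 ** rng.normal(0.0, 0.1, size=x.size)   # multiplicative error term
y = true_recovery * x * omega                     # y = A * x**alpha * omega, alpha = 1

# Log domain: the intercept carries the recovery, valid only when alpha is near 1.
alpha, beta = fit_loglog(x, y)
recovery_log_domain = 10 ** beta

# With alpha fixed at 1, beta reduces to the mean of log10(y/x).
beta_fixed_slope = np.mean(np.log10(y / x))

# Arithmetic domain: the slope carries the recovery when B is near 0.
A, B = fit_arithmetic(x, y)

print(f"log-log slope alpha      = {alpha:.3f}")
print(f"10**beta (recovery)      = {recovery_log_domain:.3f}")
print(f"10**mean(log10(y/x))     = {10 ** beta_fixed_slope:.3f}")
print(f"arithmetic slope A       = {A:.3f}, intercept B = {B:.1f}")
```

Because the unweighted arithmetic fit is dominated by the highest concentrations when the error is multiplicative, its slope and intercept need not agree with the log-domain estimate, which is one reason the two sets of results should not be interpreted interchangeably.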

Conclusions: Statistical procedures that use log-transformed data should be studied separately and documented as such, not reported and interpreted together with results obtained in the arithmetic domain.

Highlights: The interpretation of statistical results developed in the arithmetic domain does not apply to results from log-transformed data.

MeSH terms

  • Linear Models*