Background: Bacterial enumeration data are typically log transformed to bring their distribution closer to normal and to stabilize the variance. Unfortunately, statistical results from log transformed data are often misinterpreted as though they had been obtained in the arithmetic domain.
Objective: To explore the implications of the slope and intercept from an unweighted linear regression and to compare them with those from the regression of log transformed data.
Method: Mathematical derivation, illustrated with a real dataset.
Results: Consider $y = Ax + B + \varepsilon$, where $y$ is the recovery (CFU/g), $x$ is the target concentration (CFU/g), and the error $\varepsilon$ is homogeneous across $x$. When $B = 0$, the slope $A$ estimates the percent recovery $R$. In the regression of log transformed data, $\log y = \alpha \log x + \beta + \varepsilon_z$ (equivalent to $y = A x^{\alpha} \cdot \omega$), it is instead the intercept, $\beta = \log(y/x) = \log A$, that estimates the logarithm of the percent recovery when the slope $\alpha = 1$, that is, when $R$ does not vary over $x$. The error term $\omega$ is multiplicative with respect to $x$, whereas $\varepsilon_z = \log \omega$ is additive with respect to $\log x$. Whether to transform the data is not a matter of preference but a decision dictated by the distribution of the data. No significant difference in predicted percent recovery was found among the five models (the linear regression of log transformed data, three generalized linear models, and a nonlinear model) when applied to our data. An acceptable regression model should yield residuals whose distribution is as close to normal as possible.
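The relationship between the two parameterizations can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the study: the data are synthetic (a hypothetical true recovery of 0.8 and a multiplicative error with a standard deviation of 0.1 log10 units), and the function names are chosen only for illustration. It shows how the intercept of the log-domain fit, rather than its slope, carries the recovery estimate when the slope is close to 1.

```python
import numpy as np


def fit_log_domain(x, y):
    """Least-squares fit of log10(y) = alpha * log10(x) + beta."""
    alpha, beta = np.polyfit(np.log10(x), np.log10(y), deg=1)
    return float(alpha), float(beta)


def log_recovery_fixed_slope(x, y):
    """Estimate beta = log10(y / x) under the assumption alpha = 1."""
    return float(np.mean(np.log10(y / x)))


def fit_arithmetic_through_origin(x, y):
    """Least-squares slope A for y = A * x, i.e. with B fixed at 0."""
    return float(np.sum(x * y) / np.sum(x * x))


# Synthetic data: an assumption for illustration, NOT the study's dataset.
rng = np.random.default_rng(0)
true_recovery = 0.8                                   # hypothetical R
x = np.repeat([1e2, 1e3, 1e4, 1e5, 1e6], 20)          # target CFU/g
omega = 10 ** rng.normal(0.0, 0.1, size=x.size)       # multiplicative error
y = true_recovery * x * omega                         # recovered CFU/g

alpha, beta = fit_log_domain(x, y)
recovery_log = 10 ** log_recovery_fixed_slope(x, y)   # back-transformed intercept
recovery_arith = fit_arithmetic_through_origin(x, y)  # slope A with B = 0

print(f"log-domain slope alpha      : {alpha:.3f}")
print(f"log-domain intercept beta   : {beta:.3f}")
print(f"recovery from intercept     : {recovery_log:.3f}")
print(f"recovery from arithmetic fit: {recovery_arith:.3f}")
```

With a multiplicative error of this kind, the back-transformed intercept is a geometric-mean type of estimate, whereas the unweighted arithmetic slope is dominated by the highest concentration levels, so the two need not agree exactly.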
Conclusions: Statistical procedures that use log transformed data should be studied separately and documented as such, rather than reported and interpreted together with results obtained in the arithmetic domain.
Highlights: The interpretation of statistical results developed in the arithmetic domain does not carry over to results from log transformed data.
© AOAC INTERNATIONAL 2020. All rights reserved.