Urinary analyte data have to be corrected for sample-specific dilution because dilution varies dramatically both within and between individuals, leading to non-comparable concentration measures. Most dilution correction methods in current use, such as probabilistic quotient normalization or total spectra normalization, divide the raw data by a dilution correction factor. Here, however, we show that the implicit assumption behind division, namely log-linearity between the urinary flow rate and the raw urinary concentration, does not hold for analytes which are not in steady state in blood. We explicate the physiological reason for this shortcoming in mathematical terms and demonstrate its empirical consequences via simulations and on multiple time-point metabolomic data, showing that division-based normalization procedures cannot account for the complex, non-linear, analyte-specific dependencies on the urinary flow rate. By reformulating normalization as a regression problem, we propose an analyte-specific way to remove the dilution variance via a flexible non-linear regression methodology, which proved more effective than division-based normalization procedures. In the process, we developed several easily applicable normalization diagnostics for deciding on the method of dilution correction in a given sample. We furthermore identified the time span since last urination as an important source of variance in urinary metabolome data, one that has so far been completely neglected. In conclusion, we present strong theoretical and empirical evidence that normalization has to be analyte-specific in dynamically influenced data. Accordingly, we developed a normalization methodology that removes the dilution variance in urinary data while respecting the kinetics of each single analyte.
Keywords: Dilution correction; Metabolomics; Model diagnostics; Non-linear regression techniques; Normalization; Urine analysis.
Copyright © 2018 Elsevier B.V. All rights reserved.
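The contrast drawn in the abstract between division-based and regression-based normalization can be illustrated with a small simulation. The sketch below is not the authors' implementation: the simulated flow rates, the coefficients, the assumption that the dilution factor is proportional to the inverse flow rate, the cubic polynomial used as a stand-in for a flexible non-linear regression, and the function names are all assumptions made for illustration only.

```python
import numpy as np


def division_normalize(log_conc, log_dilution):
    """Division by a dilution factor is a subtraction on the log scale."""
    return log_conc - log_dilution


def regression_normalize(log_conc, log_flow, degree=3):
    """Remove the flow-rate trend per analyte with a polynomial fit.

    The polynomial is only a simple stand-in for a flexible
    non-linear regression method (assumption).
    """
    coeffs = np.polyfit(log_flow, log_conc, degree)
    return log_conc - np.polyval(coeffs, log_flow)


def remaining_dependence(values, log_flow, degree=3):
    """R^2 of a polynomial fit of normalized values on log flow rate.

    Values near 0 mean little flow-rate (dilution) variance is left.
    """
    coeffs = np.polyfit(log_flow, values, degree)
    fitted = np.polyval(coeffs, log_flow)
    ss_res = np.sum((values - fitted) ** 2)
    ss_tot = np.sum((values - values.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n = 500
log_flow = rng.normal(0.0, 0.7, n)   # hypothetical log urinary flow rates
log_dilution = -log_flow             # assumed: dilution factor ~ 1 / flow rate

# Hypothetical analyte that is log-linear in flow rate with slope -1.
log_steady = 2.0 - 1.0 * log_flow + rng.normal(0.0, 0.1, n)
# Hypothetical analyte with a non-linear dependence on log flow rate.
log_dynamic = 2.0 - 0.4 * log_flow - 0.3 * log_flow**2 + rng.normal(0.0, 0.1, n)

r2_steady_div = remaining_dependence(
    division_normalize(log_steady, log_dilution), log_flow)
r2_dynamic_div = remaining_dependence(
    division_normalize(log_dynamic, log_dilution), log_flow)
r2_dynamic_reg = remaining_dependence(
    regression_normalize(log_dynamic, log_flow), log_flow)

print(f"steady analyte, division:    R^2 = {r2_steady_div:.3f}")
print(f"dynamic analyte, division:   R^2 = {r2_dynamic_div:.3f}")
print(f"dynamic analyte, regression: R^2 = {r2_dynamic_reg:.3f}")
```

Under these assumptions, division leaves almost no flow-rate dependence in the log-linear analyte but a strong one in the non-linear analyte, while the per-analyte regression removes it. The `remaining_dependence` helper is a rough analogue of a normalization diagnostic, not one of the diagnostics developed in the paper.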