Curating the data underlying quantitative structure-activity relationship (QSAR) models is a never-ending struggle. Some curation can now be automated, but much cannot, especially where data as complex as those pertaining to molecular absorption, distribution, metabolism, excretion, and toxicity are concerned (vide infra). The authors discuss some particularly challenging problem areas through specific examples involving experimental context, incompleteness of data, confusion of units, problematic nomenclature, tautomerism, and misapplication of automated structure recognition tools.
Keywords: Automated structure recognition; Cytochrome P450; Data curation; Metabolism; Nomenclature; QSAR; Tautomerism.