Read-across (RAX) is a popular data gap-filling technique that uses category and analogue approaches to predict toxicological endpoints for a target chemical. Despite its increasing relevance, RAX still relies on human expert judgement and lacks a reproducible, automated protocol. It also tends to rely on structural similarity alone to identify analogues, while other toxicologically relevant aspects are often neglected. In this paper, we propose an automated procedure for selecting analogues for data gap-filling. Analogues are identified by a decision algorithm that integrates three similarity metrics, each addressing a different toxicologically relevant aspect (i.e., structural, biological and metabolic similarity). Structural filters based on the presence of maximum common substructures (MCS) and common functional groups are applied to narrow the chemical space for the analogue search. The procedure has been implemented as a KNIME workflow and is freely available. The workflow provides informative tabular and graphical outputs to support toxicologists and risk assessors in drawing conclusions based on the RAX approach. The procedure was validated for its predictive power on two datasets related to high-tier in vivo toxicological endpoints, i.e., human hepatotoxicity and drug-induced liver injury (DILI). Validation yielded accuracies of up to 0.79 for the binary hepatotoxicity classification and up to 0.67 for the three-class DILI classification, both higher than those obtained by RAX based on structural similarity alone. The results confirmed the suitability of the procedure as a source of data to support regulatory decision-making.
Keywords: biological similarity; data-gap filling; read-across; workflow.
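
To illustrate the general idea of combining the three similarity metrics for analogue selection, the following is a minimal Python sketch, not the KNIME workflow itself. The abstract does not specify the decision rule, so the rule used here (a common threshold on all three similarities, ranking by their mean, and a majority vote over the top-ranked analogues) is an assumption made purely for illustration. The names `Candidate`, `select_analogues`, `read_across`, `min_sim` and `k`, and all similarity values, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    """A source chemical with precomputed similarities to the target (all in [0, 1])."""
    name: str
    structural: float  # hypothetical structural similarity score
    biological: float  # hypothetical biological similarity score
    metabolic: float   # hypothetical metabolic similarity score
    label: str         # known experimental outcome for the endpoint


def select_analogues(candidates, min_sim=0.5, k=3):
    """Assumed rule: keep candidates whose three similarities all reach
    min_sim, rank them by mean similarity, and return the top k."""
    passing = [
        c for c in candidates
        if min(c.structural, c.biological, c.metabolic) >= min_sim
    ]
    passing.sort(
        key=lambda c: (c.structural + c.biological + c.metabolic) / 3,
        reverse=True,
    )
    return passing[:k]


def read_across(analogues):
    """Assumed rule: majority vote over analogue labels.
    Returns None when there are no analogues or the vote is tied."""
    if not analogues:
        return None
    counts = Counter(a.label for a in analogues).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Illustrative, made-up similarity values for four candidate analogues.
candidates = [
    Candidate("A", 0.90, 0.80, 0.70, "toxic"),
    Candidate("B", 0.95, 0.20, 0.60, "non-toxic"),  # fails the biological threshold
    Candidate("C", 0.60, 0.70, 0.65, "toxic"),
    Candidate("D", 0.70, 0.60, 0.80, "non-toxic"),
]

selected = select_analogues(candidates)
prediction = read_across(selected)
print([c.name for c in selected], prediction)
```

In this toy example, candidate B is the most structurally similar chemical but is excluded because of its low biological similarity, which is the kind of case where a multi-metric selection can differ from one based on structural similarity alone.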