In this article, we describe two complementary data-mining approaches used to characterize the GlaxoSmithKline (GSK) natural-products set (NPS) based on information from the high-throughput screening (HTS) databases. Both methods rely on the aggregation and analysis of a large set of single-shot screening data for a number of biological assays, with the goal of revealing natural-product chemical motifs. One is an established method based on the data-driven clustering of compounds using a wide range of descriptors,(1) whereas the other partitions and hierarchically clusters the data to identify chemical cores.(2,3) Both methods successfully find structural scaffolds that hit different groups of discrete drug targets significantly more often than their relative frequency of inhibitory activity across a large number of screens would suggest. We describe how these methods can be applied to unveil hidden information in large single-shot HTS data sets. Applied prospectively, this type of information could contribute to the design of new chemical templates for drug-target classes and guide synthetic efforts for lead optimization of tractable hits that are based on natural-product chemical motifs. Relevant findings for 7TM receptors (7TMRs), ion channels, class-7 transferases (protein kinases), hydrolases, and oxidoreductases are discussed.
Keywords: computational chemistry; natural products screening; statistical analyses; structure–activity relationships.
© 2014 Society for Laboratory Automation and Screening.