Drug-like annotation and duplicate analysis of a 23-supplier chemical database totalling 2.7 million compounds

J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):643-51. doi: 10.1021/ci034260m.

Abstract

We have implemented five drug-like filters, based on 1D and 2D molecular descriptors, and applied them to characterize the drug-like properties of commercially available chemical compounds. In addition to previously published filters (Lipinski and Veber), we implemented a filter for medicinal chemistry tractability based on lists of chemical features drawn up by a panel of medicinal chemists. A filter based on the modeling of aqueous solubility (>1 µM) was derived in-house, as well as another based on the modeling of Caco-2 passive membrane permeability (>10 nm/s). A library of 2.7 million compounds was collated from 23 compound suppliers and analyzed with these filters, highlighting a tendency toward highly lipophilic compounds. The library contains 1.6 million unique structures, of which 37% (607,223) passed all five drug-like filters. None of the 23 suppliers provides all the members of the drug-like subset, emphasizing the benefit of considering compounds from various compound suppliers as a source of diversity for drug discovery.
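As an illustration of the workflow the abstract describes, the following minimal Python sketch applies the two published filters to precomputed descriptors, collapses duplicate supplier entries into unique structures, and reports how much of the drug-like subset each supplier covers. It is not the authors' implementation. The thresholds are the commonly cited Lipinski and Veber cutoffs, which may differ from those used in the paper, and the three in-house filters (tractability, solubility, Caco-2 permeability) are omitted because the abstract does not specify them. The `Descriptors` container, the function names, the structure keys, the supplier names, and all descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed 1D/2D descriptors for one structure (hypothetical container)."""
    mol_weight: float          # Da
    logp: float                # calculated octanol/water partition coefficient
    h_donors: int              # hydrogen-bond donors
    h_acceptors: int           # hydrogen-bond acceptors
    rotatable_bonds: int
    polar_surface_area: float  # square angstroms


def lipinski_violations(d):
    """Count violations of the commonly cited rule-of-five cutoffs."""
    return sum([d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10])


def passes_lipinski(d, max_violations=0):
    # Assumption: strict by default; some workflows tolerate one violation.
    return lipinski_violations(d) <= max_violations


def passes_veber(d):
    # Commonly cited Veber cutoffs: at most 10 rotatable bonds, PSA at most 140.
    return d.rotatable_bonds <= 10 and d.polar_surface_area <= 140


def drug_like_subset(descriptors_by_key):
    """Return the keys of structures that pass both published filters."""
    return {key for key, d in descriptors_by_key.items()
            if passes_lipinski(d) and passes_veber(d)}


def supplier_coverage(records, subset):
    """Fraction of the drug-like subset that each supplier offers."""
    by_supplier = {supplier: set() for supplier, _ in records}
    for supplier, key in records:
        if key in subset:
            by_supplier[supplier].add(key)
    total = len(subset)
    return {s: (len(keys) / total if total else 0.0)
            for s, keys in by_supplier.items()}


# Toy data. Keys stand in for a canonical structure identifier
# (for example a canonical SMILES string); all values are made up.
descriptors = {
    "A": Descriptors(180.2, 1.2, 1, 4, 3, 63.6),
    "B": Descriptors(612.8, 6.4, 2, 7, 12, 95.0),  # too heavy, lipophilic, flexible
    "C": Descriptors(342.4, 3.1, 2, 5, 6, 88.0),
}
# One (supplier, structure key) pair per catalogue entry; "A" is listed twice.
records = [
    ("supplier_1", "A"),
    ("supplier_1", "B"),
    ("supplier_2", "A"),
    ("supplier_3", "C"),
]

unique_structures = {key for _, key in records}
subset = drug_like_subset(descriptors)
coverage = supplier_coverage(records, subset)

print(len(records), "entries,", len(unique_structures), "unique structures")
print("drug-like subset:", sorted(subset))
print("coverage per supplier:", coverage)
```

In this toy data set no single supplier covers the whole drug-like subset, which mirrors the abstract's argument for pooling catalogues. In practice the structure key would come from a cheminformatics toolkit rather than a hand-assigned label.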