The most important criteria for developing and analyzing databases intended to elucidate the structural basis of toxicological activity are the integrity of the database, meaning a uniform experimental protocol and a consistent interpretation of the test results, and the inclusion of chemicals that represent different chemical classes and different mechanisms of action. Within these criteria, it is demonstrated that when chemicals are chosen at random, the larger the database, the better the predictivity for chemicals not included in the learning set. It is shown, however, that when chemicals are selected on the basis of structural features, a learning set of approximately 180 chemicals is as informative as a database of 800 chemicals chosen at random.
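The abstract does not state how structural features were used to choose the 180-chemical learning set. The following is a purely illustrative sketch of the general contrast between random and structure-based selection. It assumes that each chemical is described by a set of structural feature IDs (a hypothetical representation) and uses a greedy MaxMin diversity pick under Tanimoto similarity. This is a generic diversity-selection technique, not the procedure used in the study, and the function names and toy data are hypothetical.

```python
import random

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of structural feature IDs."""
    if not a and not b:
        return 1.0  # convention: two featureless chemicals are identical
    return len(a & b) / len(a | b)

def random_selection(library: dict, k: int, seed: int = 0) -> list:
    """Pick k chemical names uniformly at random (the 'chosen at random' case)."""
    rng = random.Random(seed)
    return rng.sample(sorted(library), min(k, len(library)))

def maxmin_selection(library: dict, k: int, seed: int = 0) -> list:
    """Greedy MaxMin: repeatedly add the chemical whose most similar
    already-selected chemical is least similar, so the set spans diverse structures."""
    if k <= 0 or not library:
        return []
    names = sorted(library)
    rng = random.Random(seed)
    selected = [rng.choice(names)]
    # For each chemical, its highest similarity to anything selected so far.
    nearest = {n: tanimoto(library[n], library[selected[0]]) for n in names}
    while len(selected) < min(k, len(names)):
        candidates = [n for n in names if n not in selected]
        pick = min(candidates, key=lambda n: nearest[n])
        selected.append(pick)
        for n in candidates:
            nearest[n] = max(nearest[n], tanimoto(library[n], library[pick]))
    return selected

# Hypothetical toy library: three structural families (a, b, c).
TOY_LIBRARY = {
    "a1": frozenset({1, 2, 3}),
    "a2": frozenset({1, 2, 4}),
    "a3": frozenset({1, 2, 3, 4}),
    "b1": frozenset({10, 11, 12}),
    "b2": frozenset({10, 11, 13}),
    "c1": frozenset({20, 21}),
}

if __name__ == "__main__":
    print("random:", random_selection(TOY_LIBRARY, 3))
    print("maxmin:", maxmin_selection(TOY_LIBRARY, 3))
```

On the toy library, `maxmin_selection(TOY_LIBRARY, 3)` takes one chemical from each of the three structural families, whereas `random_selection` may take two or more from the same family. Selecting for structural coverage in this way is one plausible reason why a small, deliberately chosen learning set could match a much larger random one.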