In order to develop methods for evaluating the predictive performance of computer-driven structure-activity methods (SAR) as well as to determine the limits of predictivity, we investigated the behavior of two Salmonella mutagenicity data bases: (a) a subset from the Genetox Program and (b) one from the U.S. National Toxicology Program (NTP). For molecules common to the two data bases, the experimental concordance was 76% when "marginals" were included and 81% when they were excluded. Three SAR methods were evaluated: CASE, MULTICASE and CASE/Graph Indices (CASE/GI). The programs "learned" the Genetox data base and used it to predict NTP molecules that were not present in the Genetox compilation. The concordances were 72, 80 and 47% respectively. Obviously, the MULTICASE version is superior and approaches the 85% interlaboratory variability observed for the Salmonella mutagenicity assays when the latter was carried out under carefully controlled conditions.