Exploring the limits of graph invariant- and spectrum-based discrimination of (sub)structures

J Chem Inf Comput Sci. 2002 May-Jun;42(3):640-50. doi: 10.1021/ci010121y.

Abstract

The limits of a recently proposed computer method for finding all distinct substructures of a chemical structure are systematically explored within comprehensive graph samples that serve as supersets of the graphs corresponding to saturated hydrocarbons, both acyclic (up to n = 20) and (poly)cyclic (up to n = 10). Several of the smallest pairs of graphs and compounds are identified that cannot be distinguished by selected combinations of invariants, such as Balaban's index J combined with graph matrix eigenvalues. As the most important result, it can now be stated that the computer program NIMSG, using J and distance eigenvalues, is safe within the domain of mono- through tetracyclic saturated hydrocarbon substructures up to n = 10 (oligocyclic decanes) and of all acyclic alkane substructures up to n = 19 (nonadecanes), i.e., it will not miss any of these substructures. For the regions surrounding this safe domain, upper limits are found for the number of substructures that may be lost in the worst case, and these limits are low. Taken together, these results mean that the program can reasonably be employed in chemistry whenever saturated hydrocarbon substructures are sought. As to unsaturated and heteroatom-containing substructures, there are reasons to conjecture that the method's resolving power for them is similar.
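The kind of invariant-based discrimination discussed above can be illustrated with a minimal sketch. It is not the NIMSG implementation, which the abstract does not describe. It assumes hydrogen-suppressed graphs given as edge lists, uses the standard definition of Balaban's index, J = m/(μ+1) · Σ over edges (s_u · s_v)^(-1/2), where the s values are the row sums of the distance matrix and μ = m - n + 1 is the cyclomatic number, and obtains the distance eigenvalues with a plain cyclic Jacobi iteration standing in for a library eigensolver. All function names and the rounding tolerance are hypothetical choices made for illustration.

```python
import math
from collections import deque


def distance_matrix(n, edges):
    """All-pairs topological distances of a connected graph, via BFS from each vertex."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for start in range(n):
        row = [-1] * n
        row[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if row[v] < 0:
                    row[v] = row[u] + 1
                    queue.append(v)
        dist.append(row)
    return dist


def balaban_j(n, edges):
    """Balaban's index J from distance sums; mu is the cyclomatic number."""
    s = [sum(row) for row in distance_matrix(n, edges)]
    m = len(edges)
    mu = m - n + 1
    return m / (mu + 1) * sum(1.0 / math.sqrt(s[u] * s[v]) for u, v in edges)


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations (sorted)."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def fingerprint(n, edges, digits=6):
    """Hypothetical combined invariant: (J, distance spectrum), rounded for comparison."""
    j = round(balaban_j(n, edges), digits)
    spectrum = symmetric_eigenvalues(distance_matrix(n, edges))
    return j, tuple(round(x, digits) for x in spectrum)


# Carbon skeletons of n-butane (a path) and isobutane (a star), vertices 0..3.
butane = (4, [(0, 1), (1, 2), (2, 3)])
isobutane = (4, [(0, 1), (0, 2), (0, 3)])
print(fingerprint(*butane) == fingerprint(*isobutane))  # the two skeletons differ
```

For these two skeletons J evaluates to about 1.9747 and 2.3238 respectively, so the fingerprints differ. Equal fingerprints do not prove that two graphs are isomorphic, which is exactly the limitation the study quantifies: a pair of distinct substructures with identical invariants would be counted as one.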