The enormous cytochrome oxidase subunit I (COI) sequence database being assembled from the various DNA barcoding projects, as well as from independent phylogenetic studies, constitutes an almost unprecedented amount of data for molecular systematics, in addition to its role in species identification and discovery. As part of a study of the potential of this gene fragment to improve the accuracy of phylogenetic reconstructions, and in particular of the effects of dense taxon sampling, we have assembled a data set for the hyperdiverse, cosmopolitan parasitic wasp superfamily Ichneumonoidea, including the release of 1793 unpublished sequences. Of approximately 84 currently recognized Ichneumonoidea subfamilies, 2500 genera and 41,000 described species, barcoding 5'-COI data were assembled for 4168 putative species-level terminals (many undescribed), representing 671 genera and all but ten of the currently recognized subfamilies. After the removal of identical and near-identical sequences, the 4174 initial sequences were reduced to 3278. We show that when these sequences are subjected to phylogenetic analysis using both maximum likelihood and parsimony, there is a broad correlation between taxonomic congruence and the number of included sequences. We additionally present a new measure of taxonomic congruence based upon the Simpson diversity index, the Simpson dominance index, which gives greater weight to morphologically recognized taxonomic groups (subfamilies) that are recovered with most of their representatives in one or a few contiguous groups or subclusters.
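The abstract does not state the formula for the Simpson dominance index, so the following is only a minimal sketch of the general idea, assuming the standard Simpson form (the sum of squared proportions). For one subfamily, each sequence is assigned to the tree subcluster in which it was recovered; the function name `simpson_dominance` and the cluster labels are hypothetical, and the published measure may differ, for example in how it treats contiguity or sample-size correction.

```python
from collections import Counter


def simpson_dominance(cluster_labels):
    """Sum of squared proportions of one subfamily's sequences per subcluster.

    Assumption: this is the textbook Simpson index, not necessarily the
    exact formulation used in the paper. Values near 1 mean most sequences
    sit in a single subcluster; values near 1/k mean they are spread
    evenly over k subclusters.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sequence is required")
    return sum((count / total) ** 2 for count in counts.values())


# Hypothetical subfamily with four sequences, three of them in one subcluster.
mostly_together = simpson_dominance(["c1", "c1", "c1", "c2"])

# The same four sequences scattered across four different subclusters.
fully_scattered = simpson_dominance(["c1", "c2", "c3", "c4"])
```

Under this assumed form, a subfamily recovered largely in one subcluster scores higher than one whose members are scattered across the tree, which matches the weighting described in the abstract.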
© 2012 Blackwell Publishing Ltd.