Background: Array-based Comparative Genomic Hybridization (CGH) data have been used to infer phylogenetic relationships. However, the reliability of array CGH analysis for determining evolutionary relationships has not been well established. In most CGH work, all species and strains are compared to a single reference species, whose genome was used to design the array. In the accompanying work, we critically evaluated CGH-based phylogeny using simulated competitive hybridization data. That work showed that a limited number of conditions, principally the tree topology and the placement of the reference taxon in the tree, had a strong effect on the ability to recover the correct tree topology. Here, we extend our simulation study by testing CGH as a phylogenetic tool with experimental CGH data from competitive hybridizations between N. crassa and other Neurospora species. In the discussion, we extend our empirical study of Neurospora by reanalyzing data from a previous CGH phylogenetic analysis of the yeast sensu stricto complex.
Results: Array ratio data for Neurospora and related species were normalized with loess, robust spline, and linear ratio-based methods, and then used to construct Neighbor-Joining and parsimony trees. These trees were compared to published phylogenetic analyses for Neurospora based on multilocus sequence analysis (MLSA). For the Neurospora dataset, the best combination of methods recovered the MLSA tree topology less than half the time. Our reanalysis of a yeast dataset found that trees identical to the established phylogeny were recovered only by pruning taxa, including the reference taxon, from the analysis.
Conclusion: Our results indicate that CGH data can be problematic for phylogenetic analysis. Success varies with the methods used to construct the tree and the taxa included. Selective pruning of taxa improves the results, but this is an impractical approach for normal phylogenetic analysis. Drawing on the more successful methods, we make suggestions on the normalization and post-normalization methods that work best in estimating genetic distance between taxa.
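
As a rough illustration of how normalized array ratios might be turned into genetic distances between taxa, the following minimal Python sketch median-centers each sample's log2 ratios, calls probes as divergent below a cutoff, and computes pairwise distances as the fraction of probes whose calls differ. This is not the pipeline used in the study: median centering is a simple stand-in for the loess, robust spline, and linear ratio-based normalizations, and the function names, the cutoff of -1.0, and the toy data are all assumptions made for illustration only.

```python
from statistics import median

def median_center(log_ratios):
    """Shift one sample's log2 ratios so that their median is zero.
    Assumption: a simple stand-in for the normalizations named above."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]

def call_divergent(log_ratios, cutoff=-1.0):
    """Return 1 for probes at or below the (hypothetical) cutoff, else 0."""
    return [1 if x <= cutoff else 0 for x in log_ratios]

def distance_matrix(calls_by_taxon):
    """Pairwise fraction of probes whose divergent/conserved calls differ."""
    names = sorted(calls_by_taxon)
    dist = {}
    for a in names:
        for b in names:
            pa, pb = calls_by_taxon[a], calls_by_taxon[b]
            dist[(a, b)] = sum(x != y for x, y in zip(pa, pb)) / len(pa)
    return names, dist

# Hypothetical toy data: log2(sample / reference) for six probes.
raw = {
    "taxon_A": [0.1, -0.2, -2.5, 0.0, 0.3, -0.1],
    "taxon_B": [0.6, 0.4, -2.0, -1.9, 0.5, 0.7],
}
calls = {name: call_divergent(median_center(v)) for name, v in raw.items()}
# The reference hybridizes to its own array, so no probe is called divergent.
calls["reference"] = [0] * 6

names, dist = distance_matrix(calls)
for a in names:
    print(a, [round(dist[(a, b)], 3) for b in names])
```

A distance matrix of this kind could then be passed to a Neighbor-Joining implementation. Note that every distance here is measured through probes designed from the reference genome, which is one way to see why the placement of the reference taxon in the tree can affect the recovered topology.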