Plasmids are mosaic in composition, combining a maintenance "backbone" with "accessory" genes acquired through horizontal gene transfer, which complicates the study of their genetic relationships. We describe a method for relating a large number of Gram-negative (GN) bacterial plasmids based on their genetic sequences. Complete coding gene sequences of 527 GN bacterial plasmids were obtained from NCBI. Initial classification of their genetic relationships was accomplished using a computational approach analogous to hybridization of "mixed-genome microarrays"; because of this similarity, we refer to the approach as "virtual hybridization." Protein sequences derived from the gene sequences were randomly chosen to serve as "probes" for the virtual arrays, and virtual hybridization of each GN plasmid was performed using BLASTp. Each resulting intensity matrix was used to generate a distance matrix, from which an initial tree was constructed. Relationships within several clusters were then refined by identifying proteins conserved within each cluster, applying multiple-sequence alignment to the concatenated conserved proteins, and inferring relationships from the alignment by maximum likelihood. Although it is not possible to prove that the genetic relationships obtained among the 527 GN bacterial plasmids are correct, the approach reproduced results identical to those of a separate study of a small group of IncA/C plasmids, which provides evidence that it can correctly predict genetic relationships. In addition, results obtained for clusters of Borrelia plasmids are consistent with the expected exclusivity of plasmids from this genus. Finally, the 527-plasmid tree was used to study the distribution of four common antibiotic resistance genes.
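The step from intensity matrix to distance matrix to initial tree can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which distance measure or tree-building algorithm was used, so Euclidean distance and average-linkage (UPGMA) clustering are assumed here purely for illustration. The plasmid names and intensity values are hypothetical toy data standing in for BLASTp-derived probe scores.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy "virtual array": one row per plasmid, one column per probe.
# In the described method these intensities would come from BLASTp hits.
intensity = {
    "pA": [1.0, 0.9, 0.0, 0.0],
    "pB": [0.9, 1.0, 0.1, 0.0],
    "pC": [0.0, 0.0, 1.0, 0.8],
    "pD": [0.0, 0.1, 0.9, 1.0],
}


def distance_matrix(profiles):
    """Pairwise Euclidean distances between intensity profiles (assumed metric)."""
    dist = {}
    for a, b in combinations(sorted(profiles), 2):
        d = sqrt(sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b])))
        dist[frozenset((a, b))] = d
    return dist


def upgma(profiles):
    """Build a tree topology by average linkage (assumed algorithm).

    Returns a Newick-style string without branch lengths.
    """
    dist = distance_matrix(profiles)
    sizes = {name: 1 for name in profiles}  # cluster label -> number of leaves
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = sorted(min(dist, key=dist.get))
        merged = f"({a},{b})"
        merged_size = sizes[a] + sizes[b]
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other in (a, b):
                continue
            d_a = dist[frozenset((a, other))]
            d_b = dist[frozenset((b, other))]
            dist[frozenset((merged, other))] = (
                d_a * sizes[a] + d_b * sizes[b]
            ) / merged_size
        # Drop all entries that still refer to the merged-away clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = merged_size
    return next(iter(sizes)) + ";"


tree = upgma(intensity)
print(tree)
```

With this toy data, plasmids whose probe profiles overlap (pA with pB, pC with pD) are grouped before the two groups are joined. A tree built this way is only a first pass; the abstract describes refining selected clusters afterwards with alignment of conserved proteins and maximum likelihood.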
Copyright © 2012 Elsevier Inc. All rights reserved.