Identification of a Non-Pentapeptide Region Associated with Rapid Mycobacterial Evolution

PLoS One. 2016 May 5;11(5):e0154059. doi: 10.1371/journal.pone.0154059. eCollection 2016.

Abstract

A large portion of the coding capacity of Mycobacterium tuberculosis is devoted to the production of proteins containing several copies of the pentapeptide-2 repeat, namely the PE/PPE_MPTR proteins. Protein domain repeats have a variety of binding properties and are involved in protein-protein interactions as well as in binding to other ligands such as DNA and RNA. They are less common in prokaryotes than in eukaryotes, but the enrichment of pentapeptide-2 repeats in Mycobacteria constitutes an exception to that rule. The genes encoding the PE/PPE_MPTR proteins have undergone many rearrangements, and here we have identified their expansion patterns across the Mycobacteria. We have reclassified the PE/PPE_MPTR proteins using cohesive regions rather than sparse domain architectures. It is clear that these proteins have undergone large insertions of several pentapeptide-2 domains appearing adjacent to one another in a repetitive pattern. Further, we have identified a non-pentapeptide motif associated with rapid mycobacterial evolution. The sequence composition of this region suggests a structure different from that of pentapeptide-2 repeats. By studying the evolution of the PE/PPE_MPTR proteins, we have distinguished features pertaining to tuberculosis-inducing species. Further studies of the non-pentapeptide region associated with repeat expansions promise to shed light on the pathogenicity of Mycobacterium tuberculosis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA, Intergenic / genetics
  • Evolution, Molecular*
  • Genes, Bacterial / genetics
  • Microsatellite Repeats / genetics
  • Mycobacterium / genetics
  • Mycobacterium tuberculosis / genetics*
  • Phylogeny
  • Protein Domains / genetics
  • Repetitive Sequences, Nucleic Acid

Substances

  • DNA, Intergenic

Grants and funding

SL was funded by Bioinformatics Infrastructure for Life Science (BILS) and Science for Life Laboratory (SciLifeLab). PW was financed by SciLifeLab.