Mining genomic patterns in Mycobacterium tuberculosis H37Rv using a web server Tuber-Gene

Genomics Proteomics Bioinformatics. 2011 Oct;9(4-5):171-8. doi: 10.1016/S1672-0229(11)60020-X.

Abstract

Mycobacterium tuberculosis (MTB), the causative agent of tuberculosis, is responsible for one of the most dreaded diseases of the century. It has long been studied by researchers throughout the world using various wet-lab and dry-lab techniques. In this study, we focus on mining useful patterns at the genomic level that can be applied to in silico functional characterization of genes from the MTB complex. The model developed on the basis of the patterns found in this study correctly identifies 99.77% of the input genes from the genome of MTB strain H37Rv. To further evaluate its generalization capability, the model was tested against four other MTB strains and the closely related M. bovis; the mean prediction accuracy was 85.76%. It was also observed that the GC content remained fairly constant throughout the genome, suggesting the absence of any pathogenicity island transferred from other organisms. This study reveals that dinucleotide composition is an efficient functional class discriminator for the MTB complex. To facilitate the application of this model, a web server, Tuber-Gene, has been developed, which can be freely accessed at http://www.bifmanit.org/tb2/.
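
The two sequence features named in the abstract, dinucleotide composition and GC content, can be illustrated with a minimal Python sketch. This is not the Tuber-Gene implementation: the abstract does not describe the exact feature encoding or the classifier, so the function names, the use of overlapping dinucleotides, and the sliding-window parameters below are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_frequencies(seq):
    """Relative frequency of each of the 16 dinucleotides.

    Assumption: dinucleotides are counted with an overlapping window of
    step 1; pairs containing ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(pair) for pair in product("ACGT", repeat=2)]
    total = sum(counts[k] for k in keys)
    return {k: (counts[k] / total if total else 0.0) for k in keys}


def windowed_gc(seq, window, step):
    """GC content in successive windows, e.g. to check how constant it is
    along a genome. Window and step sizes are hypothetical parameters."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

Under these assumptions, the 16-value dictionary returned by dinucleotide_frequencies could serve as a per-gene feature vector for a classifier, and a roughly flat profile from windowed_gc along a genome would correspond to the observation about constant GC content.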

MeSH terms

  • Algorithms
  • Base Composition
  • Genes, Bacterial / genetics
  • Genome, Bacterial / genetics*
  • Genomics / methods*
  • Internet*
  • Models, Genetic*
  • Mycobacterium tuberculosis / genetics*
  • Reproducibility of Results
  • Software