EuPathDomains: the divergent domain database for eukaryotic pathogens

Infect Genet Evol. 2011 Jun;11(4):698-707. doi: 10.1016/j.meegid.2010.09.008. Epub 2010 Nov 4.

Abstract

Eukaryotic pathogens (e.g. Plasmodium, Leishmania, trypanosomes) are a major source of morbidity and mortality worldwide. In Africa, one of the most affected continents, they cause millions of deaths and constitute an immense economic burden. While the genome sequences of several of these organisms are now available, the biological functions of more than half of their proteins are still unknown. This seriously hampers the identification of the new therapeutic targets that these genomes were expected to reveal. In this context, the identification of protein domains is a key step in improving the functional annotation of the proteins. However, several domains are missed in eukaryotic pathogens because of the high phylogenetic distance of these organisms from the classical eukaryotic models. We recently proposed a method, co-occurrence domain detection (CODD), that improves the sensitivity of Pfam domain detection by exploiting the tendency of domains to appear preferentially with a few other favorite domains in a protein. In this paper, we present EuPathDomains (http://www.atgc-montpellier.fr/EuPathDomains/), an extended database of protein domains belonging to ten major eukaryotic human pathogens. EuPathDomains gathers known domains and new domains detected by CODD, along with the associated confidence measurements and the GO annotations that can be deduced from the new domains. This database significantly extends the Pfam domain coverage of all selected genomes by proposing new occurrences of domains as well as domain families that have never been reported before in these genomes. For example, with a false discovery rate lower than 20%, EuPathDomains increases the number of detected domains by 13% in the Toxoplasma gondii genome and by up to 28% in Cryptosporidium parvum, and the total number of domain families by 10% in Plasmodium falciparum and by up to 16% in the C. parvum genome. The database can be queried by protein names, domain identifiers, Pfam or InterPro identifiers, or organisms, and should become a valuable resource for deciphering the protein functions of eukaryotic pathogens.
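
The co-occurrence idea behind CODD can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the data layout, the placeholder domain names (DomA, DomB, ...) and the precomputed set of "favorite" pairs are all assumptions made for illustration, and the statistical steps of the real method (selecting significantly co-occurring pairs, estimating the false discovery rate) are left out. The sketch only shows the filtering step: a weak candidate domain is kept when it forms a favorite pair with a domain already confidently detected in the same protein.

```python
from typing import Dict, FrozenSet, List, Set


def certify_by_cooccurrence(
    confident: Dict[str, Set[str]],
    candidates: Dict[str, Set[str]],
    favorite_pairs: Set[FrozenSet[str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: keep a low-scoring candidate domain only if it
    forms a known co-occurring pair with a domain already detected with
    confidence in the same protein."""
    certified: Dict[str, List[str]] = {}
    for protein, weak_domains in candidates.items():
        known = confident.get(protein, set())
        kept = sorted(
            d
            for d in weak_domains
            if any(frozenset((d, k)) in favorite_pairs for k in known if k != d)
        )
        if kept:
            certified[protein] = kept
    return certified


# Toy data (assumed, not taken from the paper).
confident = {"prot1": {"DomA"}, "prot2": {"DomC"}}
candidates = {"prot1": {"DomB", "DomD"}, "prot2": {"DomB"}}
favorite_pairs = {frozenset(("DomA", "DomB"))}

result = certify_by_cooccurrence(confident, candidates, favorite_pairs)
print(result)
```

In this toy input, only the DomB candidate in prot1 is supported by a favorite partner (DomA); the DomB candidate in prot2 is dropped because DomC is not one of its favorite partners.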

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Cryptosporidium parvum / genetics
  • Databases, Protein*
  • Eukaryota / genetics*
  • Eukaryota / metabolism
  • Giardia lamblia / genetics
  • Humans
  • Leishmania / genetics
  • Molecular Sequence Annotation
  • Plasmodium / genetics
  • Protein Binding
  • Protein Interaction Domains and Motifs / genetics*
  • Protozoan Proteins / chemistry
  • Protozoan Proteins / genetics*
  • Protozoan Proteins / metabolism
  • Toxoplasma / genetics
  • Trypanosoma brucei brucei / genetics

Substances

  • Protozoan Proteins