Multigenic families and proteomics: extended protein characterization as a tool for paralog gene identification

Proteomics. 2005 Feb;5(2):450-60. doi: 10.1002/pmic.200400954.

Abstract

In classical proteomic studies, protein database searches mostly identify protein function by homology, because the databases are not exhaustive. The quality of the identification depends on the organism studied, its complexity, and its representation in the protein databases. This basic functional identification is nevertheless insufficient for certain applications, notably the development of RNA-based gene-silencing strategies, commonly termed RNA interference (RNAi) in animals and post-transcriptional gene silencing (PTGS) in plants, which require unambiguous identification of the targeted gene sequence. A PTGS strategy was considered for studying the infection of Oryza sativa by the Rice Yellow Mottle Virus (RYMV). RYMV is suspected to recruit host proteins after entering plant cells, forming a complex that facilitates virus multiplication and spread. The protein partners of this complex were identified by a classical proteomic approach, nano liquid chromatography tandem mass spectrometry. Among the identified proteins, several were retained for a PTGS strategy. However, most of the protein candidates appear to be members of multigenic families for which not all paralog genes are present in protein databases. Identifying the paralog gene that is actually expressed is therefore impossible with classical protein database searches. Because the genome contains all genes, and thus all paralog genes, a whole-genome search strategy was developed to determine the specific expressed paralog gene. With this approach, the identification of peptides that match only a single gene, called discriminant peptides, provides definitive proof that this gene is expressed. The strategy has two requirements: (i) a completely sequenced and accessible genome; (ii) high protein sequence coverage.
In the present work, through three examples, we report and validate for the first time a genome database search strategy to specifically identify paralog genes belonging to multigenic families expressed under specific conditions.
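
The notion of a discriminant peptide, and the role of sequence coverage, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes the candidate paralogs have already been translated to protein sequences, uses exact substring matching in place of a real MS/MS search engine, and runs on invented toy sequences. The names discriminant_peptides, sequence_coverage, geneA and geneB are hypothetical.

```python
from collections import defaultdict


def discriminant_peptides(peptides, paralogs):
    """Map each paralog to the peptides found in it and in no other paralog.

    Assumption: exact substring matching stands in for a real
    peptide-to-sequence search.
    """
    result = defaultdict(list)
    for pep in peptides:
        hits = [gene for gene, seq in paralogs.items() if pep in seq]
        if len(hits) == 1:  # peptide matches a single gene only
            result[hits[0]].append(pep)
    return dict(result)


def sequence_coverage(protein, peptides):
    """Return the fraction of residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)


# Hypothetical toy data: two paralogs differing at two positions.
paralogs = {
    "geneA": "MKTAYIAKQRQISFVKSHFSRQ",
    "geneB": "MKTAYIAKQRQVSFVKAHFSRQ",
}
# Hypothetical identified peptides.
peptides = ["MKTAYIAK", "QISFVK", "AHFSR"]

unique = discriminant_peptides(peptides, paralogs)
coverage_a = sequence_coverage(paralogs["geneA"], ["MKTAYIAK", "QISFVK"])
```

In this toy case the shared peptide MKTAYIAK cannot distinguish the two paralogs, whereas QISFVK and AHFSR each match a single gene. Higher sequence coverage raises the chance that at least one identified peptide spans a position where the paralogs differ, which is why coverage appears among the requirements.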

MeSH terms

  • Chaperonin 60 / chemistry
  • Chaperonin 60 / genetics
  • Chaperonin 60 / isolation & purification
  • Chaperonin 60 / metabolism
  • Chromatography, Gel
  • Chromatography, Liquid
  • Chromosomes, Plant
  • Databases, Genetic
  • Databases, Protein
  • Discriminant Analysis
  • Electrophoresis, Polyacrylamide Gel
  • Freeze Drying
  • Fructose-Bisphosphate Aldolase / chemistry
  • Fructose-Bisphosphate Aldolase / genetics
  • Fructose-Bisphosphate Aldolase / isolation & purification
  • Fructose-Bisphosphate Aldolase / metabolism
  • Gene Expression
  • Gene Silencing
  • Genes, Plant
  • Mass Spectrometry
  • Mitochondrial Proteins / chemistry
  • Mitochondrial Proteins / genetics
  • Mitochondrial Proteins / isolation & purification
  • Mitochondrial Proteins / metabolism
  • Multigene Family*
  • Nanotechnology
  • Oryza / chemistry
  • Oryza / genetics*
  • Phenylalanine Ammonia-Lyase / chemistry
  • Phenylalanine Ammonia-Lyase / genetics
  • Phenylalanine Ammonia-Lyase / isolation & purification
  • Phenylalanine Ammonia-Lyase / metabolism
  • Plant Proteins / chemistry*
  • Plant Proteins / genetics*
  • Plant Proteins / isolation & purification
  • Plant Proteins / metabolism
  • Plant Viruses / chemistry
  • Plant Viruses / genetics*
  • Plant Viruses / isolation & purification
  • Protein Processing, Post-Translational
  • Proteomics*
  • RNA Interference
  • Reproducibility of Results
  • Sequence Analysis, Protein

Substances

  • Chaperonin 60
  • Mitochondrial Proteins
  • Plant Proteins
  • Fructose-Bisphosphate Aldolase
  • Phenylalanine Ammonia-Lyase