Exploring sequence characteristics related to high-level production of secreted proteins in Aspergillus niger

PLoS One. 2012;7(10):e45869. doi: 10.1371/journal.pone.0045869. Epub 2012 Oct 1.

Abstract

Protein sequence features are explored in relation to the production of overexpressed extracellular proteins by fungi. Knowledge of the features that influence protein production and secretion could be used to improve enzyme production levels in industrial bioprocesses via protein engineering. A large set of genes, over 600 homologous and nearly 2,000 heterologous fungal genes, was overexpressed in Aspergillus niger using a standardized expression cassette and scored for high versus no production. Sequence-based machine learning techniques were then applied to identify relevant DNA and protein sequence features. The amino-acid composition of the protein sequence was found to be the most predictive feature, and interpretation revealed that the same features are important for both homologous and heterologous gene expression: tyrosine and asparagine composition correlated positively with high-level production, whereas methionine and lysine composition contributed to unsuccessful production. The predictor is available online at http://bioinformatics.tudelft.nl/hipsec. Subsequent work aims to validate these findings through protein engineering as a method for increasing expression levels per gene copy.
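Amino-acid composition was reported to be the most predictive feature. As a minimal sketch, assuming "composition" means the fraction of each of the 20 standard residues in the protein sequence (the abstract does not say whether the signal peptide or non-standard residues are included), the feature vector can be computed as below. The names `aa_composition`, `toy_score` and `HYPOTHETICAL_WEIGHTS` are invented for illustration: the unit weights only mirror the signs of the reported correlations (Y, N positive; M, K negative) and are not the parameters of the published predictor, whose model this abstract does not describe.

```python
from __future__ import annotations

from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Assumption: non-standard letters (e.g. X, B, Z, '*') are ignored in both
    the counts and the denominator.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# HYPOTHETICAL weights for illustration only: signs follow the reported
# correlations, magnitudes are arbitrary. NOT the published model.
HYPOTHETICAL_WEIGHTS = {"Y": 1.0, "N": 1.0, "M": -1.0, "K": -1.0}


def toy_score(seq: str) -> float:
    """Linear score over composition; higher leans toward 'high production'
    under this toy model only."""
    comp = aa_composition(seq)
    return sum(w * comp[aa] for aa, w in HYPOTHETICAL_WEIGHTS.items())


if __name__ == "__main__":
    print(round(toy_score("NYNYAG"), 3))  # Y and N enriched: positive score
    print(round(toy_score("MKMKAG"), 3))  # M and K enriched: negative score
```

A linear score over composition was chosen only because it makes the direction of each reported correlation easy to read; a real predictor would learn its weights from labelled high- and no-production examples.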

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Artificial Intelligence
  • Aspergillus niger / enzymology
  • Aspergillus niger / genetics*
  • Computational Biology / methods*
  • Electrophoresis, Polyacrylamide Gel
  • Enzymes / biosynthesis*
  • Fungal Proteins / genetics*
  • Gene Expression Profiling
  • Genes, Fungal / genetics*
  • Genetic Engineering / methods
  • Industrial Microbiology / methods*
  • Molecular Sequence Data

Substances

  • Enzymes
  • Fungal Proteins

Grants and funding

This work was supported by the BioRange programme of the Netherlands Bioinformatics Centre (NBIC) and was part of the Kluyver Centre for Genomics of Industrial Fermentation, both subsidiaries of the Netherlands Genomics Initiative (NGI). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The data set used was provided by DSM Biotechnology Center. Study design, data collection and analysis, the decision to publish, and preparation of the manuscript were carried out in cooperation with the DSM employees who are co-authors.