You're one in a googol: optimizing genes for protein expression

J R Soc Interface. 2009 Aug 6;6 Suppl 4(Suppl 4):S467-76. doi: 10.1098/rsif.2008.0520.focus. Epub 2009 Mar 11.

Abstract

A vast number of different nucleic acid sequences can all be translated by the genetic code into the same amino acid sequence. These sequences are not all equally useful, however: the exact sequence chosen can have profound effects on the expression of the encoded protein. Despite the importance of protein-coding sequences, there has been little systematic study to identify the parameters that affect expression. This is probably because protein expression has largely been tackled on an ad hoc basis in many independent projects: once a sequence that yields adequate expression for a given project has been obtained, there is little incentive to continue work on the problem. Synthetic biology may now provide the impetus to transform protein expression folklore into design principles, so that DNA sequences may easily be designed to express any protein in any system. In this review, we offer a brief survey of the literature, outline the major challenges in interpreting existing data and constructing robust design algorithms, and propose a way to proceed towards the goal of rational sequence engineering.
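The size of the sequence space mentioned in the first sentence can be illustrated with a short calculation. The following is a minimal sketch, not taken from the review: it assumes the standard genetic code, ignores the stop codon, and uses a hypothetical six-residue peptide as input. The function name `count_synonymous_sequences` and the example peptide are illustrative assumptions only.

```python
import math

# Number of codons encoding each amino acid in the standard genetic code
# (assumption: standard code; organism-specific variant codes differ).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein: str) -> int:
    """Return how many distinct coding sequences translate to `protein`.

    The count is the product of the per-residue codon degeneracies.
    The stop codon is not included.
    """
    count = 1
    for residue in protein.upper():
        count *= CODONS_PER_AMINO_ACID[residue]
    return count


if __name__ == "__main__":
    peptide = "MKLVSR"  # hypothetical example peptide, not from the review
    n = count_synonymous_sequences(peptide)
    # Report the count and its order of magnitude; a googol is 10**100.
    print(f"{peptide}: {n} synonymous sequences (log10 = {math.log10(n):.2f})")
```

Because the count is a product over residues, it grows exponentially with protein length, which is why exhaustive testing of all encodings is impractical and design rules are needed instead.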

Publication types

  • Review

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Base Sequence
  • Biotechnology / methods*
  • Codon
  • DNA / chemistry
  • Gene Expression Profiling*
  • Genetic Code
  • Molecular Sequence Data
  • Open Reading Frames
  • Proteins / chemistry*
  • Proteomics / methods
  • Sequence Alignment

Substances

  • Codon
  • Proteins
  • DNA