Optimizing doped libraries by using genetic algorithms

J Comput Aided Mol Des. 1997 Jan;11(1):29-38. doi: 10.1023/a:1008071310472.

Abstract

The insertion of random sequences into protein-encoding genes, combined with biological selection techniques, has become a valuable tool in the design of molecules that have useful and possibly novel properties. With highly effective screening protocols, a functional and unique structure that had not been anticipated can be distinguished among a huge collection of inactive molecules that together represent all possible amino acid combinations. This technique is severely limited by its restriction to a library of manageable size. One approach for limiting the size of a mutant library relies on 'doping schemes', in which subsets of amino acids are generated so that only certain combinations of amino acids appear in a protein sequence. Three mononucleotide mixtures must be designed for each codon concerned, such that the codons assembled during chemical gene synthesis represent the desired amino acid mixture at the level of the translated protein. In this paper we present a doping algorithm that 'reverse translates' a desired mixture of certain amino acids into three mixtures of mononucleotides. The algorithm is designed to optimally bias these mixtures towards the codons of choice. This approach combines a genetic algorithm with local optimization strategies based on the downhill simplex method. Disparate relative representations of all amino acids (and stop codons) within a target set can be generated. Optional weighting factors are employed to emphasize the frequencies of certain amino acids and their codon usage, and to compensate for the reaction rates of different mononucleotide building blocks (synthons) during chemical DNA synthesis. The effect of statistical errors that accompany an experimental realization of calculated nucleotide mixtures on the generated mixtures of amino acids is simulated.
These simulations show that the robustness of different optima with respect to small deviations from calculated values depends on their concomitant fitness. Furthermore, the calculations probe the fitness landscape locally and allow a preliminary assessment of its structure.
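
The relationship between the three mononucleotide mixtures and the resulting amino acid mixture can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard genetic code, independent incorporation of nucleotides at each codon position, and a plain squared deviation as the mismatch score (the abstract does not specify the fitness function). The names `amino_acid_distribution`, `squared_deviation` and `perturb` are hypothetical. The `perturb` helper mimics, under an assumed Gaussian error model, the statistical errors of an experimental realization.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_distribution(mix1, mix2, mix3):
    """Expected amino acid frequencies for one doped codon.

    Each mix maps a base ('A', 'C', 'G', 'T') to its fraction at that codon
    position. Assumption: positions are independent, so a codon's probability
    is the product of its three base fractions.
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = (mix1.get(codon[0], 0.0)
             * mix2.get(codon[1], 0.0)
             * mix3.get(codon[2], 0.0))
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


def squared_deviation(target, obtained):
    """Assumed mismatch score: sum of squared frequency differences."""
    keys = set(target) | set(obtained)
    return sum((target.get(k, 0.0) - obtained.get(k, 0.0)) ** 2 for k in keys)


def perturb(mix, sigma, rng):
    """Add Gaussian noise to each fraction, clip at zero, renormalize.

    Assumed error model for an imperfect experimental mixture.
    """
    noisy = {b: max(0.0, f + rng.gauss(0.0, sigma)) for b, f in mix.items()}
    total = sum(noisy.values())
    if total == 0.0:
        return dict(mix)  # degenerate draw: fall back to the unperturbed mixture
    return {b: f / total for b, f in noisy.items()}


# Illustration: equal bases at positions 1 and 2, only G or T at position 3.
equal = {b: 0.25 for b in "ACGT"}
third = {"G": 0.5, "T": 0.5}
ideal = amino_acid_distribution(equal, equal, third)

# Simulate one imperfect realization and score its deviation from the ideal.
rng = random.Random(0)
noisy = amino_acid_distribution(
    perturb(equal, 0.02, rng), perturb(equal, 0.02, rng), perturb(third, 0.02, rng)
)
drift = squared_deviation(ideal, noisy)
```

An optimizer such as a genetic algorithm or a simplex search would vary the three mixtures to minimize a score of this kind against a target amino acid set. Repeating the perturbation step many times gives a rough picture of how sensitive a given optimum is to small mixing errors.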

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Binding Sites
  • Codon / genetics
  • Computer Simulation
  • DNA / chemical synthesis
  • DNA / genetics
  • Drug Design
  • Models, Genetic*
  • Mutation
  • Peptide Library*
  • Peptides / chemical synthesis
  • Peptides / chemistry
  • Peptides / genetics

Substances

  • Codon
  • Peptide Library
  • Peptides
  • DNA