Microbes are being engineered for an increasingly large and diverse set of applications. However, the designing of microbial genomes remains challenging due to the general complexity of biological systems. Adaptive Laboratory Evolution (ALE) leverages nature's problem-solving processes to generate optimized genotypes currently inaccessible to rational methods. The large amount of public ALE data now represents a new opportunity for data-driven strain design. This study describes how novel strain designs, or genome sequences not yet observed in ALE experiments or published designs, can be extracted from aggregated ALE data and demonstrates this by designing, building, and testing three novel Escherichia coli strains with fitnesses comparable to ALE mutants. These designs were achieved through a meta-analysis of aggregated ALE mutations data (63 Escherichia coli K-12 MG1655 based ALE experiments, described by 93 unique environmental conditions, 357 independent evolutions, and 13 957 observed mutations), which additionally revealed global ALE mutation trends that inform on ALE-derived strain design principles. Such informative trends anticipate ALE-derived strain designs as largely gene-centric, as opposed to noncoding, and composed of a relatively small number of beneficial variants (approximately 6). These results demonstrate how strain design efforts can be enhanced by the meta-analysis of aggregated ALE data.
Keywords: adaptive laboratory evolution; data-driven strain design; genome design variables; meta-analysis; mutation functional analysis; structural biology.