N-terminal heterogeneity resulting from non-uniform signal peptide (SP) cleavage can affect the properties of biologics and extend product development timelines. Few studies have systematically engineered SPs to address miscleavage. Herein, we developed a novel high throughput computational pipeline that generates millions of SP mutant sequences and uses the SignalP 5.0 deep learning model to predict which of these mutants are likely to alleviate N-terminal miscleavage in antibodies. We optimized the parameters to mutate one or two amino acids at the C-terminus of 84 unique SPs, exhausting all theoretically possible combinations and yielding a library of 296,077 unique wildtype and mutant SPs for in silico screening of each antibody. We applied this method to five antibodies against different targets, with varying extents of miscleavage (2.3% to 100%) on their Lambda light chains. In each case, multiple SP mutants were generated that reduced miscleavage to a non-detectable level, with titers comparable to or better than those obtained with the original SPs. Pairwise mutational analysis using an in silico library enriched with high-scoring mutants revealed patterns of amino acids at the C-terminus of SPs, providing insights beyond the "Heijne rule". To our knowledge, no similar approach that combines high throughput in silico mutagenesis and screening with SP cleavage prediction has been reported in the literature.
This method can be applied to both the light chain and heavy chain of antibodies, regardless of their initial extent of miscleavage, provides optimized solutions for individual cases, and facilitates the development of antibody therapeutics.
Abbreviations: Aa, amino acids; CHO, Chinese hamster ovary; CNN, convolutional neural network; CSscore, cleavage site score; CSV, comma-separated values; HC, heavy chain; HEK, human embryonic kidney; HPLC, high-performance liquid chromatography; IgG, immunoglobulin G; IGLV, immunoglobulin G Lambda variable; LC, light chain; LCMS, liquid chromatography-mass spectrometry; MS, mass spectrometry; PCR, polymerase chain reaction; PBS, phosphate-buffered saline; PEI, polyethylenimine; SP, signal peptide; SPase, signal peptidase; TCEP, tris(2-carboxyethyl)phosphine; TOF, time-of-flight.
Keywords: N-terminal miscleavage; Signal peptides; SignalP 5.0; antibodies; deep learning; high throughput mutagenesis; in silico screening.
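The exhaustive C-terminal mutagenesis step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `c_terminal_mutants`, the `window` size of three residues, and the example SP sequence are all hypothetical choices made for illustration, and the SignalP 5.0 scoring step is deliberately left out. The sketch only enumerates every variant that carries one or two substitutions in the last few residues of a signal peptide.

```python
from itertools import combinations, product

# The 20 canonical amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def c_terminal_mutants(sp, window=3, max_mutations=2):
    """Enumerate SP variants with up to `max_mutations` substitutions
    in the last `window` residues. The wildtype is included.

    `window` and `max_mutations` are illustrative parameters,
    not values taken from the paper.
    """
    if window > len(sp):
        raise ValueError("window is longer than the signal peptide")
    start = len(sp) - window
    variants = {sp}
    for n in range(1, max_mutations + 1):
        # Choose which C-terminal positions to mutate.
        for positions in combinations(range(start, len(sp)), n):
            # At each chosen position, allow every residue except the wildtype one.
            choices = [
                [aa for aa in AMINO_ACIDS if aa != sp[p]] for p in positions
            ]
            for substitutions in product(*choices):
                seq = list(sp)
                for p, aa in zip(positions, substitutions):
                    seq[p] = aa
                variants.add("".join(seq))
    return variants


# Hypothetical example sequence, used only to exercise the function.
example_sp = "MAWTPLLLLLLSHCTGSLS"
library = c_terminal_mutants(example_sp, window=3)
print(len(library))  # 1 wildtype + 3*19 single + 3*19*19 double mutants = 1141
```

Taking the union of such sets over many starting SPs gives a de-duplicated library; each sequence, prepended to the antibody chain, would then be passed to a cleavage-site predictor and ranked by its score.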