-
Contrastive learning of T cell receptor representations
Authors:
Yuta Nagano,
Andrew Pyo,
Martina Milighetti,
James Henderson,
John Shawe-Taylor,
Benny Chain,
Andreas Tiffeau-Mayer
Abstract:
Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein lan…
▽ More
Computational prediction of the interaction of T cell receptors (TCRs) and their ligands is a grand challenge in immunology. Despite advances in high-throughput assays, specificity-labelled TCR data remains sparse. In other domains, the pre-training of language models on unlabelled data has been successfully used to address data bottlenecks. However, it is unclear how to best pre-train protein language models for TCR specificity prediction. Here we introduce a TCR language model called SCEPTR (Simple Contrastive Embedding of the Primary sequence of T cell Receptors), capable of data-efficient transfer learning. Through our model, we introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling, which enables SCEPTR to achieve its state-of-the-art performance. In contrast, existing protein language models and a variant of SCEPTR pre-trained without autocontrastive learning are outperformed by sequence alignment-based methods. We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Correlated Feature Selection with Extended Exclusive Group Lasso
Authors:
Yuxin Sun,
Benny Chain,
Samuel Kaski,
John Shawe-Taylor
Abstract:
In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic insight and conceptual understanding. Lasso and related algorithms have been widely used since their sparse solutions naturally identify a set of informative featur…
▽ More
In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic insight and conceptual understanding. Lasso and related algorithms have been widely used since their sparse solutions naturally identify a set of informative features. However, Lasso performs erratically when features are correlated. This limits the use of such algorithms in biological problems, where features such as genes often work together in pathways, leading to sets of highly correlated features. In this paper, we examine the performance of a Lasso derivative, the exclusive group Lasso, in this setting. We propose fast algorithms to solve the exclusive group Lasso, and introduce a solution to the case when the underlying group structure is unknown. The solution combines stability selection with random group allocation and introduction of artificial features. Experiments with both synthetic and real-world data highlight the advantages of this proposed methodology over Lasso in comprehensive selection of informative features.
△ Less
Submitted 27 February, 2020;
originally announced February 2020.
-
Hybrid spreading mechanisms and T cell activation shape the dynamics of HIV-1 infection
Authors:
Changwang Zhang,
Shi Zhou,
Elisabetta Groppelli,
Pierre Pellegrino,
Ian Williams,
Persephone Borrow,
Benjamin M. Chain,
Clare Jolly
Abstract:
HIV-1 can disseminate between susceptible cells by two mechanisms: cell-free infection following fluid-phase diffusion of virions and by highly-efficient direct cell-to-cell transmission at immune cell contacts. The contribution of this hybrid spreading mechanism, which is also a characteristic of some important computer worm outbreaks, to HIV-1 progression in vivo remains unknown. Here we present…
▽ More
HIV-1 can disseminate between susceptible cells by two mechanisms: cell-free infection following fluid-phase diffusion of virions and by highly-efficient direct cell-to-cell transmission at immune cell contacts. The contribution of this hybrid spreading mechanism, which is also a characteristic of some important computer worm outbreaks, to HIV-1 progression in vivo remains unknown. Here we present a new mathematical model that explicitly incorporates the ability of HIV-1 to use hybrid spreading mechanisms and evaluate the consequences for HIV-1 pathogenenesis. The model captures the major phases of the HIV-1 infection course of a cohort of treatment naive patients and also accurately predicts the results of the Short Pulse Anti-Retroviral Therapy at Seroconversion (SPARTAC) trial. Using this model we find that hybrid spreading is critical to seed and establish infection, and that cell-to-cell spread and increased CD4+ T cell activation are important for HIV-1 progression. Notably, the model predicts that cell-to-cell spread becomes increasingly effective as infection progresses and thus may present a considerable treatment barrier. Deriving predictions of various treatments' influence on HIV-1 progression highlights the importance of earlier intervention and suggests that treatments effectively targeting cell-to-cell HIV-1 spread can delay progression to AIDS. This study suggests that hybrid spreading is a fundamental feature of HIV infection, and provides the mathematical framework incorporating this feature with which to evaluate future therapeutic strategies.
△ Less
Submitted 31 March, 2015;
originally announced March 2015.
-
LeoTask: a fast, flexible and reliable framework for computational research
Authors:
Changwang Zhang,
Shi Zhou,
Benjamin M. Chain
Abstract:
LeoTask is a Java library for computation-intensive and time-consuming research tasks. It automatically executes tasks in parallel on multiple CPU cores on a computing facility. It uses a configuration file to enable automatic exploration of parameter space and flexible aggregation of results, and therefore allows researchers to focus on programming the key logic of a computing task. It also suppo…
▽ More
LeoTask is a Java library for computation-intensive and time-consuming research tasks. It automatically executes tasks in parallel on multiple CPU cores on a computing facility. It uses a configuration file to enable automatic exploration of parameter space and flexible aggregation of results, and therefore allows researchers to focus on programming the key logic of a computing task. It also supports reliable recovery from interruptions, dynamic and cloneable networks, and integration with the plotting software Gnuplot.
△ Less
Submitted 7 January, 2015;
originally announced January 2015.
-
Optimizing Hybrid Spreading in Metapopulations
Authors:
Changwang Zhang,
Shi Zhou,
Joel C. Miller,
Ingemar J. Cox,
Benjamin M. Chain
Abstract:
Epidemic spreading phenomena are ubiquitous in nature and society. Examples include the spreading of diseases, information, and computer viruses. Epidemics can spread by local spreading, where infected nodes can only infect a limited set of direct target nodes and global spreading, where an infected node can infect every other node. In reality, many epidemics spread using a hybrid mixture of both…
▽ More
Epidemic spreading phenomena are ubiquitous in nature and society. Examples include the spreading of diseases, information, and computer viruses. Epidemics can spread by local spreading, where infected nodes can only infect a limited set of direct target nodes and global spreading, where an infected node can infect every other node. In reality, many epidemics spread using a hybrid mixture of both types of spreading. In this study we develop a theoretical framework for studying hybrid epidemics, and examine the optimum balance between spreading mechanisms in terms of achieving the maximum outbreak size. We show the existence of critically hybrid epidemics where neither spreading mechanism alone can cause a noticeable spread but a combination of the two spreading mechanisms would produce an enormous outbreak. Our results provide new strategies for maximising beneficial epidemics and estimating the worst outcome of damaging hybrid epidemics.
△ Less
Submitted 31 March, 2015; v1 submitted 25 September, 2014;
originally announced September 2014.
-
Hybrid Epidemics - A Case Study on Computer Worm Conficker
Authors:
Changwang Zhang,
Shi Zhou,
Benjamin M. Chain
Abstract:
Conficker is a computer worm that erupted on the Internet in 2008. It is unique in combining three different spreading strategies: local probing, neighbourhood probing, and global probing. We propose a mathematical model that combines three modes of spreading, local, neighbourhood and global to capture the worm's spreading behaviour. The parameters of the model are inferred directly from network d…
▽ More
Conficker is a computer worm that erupted on the Internet in 2008. It is unique in combining three different spreading strategies: local probing, neighbourhood probing, and global probing. We propose a mathematical model that combines three modes of spreading, local, neighbourhood and global to capture the worm's spreading behaviour. The parameters of the model are inferred directly from network data obtained during the first day of the Conifcker epidemic. The model is then used to explore the trade-off between spreading modes in determining the worm's effectiveness. Our results show that the Conficker epidemic is an example of a critically hybrid epidemic, in which the different modes of spreading in isolation do not lead to successful epidemics. Such hybrid spreading strategies may be used beneficially to provide the most effective strategies for promulgating information across a large population. When used maliciously, however, they can present a dangerous challenge to current internet security protocols.
△ Less
Submitted 31 March, 2015; v1 submitted 22 June, 2014;
originally announced June 2014.