Showing 1–21 of 21 results for author: Bornschein, J

  1. arXiv:2409.01369  [pdf, other]

    cs.LG cs.AI stat.ML

    Imitating Language via Scalable Inverse Reinforcement Learning

    Authors: Markus Wulfmeier, Michael Bloesch, Nino Vieillard, Arun Ahuja, Jorg Bornschein, Sandy Huang, Artem Sokolov, Matt Barnes, Guillaume Desjardins, Alex Bewley, Sarah Maria Elisabeth Bechtle, Jost Tobias Springenberg, Nikola Momchev, Olivier Bachem, Matthieu Geist, Martin Riedmiller

    Abstract: The majority of language model training builds on imitation learning. It covers pretraining, supervised fine-tuning, and affects the starting conditions for reinforcement learning from human feedback (RLHF). The simplicity and scalability of maximum likelihood estimation (MLE) for next token prediction led to its role as predominant paradigm. However, the broader field of imitation learning can mo…

    Submitted 2 September, 2024; originally announced September 2024.

  2. arXiv:2403.05196  [pdf, other]

    cs.LG cs.CV

    Denoising Autoregressive Representation Learning

    Authors: Yazhe Li, Jorg Bornschein, Ting Chen

    Abstract: In this paper, we explore a new generative approach for learning visual representations. Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively. We find that training with Mean Squared Error (MSE) alone leads to strong representations. To enhance the image generation ability, we replace the MSE loss with the diffusion objective by using a denoising patch dec…

    Submitted 4 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2403.01554  [pdf, other]

    cs.LG

    Transformers for Supervised Online Continual Learning

    Authors: Jorg Bornschein, Yazhe Li, Amal Rannen-Triki

    Abstract: Transformers have become the dominant architecture for sequence modeling tasks such as natural language processing or audio processing, and they are now even considered for tasks that are not naturally sequential such as image classification. Their ability to attend to and to process a set of tokens as context enables them to develop in-context few-shot learning abilities. However, their potential…

    Submitted 3 March, 2024; originally announced March 2024.

  4. arXiv:2403.01518  [pdf, other]

    cs.CL cs.LG

    Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

    Authors: Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, Michalis K. Titsias

    Abstract: We consider the problem of online fine tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves the overall predictive performance, especially when considering distributional shift between training and evaluation data, we here emphasize the perspective that online adaptation turns parameters into temporally ch…

    Submitted 3 March, 2024; originally announced March 2024.

  5. arXiv:2307.05741  [pdf, other]

    cs.CL

    Towards Robust and Efficient Continual Language Learning

    Authors: Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato

    Abstract: As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing m…

    Submitted 11 July, 2023; originally announced July 2023.

  6. arXiv:2306.08448  [pdf, other]

    cs.LG cs.AI

    Kalman Filter for Online Classification of Non-Stationary Data

    Authors: Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein

    Abstract: In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL are concerned with automatic adaptation to the particular non-stationary structure of the data, and with quantification of predictive uncertainty. Motivated by these challenges we introduce a probabilistic Bayesian online learning model…

    Submitted 14 June, 2023; originally announced June 2023.

  7. arXiv:2304.05823  [pdf, other]

    q-bio.MN cs.LG q-bio.GN

    DiscoGen: Learning to Discover Gene Regulatory Networks

    Authors: Nan Rosemary Ke, Sara-Jane Dunn, Jorg Bornschein, Silvia Chiappa, Melanie Rey, Jean-Baptiste Lespiau, Albin Cassirer, Jane Wang, Theophane Weber, David Barrett, Matthew Botvinick, Anirudh Goyal, Mike Mozer, Danilo Rezende

    Abstract: Accurately inferring Gene Regulatory Networks (GRNs) is a critical and challenging task in biology. GRNs model the activatory and inhibitory interactions between genes and are inherently causal in nature. To accurately identify GRNs, perturbational data is required. However, most GRN discovery methods only operate on observational data. Recent advances in neural network-based causal discovery meth…

    Submitted 12 April, 2023; originally announced April 2023.

  8. arXiv:2302.09579  [pdf, other]

    cs.LG cs.CV

    Evaluating Representations with Readout Model Switching

    Authors: Yazhe Li, Jorg Bornschein, Marcus Hutter

    Abstract: Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking. In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. Contrary to the established practice of limiting the capacity of the rea…

    Submitted 19 February, 2023; originally announced February 2023.

    Journal ref: International Conference on Learning Representations, 2023

  9. arXiv:2212.08131  [pdf, other]

    cs.LG

    Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies

    Authors: Shivakanth Sujit, Pedro H. M. Braga, Jorg Bornschein, Samira Ebrahimi Kahou

    Abstract: Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive; such as in robotics. Offlin…

    Submitted 21 November, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: TMLR 2023

  10. arXiv:2211.11747  [pdf, other]

    cs.LG cs.CV

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Authors: Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato

    Abstract: A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study o…

    Submitted 16 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  11. arXiv:2210.07931  [pdf, other]

    stat.ML cs.LG

    Sequential Learning Of Neural Networks for Prequential MDL

    Authors: Jorg Bornschein, Yazhe Li, Marcus Hutter

    Abstract: Minimum Description Length (MDL) provides a framework and an objective for principled model evaluation. It formalizes Occam's Razor and can be applied to data from non-stationary sources. In the prequential formulation of MDL, the objective is to minimize the cumulative next-step log-loss when sequentially going through the data and using previous observations for parameter estimation. It thus clo…

    Submitted 14 October, 2022; originally announced October 2022.

  12. arXiv:2206.10011  [pdf, other]

    cs.LG cs.CV stat.ML

    When Does Re-initialization Work?

    Authors: Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jörg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu

    Abstract: Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay an…

    Submitted 2 April, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Published in PMLR Volume 187; spotlight presentation at I Can't Believe It's Not Better Workshop at NeurIPS 2022

  13. arXiv:2204.04875  [pdf, other]

    stat.ML cs.LG

    Learning to Induce Causal Structure

    Authors: Nan Rosemary Ke, Silvia Chiappa, Jane Wang, Anirudh Goyal, Jorg Bornschein, Melanie Rey, Theophane Weber, Matthew Botvinick, Michael Mozer, Danilo Jimenez Rezende

    Abstract: The fundamental challenge in causal induction is to infer the underlying graph structure given observational and/or interventional data. Most existing causal induction algorithms operate by generating candidate graphs and evaluating them using either score-based methods (including continuous optimization) or independence tests. In our work, we instead treat the inference process as a black box and…

    Submitted 7 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  14. arXiv:2107.05481  [pdf, other]

    cs.LG stat.ML

    Prequential MDL for Causal Structure Learning with Neural Networks

    Authors: Jorg Bornschein, Silvia Chiappa, Alan Malek, Rosemary Nan Ke

    Abstract: Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributi…

    Submitted 2 July, 2021; originally announced July 2021.

  15. arXiv:2106.00042  [pdf, other]

    cs.LG

    A study on the plasticity of neural networks

    Authors: Tudor Berariu, Wojciech Czarnecki, Soham De, Jorg Bornschein, Samuel Smith, Razvan Pascanu, Claudia Clopath

    Abstract: One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task. Usually this is done through fine-tuning, where an implicit assumption is that the network maintains its plasticity, meaning that the performance it can reach on any given task is not affected negatively by previously seen tasks.…

    Submitted 14 October, 2023; v1 submitted 31 May, 2021; originally announced June 2021.

  16. arXiv:2009.12583  [pdf, other]

    cs.LG stat.ML

    Small Data, Big Decisions: Model Selection in the Small-Data Regime

    Authors: Jorg Bornschein, Francesco Visin, Simon Osindero

    Abstract: Highly overparametrized neural networks can display curiously strong generalization performance - a phenomenon that has recently garnered a wealth of theoretical and empirical research in order to better understand it. In contrast to most previous work, which typically considers the performance as a function of the model size, in this paper we empirically study the generalization performance as th…

    Submitted 26 September, 2020; originally announced September 2020.

    Journal ref: Proceedings of the International Conference on Machine Learning (ICML 2020)

  17. arXiv:1908.06843  [pdf, ps, other]

    eess.SP cs.LG stat.ML

    ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions

    Authors: Georgios Exarchakis, Jörg Bornschein, Abdul-Saboor Sheikh, Zhenwen Dai, Marc Henniges, Jakob Drefs, Jörg Lücke

    Abstract: ProSper is a python library containing probabilistic algorithms to learn dictionaries. Given a set of data points, the implemented algorithms seek to learn the elementary components that have generated the data. The library widens the scope of dictionary learning approaches beyond implementations of standard approaches such as ICA, NMF or standard L1 sparse coding. The implemented algorithms are e…

    Submitted 1 August, 2019; originally announced August 2019.

  18. arXiv:1709.07116  [pdf, other]

    cs.LG

    Variational Memory Addressing in Generative Models

    Authors: Jörg Bornschein, Andriy Mnih, Daniel Zoran, Danilo J. Rezende

    Abstract: Aiming to augment generative models with external memory, we interpret the output of a memory module with stochastic addressing as a conditional mixture distribution, where a read operation corresponds to sampling a discrete memory address and retrieving the corresponding content from memory. This perspective allows us to apply variational inference to memory addressing, which enables effective tr…

    Submitted 20 September, 2017; originally announced September 2017.

  19. arXiv:1506.03877  [pdf, other]

    cs.LG stat.ML

    Bidirectional Helmholtz Machines

    Authors: Jorg Bornschein, Samira Shabanian, Asja Fischer, Yoshua Bengio

    Abstract: Efficient unsupervised training and inference in deep generative models remains a challenging problem. One basic approach, called Helmholtz machine, involves training a top-down directed generative model together with a bottom-up auxiliary model used for approximate inference. Recent results indicate that better generative models can be obtained with better approximate inference procedures. Instea…

    Submitted 24 May, 2016; v1 submitted 11 June, 2015; originally announced June 2015.

  20. arXiv:1502.04156  [pdf, other]

    cs.LG

    Towards Biologically Plausible Deep Learning

    Authors: Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, Zhouhan Lin

    Abstract: Neuroscientists have long criticised deep learning algorithms as incompatible with current knowledge of neurobiology. We explore more biologically plausible versions of deep representation learning, focusing here mostly on unsupervised learning but developing a learning mechanism that could account for supervised, unsupervised and reinforcement learning. The starting point is that the basic learni…

    Submitted 8 August, 2016; v1 submitted 13 February, 2015; originally announced February 2015.

  21. arXiv:1406.2751  [pdf, other]

    cs.LG

    Reweighted Wake-Sleep

    Authors: Jörg Bornschein, Yoshua Bengio

    Abstract: Training deep directed graphical models with many hidden variables and performing inference remains a major challenge. Helmholtz machines and deep belief networks are such models, and the wake-sleep algorithm has been proposed to train them. The wake-sleep algorithm relies on training not just the directed generative model but also a conditional generative model (the inference network) that runs…

    Submitted 16 April, 2015; v1 submitted 10 June, 2014; originally announced June 2014.