
Showing 1–20 of 20 results for author: Gesmundo, A

Searching in archive cs.
  1. arXiv:2308.06103

    cs.LG

    Composable Function-preserving Expansions for Transformer Architectures

    Authors: Andrea Gesmundo, Kaitlin Maile

    Abstract: Training state-of-the-art neural networks has a high cost in terms of compute and time. Model scale is recognized as a critical factor in achieving and improving the state of the art. Increasing the scale of a neural network normally requires restarting from scratch by randomly initializing all the parameters of the model, as this implies a change in the architecture's parameters that does not all…

    Submitted 11 August, 2023; originally announced August 2023.

  2. arXiv:2302.02721

    cs.LG cs.AI

    Multipath agents for modular multitask ML systems

    Authors: Andrea Gesmundo

    Abstract: A standard ML model is commonly generated by a single method that specifies aspects such as architecture, initialization, training data and hyperparameter configuration. The presented work introduces a novel methodology that allows multiple methods to be defined as distinct agents. Agents can collaborate and compete to generate and improve ML models for a given task. The proposed methodology is demonst…

    Submitted 6 February, 2023; originally announced February 2023.

  3. arXiv:2209.14745

    cs.LG cs.AI cs.CV cs.MA cs.NE

    A Multiagent Framework for the Asynchronous and Collaborative Extension of Multitask ML Systems

    Authors: Andrea Gesmundo

    Abstract: The traditional ML development methodology does not enable a large number of contributors, each with distinct objectives, to work collectively on the creation and extension of a shared intelligent system. Enabling such a collaborative methodology can accelerate the rate of innovation, increase the accessibility of ML technologies and enable the emergence of novel capabilities. We believe that this novel…

    Submitted 29 December, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2209.07326

  4. arXiv:2209.07326

    cs.LG cs.AI cs.CV cs.NE

    A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems

    Authors: Andrea Gesmundo

    Abstract: The traditional Machine Learning (ML) methodology requires fragmenting the development and experimental process into disconnected iterations whose feedback is used to guide design or tuning choices. This methodology has multiple efficiency and scalability disadvantages, such as spending significant resources on the creation of multiple trial models that do not contribute to the final sol…

    Submitted 6 November, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.12755

  5. arXiv:2205.12755

    cs.LG cs.AI cs.CV cs.NE

    An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems

    Authors: Andrea Gesmundo, Jeff Dean

    Abstract: Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. However, state-of-the-art ML models rely on extensive customization for each task and leverage model size and data scale rather than scaling the number of tasks. Also, continual learning, which adds the temporal aspect to multitask learning, is…

    Submitted 15 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  6. arXiv:2205.10937

    cs.LG cs.AI cs.CV cs.NE

    muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems

    Authors: Andrea Gesmundo, Jeff Dean

    Abstract: Most uses of machine learning today involve training a model from scratch for a particular task, or sometimes starting with a model pretrained on a related task and then fine-tuning on a downstream task. Both approaches offer limited knowledge transfer between different tasks, time-consuming human-driven customization to individual tasks and high computational costs, especially when starting from r…

    Submitted 25 May, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  7. arXiv:2203.17189

    cs.LG cs.CL

    Scaling Up Models and Data with t5x and seqio

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors, including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen…

    Submitted 31 March, 2022; originally announced March 2022.

  8. arXiv:2009.04381

    cs.LG stat.ML

    Routing Networks with Co-training for Continual Learning

    Authors: Mark Collier, Efi Kokiopoulou, Andrea Gesmundo, Jesse Berent

    Abstract: The core challenge with continual learning is catastrophic forgetting, the phenomenon whereby neural networks trained on a sequence of tasks rapidly forget previously learned tasks. It has been observed that catastrophic forgetting is most severe when tasks are dissimilar to each other. We propose the use of sparse routing networks for continual learning. For each input, these network a…

    Submitted 9 September, 2020; originally announced September 2020.

    Comments: Presented at the ICML 2020 Workshop on Continual Learning

  9. arXiv:1911.11481

    cs.LG stat.ML

    Ranking architectures using meta-learning

    Authors: Alina Dubatovka, Efi Kokiopoulou, Luciano Sbaiz, Andrea Gesmundo, Gabor Bartok, Jesse Berent

    Abstract: Neural architecture search has recently attracted considerable research effort, as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources. To alleviate this, a performance prediction network has recently been proposed that enables efficient architecture search by forecasting the performance of candidate architectures, instead…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: NeurIPS 2019 Meta-Learning workshop

  10. arXiv:1910.04915

    cs.LG stat.ML

    Flexible Multi-task Networks by Learning Parameter Allocation

    Authors: Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent

    Abstract: This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can hurt performance. To address this issue, we propose a framework to learn fine-grained patterns of parameter sharing. Assuming that the network is composed of sev…

    Submitted 18 July, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

  11. arXiv:1907.13223

    cs.NE cs.LG q-bio.NC

    Temporal Coding in Spiking Neural Networks with Alpha Synaptic Function: Learning with Backpropagation

    Authors: Iulia M. Comsa, Krzysztof Potempa, Luca Versari, Thomas Fischbacher, Andrea Gesmundo, Jyrki Alakuijala

    Abstract: The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic temporal coding ability present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. In classification tasks, the output of…

    Submitted 16 November, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

    Comments: v2 adds references and some clarifications of the methods. Open-source code related to this paper is available at https://github.com/google/ihmehimmeli

  12. arXiv:1906.08102

    cs.LG stat.ML

    Transfer NAS: Knowledge Transfer between Search Spaces with Transformer Agents

    Authors: Zalán Borsos, Andrey Khorlin, Andrea Gesmundo

    Abstract: Recent advances in Neural Architecture Search (NAS) have produced state-of-the-art architectures on several tasks. NAS shifts the efforts of human experts from developing novel architectures directly to designing architecture search spaces and methods to explore them efficiently. The search space definition captures prior knowledge about the properties of the architectures, and it is crucial for th…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: 6th ICML Workshop on Automated Machine Learning

  13. arXiv:1902.05781

    cs.LG stat.ML

    Fast Task-Aware Architecture Inference

    Authors: Efi Kokiopoulou, Anja Hauth, Luciano Sbaiz, Andrea Gesmundo, Gabor Bartok, Jesse Berent

    Abstract: Neural architecture search has been shown to hold great promise for the automation of deep learning. However, in spite of its potential, neural architecture search remains quite costly. To this end, we propose a novel gradient-based framework for efficient architecture search by sharing information across several tasks. We start by training many model architectures on several related (trainin…

    Submitted 15 February, 2019; originally announced February 2019.

  14. arXiv:1902.00751

    cs.LG cs.CL stat.ML

    Parameter-Efficient Transfer Learning for NLP

    Authors: Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

    Abstract: Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can…

    Submitted 13 June, 2019; v1 submitted 2 February, 2019; originally announced February 2019.

  15. arXiv:1812.10666

    cs.LG stat.ML

    Neural Architecture Search Over a Graph Search Space

    Authors: Stanisław Jastrzębski, Quentin de Laroussilhe, Mingxing Tan, Xiao Ma, Neil Houlsby, Andrea Gesmundo

    Abstract: Neural Architecture Search (NAS) has enabled the discovery of state-of-the-art architectures in many domains. However, the success of NAS depends on the definition of the search space. Current search spaces are defined as a static sequence of decisions and a set of available actions for each decision. Each possible sequence of actions defines an architecture. We propose a more expressive class of sear…

    Submitted 31 July, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

  16. arXiv:1811.09828

    cs.LG cs.NE stat.ML

    Evolutionary-Neural Hybrid Agents for Architecture Search

    Authors: Krzysztof Maziarz, Mingxing Tan, Andrey Khorlin, Marin Georgiev, Andrea Gesmundo

    Abstract: Neural Architecture Search has shown potential to automate the design of neural networks. Deep Reinforcement Learning-based agents can learn complex architectural patterns, as well as explore a vast and compositional search space. On the other hand, evolutionary algorithms offer higher sample efficiency, which is critical for such a resource-intensive application. In order to capture the best of b…

    Submitted 15 February, 2020; v1 submitted 24 November, 2018; originally announced November 2018.

  17. arXiv:1803.02780

    cs.LG stat.ML

    Transfer Learning with Neural AutoML

    Authors: Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo

    Abstract: We reduce the computational cost of Neural AutoML with transfer learning. AutoML relieves human effort by automating the design of ML algorithms. Neural AutoML has become popular for the design of deep learning architectures; however, this method has a high computational cost. To address this, we propose Transfer Neural AutoML, which uses knowledge from prior tasks to speed up network design. We extend…

    Submitted 28 January, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

  18. arXiv:1801.07537

    cs.CL cs.AI

    Analyzing Language Learned by an Active Question Answering Agent

    Authors: Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

    Abstract: We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black-box question-answering system. The agent learns to reformulate the user's questions to elicit the optimal answers. It probes the syste…

    Submitted 23 January, 2018; originally announced January 2018.

    Comments: Emergent Communication Workshop, NIPS 2017

  19. arXiv:1710.10776

    cs.AI cs.LG stat.ML

    Transfer Learning to Learn with Multitask Neural Model Search

    Authors: Catherine Wong, Andrea Gesmundo

    Abstract: Deep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often carried out by a human expert and optimized using a combination of grid search and search heuristics over a large space of possible choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach that has…

    Submitted 30 October, 2017; originally announced October 2017.

  20. arXiv:1705.07830

    cs.CL cs.AI

    Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

    Authors: Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

    Abstract: We frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black-box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with potentially many natural language reformulations of an initial question and aggregates the returned…

    Submitted 2 March, 2018; v1 submitted 22 May, 2017; originally announced May 2017.

    Journal ref: Sixth International Conference on Learning Representations (ICLR), 2018