Zum Hauptinhalt springen

Showing 1–29 of 29 results for author: Wortsman, M

.
  1. arXiv:2407.05872  [pdf, other

    cs.LG

    Scaling Exponents Across Parameterizations and Optimizers

    Authors: Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington

    Abstract: Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parameters and data and derive new theoretical results unde… ▽ More

    Submitted 16 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 63 pages, International Conference on Machine Learning 2024

  2. arXiv:2406.19146  [pdf, other

    cs.LG cs.CL

    Resolving Discrepancies in Compute-Optimal Scaling of Language Models

    Authors: Tomer Porian, Mitchell Wortsman, Jenia Jitsev, Ludwig Schmidt, Yair Carmon

    Abstract: Kaplan et al. and Hoffmann et al. developed influential scaling laws for the optimal model size as a function of the compute budget, but these laws yield substantially different predictions. We explain the discrepancy by reproducing the Kaplan scaling law on two datasets (OpenWebText2 and RefinedWeb) and identifying three factors causing the difference: last layer computational cost, warmup durati… ▽ More

    Submitted 25 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Fixing bug in small models with tuned LR

  3. arXiv:2406.11794  [pdf, other

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat… ▽ More

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  4. arXiv:2403.08540  [pdf, other

    cs.CL cs.LG

    Language models scale reliably with over-training and on downstream tasks

    Authors: Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Luca Soldaini, Alexandros G. Dimakis, Gabriel Ilharco, Pang Wei Koh, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

    Abstract: Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contr… ▽ More

    Submitted 14 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  5. arXiv:2402.00838  [pdf, other

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam , et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models… ▽ More

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  6. arXiv:2309.14322  [pdf, other

    cs.LG

    Small-scale proxies for large-scale Transformer training instabilities

    Authors: Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study train… ▽ More

    Submitted 16 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  7. arXiv:2309.08586  [pdf, other

    cs.CV cs.LG

    Replacing softmax with ReLU in Vision Transformers

    Authors: Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Previous research observed accuracy degradation when replacing the attention softmax with a point-wise activation such as ReLU. In the context of vision transformers, we find that this degradation is mitigated when dividing by sequence length. Our experiments training small to large vision transformers on ImageNet-21k indicate that ReLU-attention can approach or match the performance of softmax-at… ▽ More

    Submitted 16 October, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

  8. arXiv:2308.01390  [pdf, other

    cs.CV cs.AI cs.LG

    OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

    Authors: Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt

    Abstract: We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of corresponding Flamingo performance. This technical report describes our models, training data, hyperpar… ▽ More

    Submitted 7 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  9. arXiv:2304.14108  [pdf, other

    cs.CV cs.CL cs.LG

    DataComp: In search of the next generation of multimodal datasets

    Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song , et al. (9 additional authors not shown)

    Abstract: Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Commo… ▽ More

    Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  10. arXiv:2304.13013  [pdf, other

    cs.LG cs.CV

    Stable and low-precision training for large-scale vision-language models

    Authors: Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, Ludwig Schmidt

    Abstract: We introduce new methods for 1) accelerating and 2) stabilizing training for large language-vision models. 1) For acceleration, we introduce SwitchBack, a linear layer for int8 quantized training which provides a speed-up of 13-25% while matching the performance of bfloat16 training within 0.1 percentage points for the 1B parameter CLIP ViT-Huge -- the largest int8 training to date. Our main focus… ▽ More

    Submitted 16 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023

  11. arXiv:2302.13602  [pdf, other

    cs.CV cs.LG

    The Role of Pre-training Data in Transfer Learning

    Authors: Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt

    Abstract: The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling the pre-training size to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of pre-training data distribution on the few-shot and full fine-tuning performance using 3… ▽ More

    Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  12. Reproducible scaling laws for contrastive language-image learning

    Authors: Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, Jenia Jitsev

    Abstract: Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale experiments are becoming increasingly expensive. However, previous work on scaling laws has primarily used private data \& models or focused on… ▽ More

    Submitted 13 July, 2024; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: CVPR 2023. Version with minor extension. Original: https://openaccess.thecvf.com/content/CVPR2023/html/Cherti_Reproducible_Scaling_Laws_for_Contrastive_Language-Image_Learning_CVPR_2023_paper

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 2818-2829

  13. arXiv:2212.04089  [pdf, other

    cs.LG cs.CL cs.CV

    Editing Models with Task Arithmetic

    Authors: Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

    Abstract: Changing how pre-trained models behave -- e.g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around \textit{task vectors}. A task vector specifies a direction in the weight space of a pr… ▽ More

    Submitted 31 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: In Proceedings of the 11th International Conference on Learning Representations (ICLR 2023)

  14. arXiv:2210.12517  [pdf, other

    cs.CL cs.LG

    Exploring The Landscape of Distributional Robustness for Question Answering Models

    Authors: Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt

    Abstract: We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets, including a diverse set of architectures, model sizes, and adaptation methods (e.g., fine-tuning, adapter tuning, in-context learning, etc.). We find that, in many cases, model variations do not affect r… ▽ More

    Submitted 22 October, 2022; originally announced October 2022.

    Comments: EMNLP Findings 2022

  15. arXiv:2210.11948  [pdf, other

    cs.LG

    lo-fi: distributed fine-tuning without communication

    Authors: Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

    Abstract: When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base… ▽ More

    Submitted 12 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  16. arXiv:2210.08402  [pdf, other

    cs.CV cs.AI cs.LG

    LAION-5B: An open large-scale dataset for training next generation image-text models

    Authors: Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev

    Abstract: Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classi… ▽ More

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Track on Datasets and Benchmarks. OpenReview: https://openreview.net/forum?id=M3Y74vmsMcY

  17. arXiv:2208.05592  [pdf, other

    cs.CV cs.LG

    Patching open-vocabulary models by interpolating weights

    Authors: Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

    Abstract: Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method tha… ▽ More

    Submitted 11 October, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  18. arXiv:2208.05516  [pdf, other

    cs.LG cs.CV

    Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

    Authors: Thao Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt

    Abstract: Web-crawled datasets have enabled remarkable generalization capabilities in recent image-text models such as CLIP (Contrastive Language-Image pre-training) or Flamingo, but little is known about the dataset creation processes. In this work, we introduce a testbed of six publicly available data sources - YFCC, LAION, Conceptual Captions, WIT, RedCaps, Shutterstock - to investigate how pre-training… ▽ More

    Submitted 31 January, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: Oral paper at NeurIPS 2022

  19. arXiv:2205.01397  [pdf, other

    cs.CV cs.CL cs.LG

    Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

    Authors: Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt

    Abstract: Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Con… ▽ More

    Submitted 22 August, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

  20. arXiv:2203.10421  [pdf, other

    cs.CV cs.LG cs.RO

    CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation

    Authors: Samir Yitzhak Gadre, Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt, Shuran Song

    Abstract: For robots to be generally useful, they must be able to find arbitrary objects described by people (i.e., be language-driven) even without expensive navigation training on in-domain data (i.e., perform zero-shot inference). We explore these capabilities in a unified setting: language-driven zero-shot object navigation (L-ZSON). Inspired by the recent success of open-vocabulary models for image cla… ▽ More

    Submitted 14 December, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

  21. arXiv:2203.05482  [pdf, other

    cs.LG cs.CL cs.CV

    Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

    Authors: Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

    Abstract: The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low… ▽ More

    Submitted 1 July, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

    Comments: ICML 2022. The last three authors contributed equally

  22. arXiv:2109.01903  [pdf, other

    cs.CV cs.LG

    Robust fine-tuning of zero-shot models

    Authors: Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt

    Abstract: Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning methods substantially improve accuracy on a given target distribution, they often reduce robustness to distribution shifts. We address this tension by introducing a simple a… ▽ More

    Submitted 21 June, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

    Comments: CVPR 2022

  23. arXiv:2102.10472  [pdf, other

    cs.LG cs.CV

    Learning Neural Network Subspaces

    Authors: Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari

    Abstract: Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods observing diverse paths require multiple training runs. In contrast we aim to leverage both property (1) and (2) with a single method and in a single… ▽ More

    Submitted 12 September, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

  24. arXiv:2012.00172  [pdf, other

    cs.LG cs.CV

    Deconstructing the Structure of Sparse Neural Networks

    Authors: Maxwell Van Gelder, Mitchell Wortsman, Kiana Ehsani

    Abstract: Although sparse neural networks have been studied extensively, the focus has been primarily on accuracy. In this work, we focus instead on network structure, and analyze three popular algorithms. We first measure performance when structure persists and weights are reset to a different random initialization, thereby extending experiments in Deconstructing Lottery Tickets (Zhou et al., 2019). This e… ▽ More

    Submitted 30 November, 2020; originally announced December 2020.

    Comments: 6 pages, 4 figures, Accepted to ML-Retrospectives, Surveys & Meta-Analyses @ NeurIPS 2020 Workshop

    ACM Class: I.2.6; I.5.1

  25. arXiv:2006.14769  [pdf, other

    cs.LG cs.AI stat.ML

    Supermasks in Superposition

    Authors: Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

    Abstract: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not… ▽ More

    Submitted 21 October, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 Camera Ready

  26. arXiv:2002.03231  [pdf, other

    cs.LG cs.CV stat.ML

    Soft Threshold Weight Reparameterization for Learnable Sparsity

    Authors: Aditya Kusupati, Vivek Ramanujan, Raghav Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi

    Abstract: Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparamete… ▽ More

    Submitted 22 June, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: 19 pages, 10 figures, Published at International Conference on Machine Learning (ICML) 2020

  27. arXiv:1911.13299  [pdf, other

    cs.CV cs.LG

    What's Hidden in a Randomly Weighted Neural Network?

    Authors: Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari

    Abstract: Training a neural network is synonymous with learning the values of the weights. By contrast, we demonstrate that randomly weighted neural networks contain subnetworks which achieve impressive performance without ever training the weight values. Hidden in a randomly weighted Wide ResNet-50 we show that there is a subnetwork (with random weights) that is smaller than, but matches the performance of… ▽ More

    Submitted 30 March, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

    Comments: Accepted to CVPR 2020

  28. arXiv:1906.00586  [pdf, other

    cs.LG cs.CV cs.NE

    Discovering Neural Wirings

    Authors: Mitchell Wortsman, Ali Farhadi, Mohammad Rastegari

    Abstract: The success of neural networks has driven a shift in focus from feature engineering to architecture engineering. However, successful networks today are constructed using a small and manually defined set of building blocks. Even in methods of neural architecture search (NAS) the network connectivity patterns are largely constrained. In this work we propose a method for discovering neural wirings. W… ▽ More

    Submitted 16 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: NeurIPS 2019 Camera Ready

  29. arXiv:1812.00971  [pdf, other

    cs.CV cs.AI cs.LG cs.RO

    Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning

    Authors: Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

    Abstract: Learning is an inherently continuous phenomenon. When humans learn a new task there is no explicit distinction between training and inference. As we learn a task, we keep learning about it while performing the task. What we learn and how we learn it varies during different stages of learning. Learning how to learn and adapt is a key property that enables us to generalize effortlessly to new settin… ▽ More

    Submitted 26 March, 2019; v1 submitted 3 December, 2018; originally announced December 2018.