Zum Hauptinhalt springen

Showing 1–7 of 7 results for author: Shallue, C J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2102.06356  [pdf, other

    cs.LG stat.ML

    A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

    Authors: Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl

    Abstract: Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether L… ▽ More

    Submitted 9 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  2. arXiv:2011.00003  [pdf, other

    astro-ph.EP astro-ph.IM astro-ph.SR cs.LG

    Identifying Exoplanets with Deep Learning. IV. Removing Stellar Activity Signals from Radial Velocity Measurements Using Neural Networks

    Authors: Zoe L. de Beurs, Andrew Vanderburg, Christopher J. Shallue, Xavier Dumusque, Andrew Collier Cameron, Christopher Leet, Lars A. Buchhave, Rosario Cosentino, Adriano Ghedina, Raphaëlle D. Haywood, Nicholas Langellier, David W. Latham, Mercedes López-Morales, Michel Mayor, Giusi Micela, Timothy W. Milbourne, Annelies Mortier, Emilio Molinari, Francesco Pepe, David F. Phillips, Matteo Pinamonti, Giampaolo Piotto, Ken Rice, Dimitar Sasselov, Alessandro Sozzetti , et al. (2 additional authors not shown)

    Abstract: Exoplanet detection with precise radial velocity (RV) observations is currently limited by spurious RV signals introduced by stellar activity. We show that machine learning techniques such as linear regression and neural networks can effectively remove the activity signals (due to starspots/faculae) from RV observations. Previous efforts focused on carefully filtering out activity signals in time… ▽ More

    Submitted 13 June, 2022; v1 submitted 30 October, 2020; originally announced November 2020.

    Comments: 28 pages, 14 figures, Accepted for publication in the Astronomical Journal

  3. arXiv:1910.05446  [pdf, other

    cs.LG stat.ML

    On Empirical Comparisons of Optimizers for Deep Learning

    Authors: Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl

    Abstract: Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this paper, we demonstrate the sensitivity of optimizer comparisons to the hyperparameter tuning protocol. Our findings suggest that the hyperparameter search space may be the single most important factor explaining the rankings obtained by recent empirical comparisons in the literature. In fact, we show that t… ▽ More

    Submitted 15 June, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

  4. arXiv:1907.05550  [pdf, other

    cs.LG

    Faster Neural Network Training with Data Echoing

    Authors: Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl

    Abstract: In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators. As accelerators continue to improve, these earlier stages will increasingly become the bottleneck. In this paper, we introduce "data echoing," which… ▽ More

    Submitted 7 May, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

  5. arXiv:1907.04164  [pdf, other

    cs.LG stat.ML

    Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

    Authors: Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

    Abstract: Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments, and analysis of a simple noi… ▽ More

    Submitted 28 October, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: NeurIPS 2019

  6. arXiv:1811.03600  [pdf, other

    cs.LG stat.ML

    Measuring the Effects of Data Parallelism on Neural Network Training

    Authors: Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

    Abstract: Recent hardware developments have dramatically increased the scale of data parallelism available for neural network training. Among the simplest ways to harness next-generation hardware is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by… ▽ More

    Submitted 18 July, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Journal ref: Journal of Machine Learning Research 20 (2019) 1-49

  7. arXiv:1806.04313  [pdf, other

    cs.CL cs.LG

    Embedding Text in Hyperbolic Spaces

    Authors: Bhuwan Dhingra, Christopher J. Shallue, Mohammad Norouzi, Andrew M. Dai, George E. Dahl

    Abstract: Natural language text exhibits hierarchical structure in a variety of respects. Ideally, we could incorporate our prior knowledge of this hierarchical structure into unsupervised learning algorithms that work on text data. Recent work by Nickel & Kiela (2017) proposed using hyperbolic instead of Euclidean embedding spaces to represent hierarchical data and demonstrated encouraging results when emb… ▽ More

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: TextGraphs 2018