
Benchmarking Neural Network Training Algorithms

Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson

Abstract: Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a community, we are currently unable to reliably identify training algorithm improvements, or even determine the state-of-the-art training algorithm. In this work, using concrete experiments, we argue that real progress in speeding up training requires new benchmarks that resolve three basic challenges faced by empirical comparisons of training algorithms: (1) how to decide when training is complete and precisely measure training time, (2) how to handle the sensitivity of measurements to exact workload details, and (3) how to fairly compare algorithms that require hyperparameter tuning. In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of workload variants that make it possible to detect benchmark submissions that are more robust to workload changes than current widely-used methods. Finally, we evaluate baseline submissions constructed using various optimizers that represent current practice, as well as other optimizers that have recently received attention in the literature. These baseline results collectively demonstrate the feasibility of our benchmark, show that non-trivial gaps between methods exist, and set a provisional state-of-the-art for future benchmark submissions to try and surpass.

Submitted 12 June, 2023; originally announced June 2023.

Comments: 102 pages, 8 figures, 41 tables

arXiv:2111.03047 [pdf, other]

A deep ensemble approach to X-ray polarimetry

Authors: A. L. Peirson, R. W. Romani

Abstract: X-ray polarimetry will soon open a new window on the high energy universe with the launch of NASA's Imaging X-ray Polarimetry Explorer (IXPE). Polarimeters are currently limited by their track reconstruction algorithms, which typically use linear estimators and do not consider individual event quality. We present a modern deep learning method for maximizing the sensitivity of X-ray telescopic observations with imaging polarimeters, with a focus on the gas pixel detectors (GPDs) to be flown on IXPE. We use a weighted maximum likelihood combination of predictions from a deep ensemble of ResNets, trained on Monte Carlo event simulations. We derive and apply the optimal event weighting for maximizing the polarization signal-to-noise ratio (SNR) in track reconstruction algorithms. For typical power-law source spectra, our method improves on the current state of the art, providing a ~40% decrease in required exposure times for a given SNR.

Submitted 30 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)

arXiv:2007.03828 [pdf, other]

doi 10.1016/j.nima.2020.164740

Deep Ensemble Analysis for Imaging X-ray Polarimetry

Authors: A. L. Peirson, R. W. Romani, H. L. Marshall, J. F. Steiner, L. Baldini

Abstract: We present a method for enhancing the sensitivity of X-ray telescopic observations with imaging polarimeters, with a focus on the gas pixel detectors (GPDs) to be flown on the Imaging X-ray Polarimetry Explorer (IXPE). Our analysis determines photoelectron directions, X-ray absorption points and X-ray energies for 1-9 keV event tracks, with estimates for both the statistical and model (reconstruction) uncertainties. We use a weighted maximum likelihood combination of predictions from a deep ensemble of ResNet convolutional neural networks, trained on Monte Carlo event simulations. We define a figure of merit to compare the polarization bias-variance trade-off in track reconstruction algorithms. For power-law source spectra, our method improves on the current planned IXPE analysis (and previous deep learning approaches), providing ~45% increase in effective exposure times. For individual energies, our method produces 20-30% absolute improvements in modulation factor for simulated 100% polarized events, while keeping residual systematic modulation within 1 sigma of the finite sample minimum. Absorption point location and photon energy estimates are also significantly improved. We have validated our method with sample data from real GPD detectors.

Submitted 5 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: 18 pages, 9 figures. Accepted to Nuclear Instruments and Methods in Physics Research Section A, Sep 2020

arXiv:1806.04510 [pdf, ps, other]

Dank Learning: Generating Memes Using Deep Neural Networks

Authors: Abel L Peirson V, E Meltem Tolunay

Abstract: We introduce a novel meme generation system, which given any image can produce a humorous and relevant caption. Furthermore, the system can be conditioned on not only an image but also a user-defined label relating to the meme template, giving a handle to the user on meme content. The system uses a pretrained Inception-v3 network to return an image embedding which is passed to an attention-based deep-layer LSTM model producing the caption - inspired by the widely recognised Show and Tell Model. We implement a modified beam search to encourage diversity in the captions. We evaluate the quality of our model using perplexity and human assessment on both the quality of memes generated and whether they can be differentiated from real ones. Our model produces original memes that cannot on the whole be differentiated from real ones.

Submitted 7 June, 2018; originally announced June 2018.

Comments: Stanford CS 224n Project

Showing 1–4 of 4 results for author: Peirson, A L