Search | arXiv e-print repository

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.

Submitted 28 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.07781

doi 10.1109/LRA.2024.3396056

Estimating Visibility from Alternate Perspectives for Motion Planning with Occlusions

Authors: Barry Gilhuly, Armin Sadeghi, Stephen L. Smith

Abstract: Visibility is a crucial aspect of planning and control of autonomous vehicles (AV), particularly when navigating environments with occlusions. However, when an AV follows a trajectory with multiple occlusions, existing methods evaluate each occlusion individually, calculate a visibility cost for each, and rely on the planner to minimize the overall cost. This can result in conflicting priorities for the planner, as individual occlusion costs may appear to be in opposition. We solve this problem by creating an alternate perspective cost map that allows for an aggregate view of the occlusions in the environment. The value of each cell on the cost map is a measure of the amount of visual information that the vehicle can gain about the environment by visiting that location. Our proposed method identifies observation locations and occlusion targets drawn from both map data and sensor data. We show how to estimate an alternate perspective for each observation location and then combine all estimates into a single alternate perspective cost map for motion planning.

Submitted 11 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE-RAL for possible publication

arXiv:2403.08295

Gemma: Open Models Based on Gemini Research and Technology

Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.19427

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Authors: Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre

Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over 6 times fewer tokens. We also show that Griffin can extrapolate on sequences significantly longer than those seen during training. Our models match the hardware efficiency of Transformers during training, and during inference they have lower latency and significantly higher throughput. We scale Griffin up to 14B parameters, and explain how to shard our models for efficient distributed training.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 25 pages, 11 figures
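The gated linear recurrence behind Hawk and Griffin can be illustrated with a minimal sketch. It assumes a diagonal (element-wise) update of the form h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t), a simplified reading of the paper's recurrent layer; the gate values are passed in directly instead of being computed from learned weights, and the function name `gated_linear_recurrence` is hypothetical. What it shows is that the recurrent state has the width of a single input vector, no matter how long the sequence is.

```python
import math
from typing import List, Sequence


def gated_linear_recurrence(
    xs: Sequence[Sequence[float]],
    decay_gates: Sequence[Sequence[float]],
    input_gates: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Run a diagonal gated linear recurrence over a sequence (illustrative sketch).

    Each step applies, channel by channel,
        h_t = a_t * h_{t-1} + sqrt(1 - a_t**2) * (i_t * x_t),
    where a_t in [0, 1) is the decay gate and i_t is the input gate.
    Returns a copy of the hidden state after every step.
    """
    width = len(xs[0])
    h = [0.0] * width  # fixed-size state: one value per channel
    outputs = []
    for x, a, i in zip(xs, decay_gates, input_gates):
        h = [
            a_c * h_c + math.sqrt(1.0 - a_c * a_c) * (i_c * x_c)
            for h_c, x_c, a_c, i_c in zip(h, x, a, i)
        ]
        outputs.append(list(h))
    return outputs


# Two channels: one with decay 0.5 (remembers past inputs), one with decay 0 (pass-through).
states = gated_linear_recurrence([[1.0, 2.0]] * 4, [[0.5, 0.0]] * 4, [[1.0, 1.0]] * 4)
```

With a decay gate of 0 a channel just passes the gated input through; values closer to 1 keep information for longer. At inference time only `h` has to be stored, which is the kind of memory saving the abstract attributes to recurrent layers.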

arXiv:2401.01483

To Lead or to Follow? Adaptive Robot Task Planning in Human-Robot Collaboration

Authors: Ali Noormohammadi-Asl, Stephen L. Smith, Kerstin Dautenhahn

Abstract: Adaptive task planning is fundamental to ensuring effective and seamless human-robot collaboration. This paper introduces a robot task planning framework that takes into account both human leading/following preferences and performance, specifically focusing on task allocation and scheduling in collaborative settings. We present a proactive task allocation approach with three primary objectives: enhancing team performance, incorporating human preferences, and upholding a positive human perception of the robot and the collaborative experience. Through a user study involving an autonomous mobile manipulator robot working alongside participants in a collaborative scenario, we confirm that the task planning framework successfully attains all three intended goals, thereby contributing to the advancement of adaptive task planning in human-robot collaboration. This paper mainly focuses on the first two objectives; we discuss the third objective (participants' perception of the robot, tasks, and collaboration) in a companion paper.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2401.01466

Human Leading or Following Preferences: Effects on Human Perception of the Robot and the Human-Robot Collaboration

Authors: Ali Noormohammadi-Asl, Kevin Fan, Stephen L. Smith, Kerstin Dautenhahn

Abstract: Achieving effective and seamless human-robot collaboration requires two key outcomes: enhanced team performance and fostering a positive human perception of both the robot and the collaboration. This paper investigates the capability of the proposed task planning framework to realize these objectives by integrating human leading/following preference and performance into its task allocation and scheduling processes. We designed a collaborative scenario wherein the robot autonomously collaborates with participants. The outcomes of the user study indicate that the proactive task planning framework successfully attains the aforementioned goals. We also explore the impact of participants' leadership and followership styles on their collaboration. The results reveal intriguing relationships between these factors, which warrant further investigation in future studies.

Submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.15177

Stochastic Data-Driven Predictive Control with Equivalence to Stochastic MPC

Authors: Ruiqi Li, John W. Simpson-Porco, Stephen L. Smith

Abstract: We propose a data-driven receding-horizon control method for the chance-constrained output-tracking problem of unknown stochastic linear time-invariant (LTI) systems with partial state observation. The proposed method takes into account the statistics of the process noise, the measurement noise and the uncertain initial condition, following a framework analogous to Stochastic Model Predictive Control (SMPC), but does not rely on the use of a parametric system model. As such, our receding-horizon algorithm produces a sequence of closed-loop control policies for predicted time steps, as opposed to a sequence of open-loop control actions. Under certain conditions, we establish that our proposed data-driven control method produces the same control inputs as those produced by the associated model-based SMPC. Simulation results on a grid-connected power converter are provided to illustrate the performance benefits of our methodology.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 20 pages, 4 figures. The extended version of a submission to IEEE Transactions on Automatic Control

arXiv:2312.07227

Scalarizing Multi-Objective Robot Planning Problems using Weighted Maximization

Authors: Nils Wilde, Stephen L. Smith, Javier Alonso-Mora

Abstract: When designing a motion planner for autonomous robots, there are usually multiple objectives to be considered. However, a cost function that yields the desired trade-off between objectives is not easily obtainable. A common technique across many applications is to use a weighted sum of relevant objective functions and then carefully adapt the weights. However, this approach may not find all relevant trade-offs even in simple planning problems. Thus, we study an alternative method based on a weighted maximum of objectives. Such a cost function is more expressive than the weighted sum, and we show how it can be deployed in both continuous- and discrete-space motion planning problems. We propose a novel path planning algorithm for the proposed cost function, establish its correctness, and present heuristic adaptations that yield a practical runtime. In extensive simulation experiments, we demonstrate that the proposed cost function and algorithm are able to find a wider range of trade-offs between objectives (i.e., Pareto-optimal solutions) for various planning problems, showcasing their advantages in practice.

Submitted 12 December, 2023; originally announced December 2023.
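The difference between the two scalarizations shows up even on a toy example. This is a minimal sketch with three hypothetical paths and two objectives, not the planner proposed in the paper. Path C is Pareto-optimal but lies off the convex hull of the Pareto front (its costs sum to 12, above the line through A and B), so no weighted sum ever selects it, whereas a weighted maximum with equal weights does.

```python
from typing import Callable, Dict, Sequence, Tuple

Scalarizer = Callable[[Sequence[float], Sequence[float]], float]


def weighted_sum(costs: Sequence[float], weights: Sequence[float]) -> float:
    return sum(w * c for w, c in zip(weights, costs))


def weighted_max(costs: Sequence[float], weights: Sequence[float]) -> float:
    return max(w * c for w, c in zip(weights, costs))


def best(paths: Dict[str, Tuple[float, float]],
         weights: Sequence[float],
         scalarize: Scalarizer) -> str:
    # Pick the path with the smallest scalarized cost; ties go to the first key.
    return min(paths, key=lambda name: scalarize(paths[name], weights))


# Hypothetical candidate paths, each scored on two objectives (e.g. length and risk).
paths = {"A": (0.0, 10.0), "B": (10.0, 0.0), "C": (6.0, 6.0)}

chosen_by_max = best(paths, (0.5, 0.5), weighted_max)  # C scores max(3, 3) = 3
chosen_by_sum = {best(paths, (k / 100, 1 - k / 100), weighted_sum) for k in range(101)}
```

Sweeping the weights of the weighted sum only ever returns A or B, because min(A, B) is at most 5 while C always costs 6.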

arXiv:2312.05338

Minimizing Robot Digging Times to Retrieve Bins in Robotic-Based Compact Storage and Retrieval Systems

Authors: Anni Yue, Stephen L. Smith

Abstract: Robotic-based compact storage and retrieval systems provide high-density storage in distribution center and warehouse applications. In these systems, items are stored in bins, and the bins are organized inside a three-dimensional grid. Robots move on top of the grid to retrieve and deliver bins. To retrieve a bin, a robot uses its gripper to remove the bins stacked above it one by one, a process called bin digging. The closer the target bin is to the top of the grid, the less digging is required to retrieve the bin. In this paper, we propose a policy to optimally arrange the bins in the grid while processing bin requests so that the most frequently accessed bins remain near the top of the grid. This improves the performance of the system and makes it responsive to changes in bin demand. Our solution approach identifies the optimal bin arrangement in the storage facility, initiates a transition to this optimal set-up, and subsequently ensures the ongoing maintenance of this arrangement for optimal performance. We perform extensive simulations on a custom-built discrete-event model of the system. Our simulation results show that under the proposed policy more than half of the bins requested are located on top of the grid, reducing bin digging compared to existing policies. Compared to existing approaches, the proposed policy reduces the retrieval time of the requested bins by over 30% and the number of bin requests that exceed certain time thresholds by nearly 50%.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 35 pages, 16 figures, submitted to Transportation Science (INFORMS)
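Why keeping frequently requested bins near the top reduces digging can be seen with a deliberately simple model. This sketch assumes a single grid column, counts digging as the number of bins above the target, and assumes the column is restored to the same order after every retrieval; the frequency-sorting heuristic and all function names are hypothetical and are not the policy proposed in the paper.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def digging_cost(stack: Sequence[str], target: str) -> int:
    """Number of bins that must be removed to reach `target`.

    `stack` lists the bins in one grid column from top to bottom.
    """
    return stack.index(target)


def frequency_ordered_stack(requests: Iterable[str]) -> List[str]:
    """Hypothetical heuristic: put the most frequently requested bins nearest the top."""
    counts = Counter(requests)
    # Sort by descending request count; break ties alphabetically for determinism.
    return sorted(counts, key=lambda b: (-counts[b], b))


def total_digging(stack: Sequence[str], requests: Iterable[str]) -> int:
    # Assumes the column returns to the same order after every retrieval.
    return sum(digging_cost(stack, r) for r in requests)


requests = ["b3", "b3", "b1", "b3", "b2", "b3", "b1"]
initial = ["b1", "b2", "b3"]
reordered = frequency_ordered_stack(requests)  # ['b3', 'b1', 'b2']
```

On this request trace the initial column needs 9 bin removals in total and the frequency-ordered column needs 4. A realistic policy also has to account for the cost of moving to the new arrangement and for demand that changes over time, which this sketch ignores.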

arXiv:2312.00668

A transform pair for bounded convex planar domains

Authors: Jesse Hulse, Loredana Lanzani, Stefan Llewellyn Smith, Elena Luca

Abstract: A new transform pair which can be used to solve mixed boundary value problems for Laplace's equation and the complex Helmholtz equation in bounded convex planar domains is presented. This work extends Crowdy (2015, CMFT, 15, 655–687), where new transform techniques were developed for boundary value problems for Laplace's equation in circular domains. The key ingredient of the method is the analysis of the so-called global relation, which provides a coupling of integral transforms of the given boundary data and of the unknown boundary values. Three problems involving mixed boundary conditions are solved in detail and implemented numerically to illustrate how to apply the new approach.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.17837

Anytime Replanning of Robot Coverage Paths for Partially Unknown Environments

Authors: Megnath Ramesh, Frank Imeson, Baris Fidan, Stephen L. Smith

Abstract: In this paper, we propose a method to replan coverage paths for a robot operating in an environment with initially unknown static obstacles. Existing coverage approaches reduce coverage time by covering along the minimum number of coverage lines (straight-line paths). However, recomputing such paths online can be computationally expensive, resulting in robot stoppages that increase coverage time. A naive alternative is greedy detour replanning, i.e., replanning with minimum deviation from the initial path, which is efficient to compute but may result in unnecessary detours. In this work, we propose an anytime coverage replanning approach named OARP-Replan that performs near-optimal replans to an interrupted coverage path within a given time budget. We do this by solving linear relaxations of integer linear programs (ILPs) to identify sections of the interrupted path that can be optimally replanned within the time budget. We validate OARP-Replan in simulation and perform comparisons against a greedy detour replanner and other state-of-the-art coverage planners. We also demonstrate OARP-Replan in experiments using an industrial-level autonomous robot.

Submitted 7 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 16 pages, 18 figures, Paper submitted to IEEE T-RO

arXiv:2311.00136

Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data

Authors: Antonis Antoniades, Yiyi Yu, Joseph Canzano, William Wang, Spencer LaVere Smith

Abstract: State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis. Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data as an autoregressive spatiotemporal generation problem. Neuroformer is a multimodal, multitask generative pretrained transformer (GPT) model that is specifically designed to handle the intricacies of data in systems neuroscience. It scales linearly with feature size, can process an arbitrary number of modalities, and is adaptable to downstream tasks, such as predicting behavior. We first trained Neuroformer on simulated datasets, and found that it both accurately predicted simulated neuronal circuit activity and intrinsically inferred the underlying neural circuit connectivity, including direction. When pretrained to decode neural responses, the model predicted the behavior of a mouse with only few-shot fine-tuning, suggesting that the model begins learning how to do so directly from the neural representations themselves, without any explicit supervision. We used an ablation study to show that joint training on neuronal responses and behavior boosted performance, highlighting the model's ability to associate behavioral and neural representations in an unsupervised manner. These findings show that Neuroformer can analyze neural datasets and their emergent properties, informing the development of models and hypotheses associated with the brain.

Submitted 15 March, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: 9 pages for main paper. 22 pages in total. 13 figures, 1 table

arXiv:2310.16764

ConvNets Match Vision Transformers at Scale

Authors: Samuel L. Smith, Andrew Brock, Leonard Berrada, Soham De

Abstract: Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to web-scale datasets. We challenge this belief by evaluating a performant ConvNet architecture pre-trained on JFT-4B, a large labelled dataset of images often used for training foundation models. We consider pre-training compute budgets between 0.4k and 110k TPU-v4 core compute hours, and train a series of networks of increasing depth and width from the NFNet model family. We observe a log-log scaling law between held-out loss and compute budget. After fine-tuning on ImageNet, NFNets match the reported performance of Vision Transformers with comparable compute budgets. Our strongest fine-tuned model achieves a Top-1 accuracy of 90.4%.

Submitted 25 October, 2023; originally announced October 2023.
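A log-log scaling law of the form L = a * C^b becomes a straight line after taking logarithms, log L = log a + b log C, so its two parameters can be estimated by ordinary least squares on the logged data. The sketch below uses synthetic data that follow an exact power law across a compute range similar to the one quoted in the abstract; the coefficients are invented for illustration and are not results from the paper, and `fit_power_law` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(compute: Sequence[float], loss: Sequence[float]) -> Tuple[float, float]:
    """Fit loss ~ a * compute**b by least squares in log-log space; returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(value) for value in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope in log-log space = exponent
    log_a = mean_y - b * mean_x  # intercept = log of the prefactor
    return math.exp(log_a), b


# Synthetic budgets (core hours) and losses generated from a = 3.0, b = -0.1.
compute = [0.4e3, 1e3, 1e4, 1.1e5]
loss = [3.0 * c ** -0.1 for c in compute]
a, b = fit_power_law(compute, loss)
```

With noise-free synthetic data the fit recovers a = 3.0 and b = -0.1 up to floating-point error; with real measurements the residuals in log space indicate how well a single power law describes the trend.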

arXiv:2310.10502

Adaptive Robot Assistance: Expertise and Influence in Multi-User Task Planning

Authors: Abhinav Dahiya, Stephen L. Smith

Abstract: This paper addresses the challenge of enabling a single robot to effectively assist multiple humans in decision-making for task planning domains. We introduce a comprehensive framework designed to enhance overall team performance by considering both human expertise in making the optimal decisions and robot influence on human decision-making. Our model integrates these factors seamlessly within the task-planning domain, formulating the problem as a partially observable Markov decision process (POMDP) while treating expertise and influence as unobservable components of the system state. To solve for the robot's actions in such systems, we propose an efficient Attention-Switching policy. This policy capitalizes on the inherent structure of such systems, solving multiple smaller POMDPs to generate heuristics for prioritizing interactions with different human teammates, thereby reducing the state space and improving scalability. Our empirical results on a simulated kit fulfillment task demonstrate improved team performance when the robot's policy accounts for both expertise and influence. This research represents a significant step forward in the field of adaptive robot assistance, paving the way for integration into cost-effective small and mid-scale industries, where substantial investments in robotic infrastructure may not be economically viable.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 7 pages, 5 figures

arXiv:2310.06463

doi 10.1051/epjconf/202429300038

IAS/CEA Evolution of Dust in Nearby Galaxies (ICED): the spatially-resolved dust properties of NGC4254

Authors: L. Pantoni, R. Adam, P. Ade, H. Ajeddig, P. André, E. Artis, H. Aussel, M. Baes, A. Beelen, A. Benoît, S. Berta, L. Bing, O. Bourrion, M. Calvo, A. Catalano, M. De Petris, F. -X. Désert, S. Doyle, E. F. C. Driessen, G. Ejlali, F. Galliano, A. Gomez, J. Goupy, A. P. Jones, C. Hanser, et al. (35 additional authors not shown)

Abstract: We present the first preliminary results of the project ICED, focusing on the face-on galaxy NGC4254. We use the millimetre maps observed with NIKA2 at IRAM-30m as part of the IMEGIN Guaranteed Time Large Program, together with a wide collection of publicly available ancillary data (multi-wavelength photometry and gas-phase spectral lines). We derive the global and local properties of interstellar dust grains through infrared-to-radio spectral energy distribution fitting, using the hierarchical Bayesian code HerBIE, which includes the grain properties of the state-of-the-art dust model, THEMIS. Our method allows us to derive the following dust parameters: dust mass, average interstellar radiation field, and fraction of small grains. It is also effective in retrieving the intrinsic correlations between dust parameters and interstellar medium properties. We find a clear anti-correlation between the interstellar radiation field and the fraction of small grains in the centre of NGC4254, meaning that, at strong radiation field intensities, very small amorphous carbon grains are efficiently destroyed by the ultra-violet photons coming from newly formed stars, through photo-desorption and sublimation. We observe a flattening of the anti-correlation at larger radial distances, which may be driven by the steep metallicity gradient measured in NGC4254.

Submitted 10 October, 2023; originally announced October 2023.

Comments: to appear in Proc. of the mm Universe 2023 conference, Grenoble (France), June 2023, published by F. Mayet et al. (Eds), EPJ Web of conferences, EDP Sciences

Journal ref: EPJ Web of Conferences 293 (2024) 00038

arXiv:2310.03428

doi 10.1051/epjconf/202429300016

Constraining Millimeter Dust Emission in Nearby Galaxies with NIKA2: the case of NGC2146 and NGC2976

Authors: G. Ejlali, R. Adam, P. Ade, H. Ajeddig, P. André, E. Artis, H. Aussel, M. Baes, A. Beelen, A. Benoît, S. Berta, L. Bing, O. Bourrion, M. Calvo, A. Catalano, M. De Petris, F. -X. Désert, S. Doyle, E. F. C. Driessen, F. Galliano, A. Gomez, J. Goupy, A. P. Jones, C. Hanser, A. Hughes, et al. (35 additional authors not shown)

Abstract: This study presents the first millimeter continuum mapping observations of two nearby galaxies, the starburst spiral galaxy NGC2146 and the dwarf galaxy NGC2976, at 1.15 mm and 2 mm using the NIKA2 camera on the IRAM 30m telescope, as part of the Guaranteed Time Large Project IMEGIN. These observations provide robust resolved information about the physical properties of dust in nearby galaxies by constraining their FIR-radio SED in the millimeter domain. After subtracting the contribution from the CO line emission, the SEDs are modeled spatially using a Bayesian approach. Maps of dust mass surface density, temperature, emissivity index, and thermal radio component of the galaxies are presented, allowing for a study of the relations between the dust properties and star formation activity (using observations at 24 μm as a tracer). We report that dust temperature is correlated with star formation rate in both galaxies. The effect of star formation activity on dust temperature is stronger in NGC2976, an indication of the thinner interstellar medium of dwarf galaxies. Moreover, an anti-correlation trend is reported between the dust emissivity index and temperature in both galaxies.

Submitted 5 October, 2023; originally announced October 2023.

Comments: To appear in Proc. of the mm Universe 2023 conference, Grenoble (France), June 2023, published by F. Mayet et al. (Eds), EPJ Web of conferences, EDP Sciences

Journal ref: EPJ Web of conferences 293 (2024) 00016
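Both the emissivity index β and the dust temperature T control how bright the dust is at 1.15 mm relative to 2 mm, which is one reason the two parameters are hard to separate with millimetre data. The sketch below assumes an optically thin, single-temperature modified blackbody, S_ν ∝ ν^β B_ν(T). This common simplification is used purely for illustration; it is not the Bayesian SED modelling described in the two NIKA2 abstracts, and the helper names are hypothetical.

```python
import math

# Physical constants in SI units (exact SI-defined values).
H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 2.99792458e8     # speed of light, m/s


def planck(nu: float, temperature: float) -> float:
    """Planck spectral radiance B_nu(T) in W m^-2 Hz^-1 sr^-1."""
    return 2.0 * H * nu**3 / C**2 / math.expm1(H * nu / (K_B * temperature))


def mbb_ratio(beta: float, temperature: float,
              wavelength_1: float = 1.15e-3, wavelength_2: float = 2.0e-3) -> float:
    """Flux ratio S(lambda_1) / S(lambda_2) for an optically thin modified blackbody."""
    nu_1, nu_2 = C / wavelength_1, C / wavelength_2
    return (nu_1 / nu_2) ** beta * planck(nu_1, temperature) / planck(nu_2, temperature)


ratio_cold = mbb_ratio(beta=1.8, temperature=15.0)
ratio_warm = mbb_ratio(beta=1.8, temperature=30.0)
```

Raising β at fixed T and raising T at fixed β both increase the 1.15 mm / 2 mm ratio, so different (β, T) pairs can produce similar ratios. In the high-temperature (Rayleigh-Jeans) limit the ratio tends to (ν1/ν2)^(β+2). This degeneracy is one motivation for fitting β and T jointly with Bayesian methods rather than reading them off a single colour.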

arXiv:2308.10888

Unlocking Accuracy and Fairness in Differentially Private Image Classification

Authors: Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle

Abstract: Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold-standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are also believed to exhibit larger performance disparities across subpopulations, raising fairness concerns. The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy-preserving machine learning in industry. Here we show that pre-trained foundation models fine-tuned with DP can achieve similar accuracy to non-private classifiers, even in the presence of significant distribution shifts between pre-training data and downstream tasks. We achieve private accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks. Furthermore, our private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models. This milestone towards making DP training a practical and reliable technology has the potential to enable machine learning practitioners broadly to train safely on sensitive datasets while protecting individuals' privacy.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2307.11888 [pdf, other]

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

Authors: Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

Abstract: Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling. Examples of such architectures include state-space models (SSMs) like S4, LRU, and Mamba: recently proposed models that achieve promising performance on text, genetics, and other data that require long-range reasoning. Despite experimental evidence highlighting these architectures' effectiveness and computational efficiency, their expressive power remains relatively unexplored, especially in connection to specific choices crucial in practice, e.g., a carefully designed initialization distribution and the potential use of complex numbers. In this paper, we show that combining MLPs with either real or complex linear diagonal recurrences leads to arbitrarily precise approximation of regular causal sequence-to-sequence maps. At the heart of our proof, we rely on a separation of concerns: the linear RNN provides a lossless encoding of the input sequence, and the MLP performs non-linear processing on this encoding. While we show that real diagonal linear recurrences are enough to achieve universality in this architecture, we prove that employing complex eigenvalues near the unit disk (i.e., empirically the most successful strategy in S4) greatly helps the RNN in storing information. We connect this finding with the vanishing gradient issue and provide experiments supporting our claims.

Submitted 5 June, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: v1: Accepted at HLD 2023 Workshop @ICML; v2: Preprint; v3: ICML version
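The "lossless encoding" half of the argument can be illustrated with a toy sketch. Assumptions (ours, not the paper's construction): a scalar input stream, unit input weights, and as many state coordinates as time steps, with distinct complex eigenvalues placed evenly on a circle of radius 0.95. After T steps each coordinate holds h_j = Σ_t λ_j^(T-1-t) x_t, a Vandermonde system, so the inputs can be decoded exactly whenever the eigenvalues are distinct. The functions `run_diagonal_recurrence`, `solve_linear` and `decode_inputs` are hypothetical helpers; the MLP stage is not modeled.

```python
import cmath


def run_diagonal_recurrence(eigs, xs):
    """Run h_t = lam * h_{t-1} + x_t independently for each diagonal eigenvalue."""
    h = [0j] * len(eigs)
    for x in xs:
        h = [lam * hj + x for lam, hj in zip(eigs, h)]
    return h


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small complex system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def decode_inputs(eigs, h):
    """Invert the Vandermonde map h_j = sum_t lam_j^(T-1-t) x_t (t is 0-indexed)."""
    T = len(eigs)
    vander = [[lam ** (T - 1 - t) for t in range(T)] for lam in eigs]
    return solve_linear(vander, h)


T = 6
radius = 0.95  # eigenvalues just inside the unit circle
eigs = [radius * cmath.exp(2j * cmath.pi * k / T) for k in range(T)]
xs = [0.3, -1.2, 0.7, 2.0, -0.4, 1.1]
state = run_diagonal_recurrence(eigs, xs)
recovered = [z.real for z in decode_inputs(eigs, state)]
```

In this toy, spreading the eigenvalues around a circle of radius close to 1 keeps the Vandermonde matrix well-conditioned; purely real eigenvalues crowded into (0, 1) would give a far worse-conditioned system, which is loosely in the spirit of the abstract's point about complex eigenvalues.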

arXiv:2307.11192 [pdf, other]

Adapting to Human Preferences to Lead or Follow in Human-Robot Collaboration: A System Evaluation

Authors: Ali Noormohammadi-Asl, Ali Ayub, Stephen L. Smith, Kerstin Dautenhahn

Abstract: With the introduction of collaborative robots, humans and robots can now work together in close proximity and share the same workspace. However, this collaboration presents various challenges that need to be addressed to ensure seamless cooperation between the agents. This paper focuses on task planning for human-robot collaboration, taking into account the human's performance and their preference for following or leading. Unlike conventional task allocation methods, the proposed system allows both the robot and human to select and assign tasks to each other. Our previous studies evaluated the proposed framework in a computer simulation environment. This paper extends the research by implementing the algorithm in a real scenario where a human collaborates with a Fetch mobile manipulator robot. We briefly describe the experimental setup, procedure and implementation of the planned user study. As a first step, in this paper, we report on a system evaluation study where the experimenter enacted different possible behaviours in terms of leader/follower preferences that can occur in a user study. Results show that the robot can adapt and respond appropriately to different human agent behaviours, enacted by the experimenter. A future user study will evaluate the system with human participants.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.04674 [pdf, other]

Optimal Robot Path Planning In a Collaborative Human-Robot Team with Intermittent Human Availability

Authors: Abhinav Dahiya, Stephen L. Smith

Abstract: This paper presents a solution for the problem of optimal planning for a robot in a collaborative human-robot team, where the human supervisor is intermittently available to assist the robot in completing tasks more quickly. Specifically, we address the challenge of computing the fastest path between two configurations in an environment with time constraints on how long the robot can wait for assistance. To solve this problem, we propose a novel approach that utilizes the concepts of budget and critical departure times, which enables us to obtain optimal solutions while scaling to larger problem instances than existing methods. We demonstrate the effectiveness of our approach by comparing it with several baseline algorithms on a city road network and analyzing the quality of the solutions obtained. Our work contributes to the field of robot planning by addressing the critical issue of incorporating human assistance and environmental restrictions, which has significant implications for real-world applications.

Submitted 10 July, 2023; originally announced July 2023.

Comments: 9 pages, 7 figures, IEEE RO-MAN 2023

arXiv:2307.03984 [pdf, other]

doi 10.1109/LRA.2023.3295251

Optimizing Task Waiting Times in Dynamic Vehicle Routing

Authors: Alexander Botros, Barry Gilhuly, Nils Wilde, Armin Sadeghi, Javier Alonso-Mora, Stephen L. Smith

Abstract: We study the problem of deploying a fleet of mobile robots to service tasks that arrive stochastically over time and at random locations in an environment. This is known as the Dynamic Vehicle Routing Problem (DVRP) and requires robots to allocate incoming tasks among themselves and find an optimal sequence for each robot. State-of-the-art approaches only consider average wait times and focus on high-load scenarios where the arrival rate of tasks approaches the limit of what can be handled by the robots while keeping the queue of unserviced tasks bounded, i.e., stable. To ensure stability, these approaches repeatedly compute minimum distance tours over a set of newly arrived tasks. This paper is aimed at addressing the missing policies for moderate-load scenarios, where quality of service can be improved by prioritizing long-waiting tasks. We introduce a novel DVRP policy based on a cost function that takes the $p$-norm over accumulated wait times and show it guarantees stability even in high-load scenarios. We demonstrate that the proposed policy outperforms the state-of-the-art in both mean and $95^{th}$ percentile wait times in moderate-load scenarios through simulation experiments in the Euclidean plane as well as using real-world data for city scale service requests.

Submitted 8 July, 2023; originally announced July 2023.

Comments: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)

MSC Class: 68M20 ACM Class: J.2
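Why a $p$-norm over wait times prioritizes long-waiting tasks can be seen on a toy instance. This is a minimal sketch, not the paper's policy: it assumes a single robot on a line starting at 0, unit speed, three hypothetical tasks (`A`, `B`, `C`) with made-up positions and already-accumulated waits, and a brute-force search over service orders. With $p = 1$ (the sum, i.e. proportional to the mean) the robot serves the nearby tasks first; with a larger $p$, which approaches the maximum, it goes first to the old, distant task.

```python
from itertools import permutations

# Hypothetical tasks on a line: name -> (position, wait already accumulated).
TASKS = {"A": (1.0, 0.0), "B": (-10.0, 30.0), "C": (2.0, 0.0)}


def wait_times(order, tasks, start=0.0, speed=1.0):
    """Total wait of each task at the moment the robot reaches it along `order`."""
    clock, here, waits = 0.0, start, []
    for name in order:
        pos, age = tasks[name]
        clock += abs(pos - here) / speed  # travel time to the next task
        here = pos
        waits.append(age + clock)
    return waits


def p_norm(values, p):
    return sum(v ** p for v in values) ** (1.0 / p)


def best_order(tasks, p):
    """Brute-force the service order that minimises the p-norm of wait times."""
    return min(permutations(tasks), key=lambda order: p_norm(wait_times(order, tasks), p))


print(best_order(TASKS, 1))  # ('A', 'C', 'B'): waits 1, 2, 44
print(best_order(TASKS, 8))  # ('B', 'A', 'C'): waits 40, 21, 22
```

Exhaustive search is only viable for a handful of tasks; it is used here purely to expose how the choice of $p$ reorders the same tasks.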

arXiv:2306.16501 [pdf, other]

On the Impact of Interruptions During Multi-Robot Supervision Tasks

Authors: Abhinav Dahiya, Yifan Cai, Oliver Schneider, Stephen L. Smith

Abstract: Human supervisors in multi-robot systems are primarily responsible for monitoring robots, but can also be assigned secondary tasks. These tasks can act as interruptions and can be categorized as either intrinsic, i.e., being directly related to the monitoring task, or extrinsic, i.e., being unrelated. In this paper, we investigate the impact of these two types of interruptions through a user study ($N=39$), where participants monitor a number of remote mobile robots while intermittently being interrupted by either a robot fault correction task (intrinsic) or a messaging task (extrinsic). We find that task performance of participants does not change significantly with the interruptions but depends greatly on the number of robots. However, interruptions result in an increase in perceived workload, and extrinsic interruptions have a more negative effect on workload across all NASA-TLX scales. Participants also reported switching between extrinsic interruptions and the primary task to be more difficult compared to the intrinsic interruption case. Statistical significance of these results is confirmed using ANOVA and a one-sample t-test. These findings suggest that when deciding task assignment in such supervision systems, one should limit interruptions from secondary tasks, especially extrinsic ones, in order to limit user workload.

Submitted 28 June, 2023; originally announced June 2023.

Comments: 7 pages, 10 figures, 2 tables, ICRA 2023

arXiv:2304.02692 [pdf, other]

A Unified Approach to Optimally Solving Sensor Scheduling and Sensor Selection Problems in Kalman Filtering

Authors: Shamak Dutta, Nils Wilde, Stephen L. Smith

Abstract: We consider a general form of the sensor scheduling problem for state estimation of linear dynamical systems, which involves selecting sensors that minimize the trace of the Kalman filter error covariance (weighted by a positive semidefinite matrix) subject to polyhedral constraints on the selected sensors. This general form captures several well-studied problems including sensor placement, sensor scheduling with budget constraints, and Linear Quadratic Gaussian (LQG) control and sensing co-design. We present a mixed integer optimization approach that is derived by exploiting the optimality of the Kalman filter. While existing work has focused on approximate methods for specific problem variants, our work provides a unified approach to computing optimal solutions to the general version of sensor scheduling. In simulation, we show this approach finds optimal solutions for systems with 30 to 50 states in seconds.

Submitted 11 December, 2023; v1 submitted 5 April, 2023; originally announced April 2023.
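The objective being optimized can be made concrete on a scalar system, where the trace of the (1×1) error covariance is just the posterior variance. This brute-force sketch is not the paper's mixed integer formulation; the dynamics, sensor noise variances, per-use costs, horizon and budget are all hypothetical, and the budget (total sensor cost over the horizon) stands in for one simple polyhedral constraint. Each schedule picks a subset of sensors at every step, the Kalman filter is run in information form, and the feasible schedule with the smallest summed posterior variance is kept.

```python
from itertools import combinations, product

A, Q = 0.95, 0.5  # hypothetical scalar dynamics x_{k+1} = A x_k + w_k with Var(w_k) = Q
P0 = 1.0          # initial error variance
# Hypothetical sensors y_i = x + v_i: name -> (noise variance R_i, cost per use)
SENSORS = {"s1": (0.2, 3.0), "s2": (1.0, 1.0), "s3": (4.0, 0.5)}
HORIZON, BUDGET = 3, 4.0


def posterior_variances(schedule):
    """Scalar Kalman filter: predict, then fuse every sensor selected at that step."""
    p, out = P0, []
    for chosen in schedule:
        p = A * A * p + Q  # prediction step
        info = 1.0 / p + sum(1.0 / SENSORS[s][0] for s in chosen)
        p = 1.0 / info     # information-form measurement update
        out.append(p)
    return out


def all_subsets(names):
    return [tuple(c) for k in range(len(names) + 1) for c in combinations(names, k)]


def schedule_cost(schedule):
    return sum(SENSORS[s][1] for step in schedule for s in step)


def best_schedule():
    """Exhaustively search schedules whose total sensor cost fits the budget."""
    best, best_value = None, float("inf")
    for schedule in product(all_subsets(list(SENSORS)), repeat=HORIZON):
        if schedule_cost(schedule) > BUDGET:
            continue
        value = sum(posterior_variances(schedule))  # trace of a 1x1 covariance
        if value < best_value:
            best, best_value = schedule, value
    return best, best_value


best, best_value = best_schedule()
print(best, round(best_value, 4))
```

The search visits 8^3 = 512 schedules here and grows exponentially with sensors and horizon, which is the scaling problem an exact mixed integer approach is meant to address.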

arXiv:2303.08935 [pdf, other]

Multi-Robot Persistent Monitoring: Minimizing Latency and Number of Robots with Recharging Constraints

Authors: Ahmad Bilal Asghar, Shreyas Sundaram, Stephen L. Smith

Abstract: In this paper we study multi-robot path planning for persistent monitoring tasks. We consider the case where robots have a limited battery capacity with a discharge time $D$. We represent the areas to be monitored as the vertices of a weighted graph. For each vertex, there is a constraint on the maximum allowable time between robot visits, called the latency. The objective is to find the minimum number of robots that can satisfy these latency constraints while also ensuring that the robots periodically charge at a recharging depot. The decision version of this problem is known to be PSPACE-complete. We present an $O\left(\frac{\log D}{\log \log D}\log \rho\right)$ approximation algorithm for the problem, where $\rho$ is the ratio of the maximum to the minimum latency constraint. We also present an orienteering-based heuristic to solve the problem and show empirically that it typically provides higher quality solutions than the approximation algorithm. We extend our results to provide an algorithm for the problem of minimizing the maximum weighted latency given a fixed number of robots. We evaluate our algorithms on large problem instances in a patrolling scenario and in a wildfire monitoring application. We also compare the algorithms with an existing solver on benchmark instances.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 13 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1903.06105
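The latency definition above can be checked for a given periodic route. This minimal sketch only evaluates a route; it does not implement the approximation algorithm or the orienteering heuristic, and it ignores battery and recharging. It assumes one robot repeating a closed route forever, symmetric travel times stored per undirected edge, and hypothetical vertex names and latency bounds. A vertex's latency is the longest gap between consecutive visits, including the wrap-around gap from the last visit in one period to the first visit in the next.

```python
def visit_times(route, travel):
    """Arrival time at each stop of a closed route that returns to route[0]."""
    t, times = 0.0, [0.0]
    for u, v in zip(route, route[1:] + route[:1]):
        t += travel[frozenset((u, v))]
        times.append(t)
    return times[:-1], t  # arrival time of each stop, and the cycle period


def latencies(route, travel):
    """Longest time between consecutive visits to each vertex over one period."""
    times, period = visit_times(route, travel)
    result = {}
    for v in set(route):
        ts = [t for t, stop in zip(times, route) if stop == v]
        gaps = [b - a for a, b in zip(ts, ts[1:])] + [period - ts[-1] + ts[0]]
        result[v] = max(gaps)
    return result


def satisfies(route, travel, bounds):
    lat = latencies(route, travel)
    return all(lat[v] <= bound for v, bound in bounds.items())


# Hypothetical instance: the robot alternates between two sites via a depot.
TRAVEL = {frozenset(("depot", "a")): 2.0, frozenset(("depot", "b")): 3.0}
ROUTE = ["depot", "a", "depot", "b"]
print(latencies(ROUTE, TRAVEL))  # depot: 6.0, a: 10.0, b: 10.0
print(satisfies(ROUTE, TRAVEL, {"a": 12.0, "b": 9.0}))  # False: b is revisited only every 10 time units
```

With latency bounds of this form, the number of robots needed grows when no single closed route can keep every vertex's revisit gap under its bound, which is what the minimization in the abstract targets.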

arXiv:2303.06349 [pdf, other]

Resurrecting Recurrent Neural Networks for Long Sequences

Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

Abstract: Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important differences that make it unclear where their performance boost over RNNs comes from. In this paper, we show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while also matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring proper normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, while also introducing an RNN block called the Linear Recurrent Unit that matches both their performance on the Long Range Arena benchmark and their computational efficiency.

Submitted 11 March, 2023; originally announced March 2023.

Comments: 30 pages, 9 figures
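The "linearized, diagonalized, reparameterized and normalized" recurrence can be sketched for a single scalar input channel. This follows our reading of the Linear Recurrent Unit and should be treated as an assumption, not the reference implementation: eigenvalues are parameterized as λ_j = exp(−exp(ν_j) + i·exp(θ_j)), so |λ_j| < 1 for any real parameters, and the input is scaled by γ_j = sqrt(1 − |λ_j|²). The parameter names `nu_log`, `theta_log`, `b`, `c`, `d` are hypothetical, the initialization scheme is omitted, and the scan is sequential rather than parallel.

```python
import cmath
import math


def lru_eigenvalues(nu_log, theta_log):
    """lambda_j = exp(-exp(nu_j) + i*exp(theta_j)); the modulus exp(-exp(nu_j)) is always < 1."""
    return [cmath.exp(complex(-math.exp(n), math.exp(t))) for n, t in zip(nu_log, theta_log)]


def lru_scan(us, nu_log, theta_log, b, c, d):
    """Sequential diagonal complex recurrence with a real readout, for a scalar input sequence."""
    lam = lru_eigenvalues(nu_log, theta_log)
    gamma = [math.sqrt(1.0 - abs(lam_j) ** 2) for lam_j in lam]  # input normalisation
    h = [0j] * len(lam)
    ys = []
    for u in us:
        h = [lam_j * hj + g * bj * u for lam_j, hj, g, bj in zip(lam, h, gamma, b)]
        ys.append(sum(cj * hj for cj, hj in zip(c, h)).real + d * u)
    return ys


# Impulse response of a two-dimensional state with hypothetical parameters.
print(lru_scan([1.0, 0.0, 0.0, 0.0], [-1.0, 0.5], [0.0, 1.0], [1.0, 0.5j], [1.0, 1.0], 0.1))
```

The choice of γ can be justified directly: for i.i.d. unit-variance inputs, each state coordinate has stationary variance γ_j²|b_j|² / (1 − |λ_j|²) = |b_j|², so eigenvalues close to the unit circle do not blow up the forward pass.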

arXiv:2302.13861 [pdf, other]

Differentially Private Diffusion Models Generate Useful Synthetic Images

Authors: Sahra Ghalebikesabi, Leonard Berrada, Sven Gowal, Ira Ktena, Robert Stanforth, Jamie Hayes, Soham De, Samuel L. Smith, Olivia Wiles, Borja Balle

Abstract: The ability to generate privacy-preserving synthetic versions of sensitive image datasets could unlock numerous ML applications currently constrained by data availability. Due to their astonishing image generation quality, diffusion models are a prime candidate for generating high-quality synthetic data. However, recent studies have found that, by default, the outputs of some diffusion models do not preserve training data privacy. By privately fine-tuning ImageNet pre-trained diffusion models with more than 80M parameters, we obtain SOTA results on CIFAR-10 and Camelyon17 in terms of both FID and the accuracy of downstream classifiers trained on synthetic data. We decrease the SOTA FID on CIFAR-10 from 26.2 to 9.8, and increase the accuracy from 51.0% to 88.0%. On synthetic data from Camelyon17, we achieve a downstream accuracy of 91.1% which is close to the SOTA of 96.5% when training on the real data. We leverage the ability of generative models to create infinite amounts of data to maximise the downstream prediction performance, and further show how to use synthetic data for hyperparameter tuning. Our results demonstrate that diffusion models fine-tuned with differential privacy can produce useful and provably private synthetic data, even in applications with significant distribution shift between the pre-training and fine-tuning distributions.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.11601 [pdf, other]

doi 10.1109/ICRA48891.2023.10161044

Real-Time Navigation for Autonomous Surface Vehicles In Ice-Covered Waters

Authors: Rodrigue de Schaetzen, Alexander Botros, Robert Gash, Kevin Murrant, Stephen L. Smith

Abstract: Vessel transit in ice-covered waters poses unique challenges in safe and efficient motion planning. When the concentration of ice is high, it may not be possible to find collision-free trajectories. Instead, ice can be pushed out of the way if it is small or if contact occurs near the edge of the ice. In this work, we propose a real-time navigation framework that minimizes collisions with ice and distance travelled by the vessel. We exploit a lattice-based planner with a cost that captures the ship's interaction with ice. To address the dynamic nature of the environment, we plan motion in a receding horizon manner based on updated vessel and ice state information. Further, we present a novel planning heuristic for evaluating the cost-to-go, which is applicable to navigation in a channel without a fixed goal location. The performance of our planner is evaluated across several levels of ice concentration both in simulated and in real-world experiments.

Submitted 23 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: 7 pages, 8 figures

arXiv:2302.10322 [pdf, other]

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Authors: Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh

Abstract: Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which we define as networks without skips or normalisation). However, these approaches are incompatible with the self-attention layers present in transformers, whose kernels are intrinsically more complicated to analyse and control. And so the question remains: is it possible to train deep vanilla transformers? We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. Our methods address various intricacies specific to signal propagation in transformers, including the interaction with positional encoding and causal masking. In experiments on WikiText-103 and C4, our approaches enable deep transformers without normalisation to train at speeds matching their standard counterparts, and deep vanilla transformers to reach the same performance as standard ones after about 5 times more iterations.

Submitted 20 February, 2023; originally announced February 2023.

Comments: ICLR 2023

arXiv:2212.05286 [pdf, other]

A Survey of Multi-Agent Human-Robot Interaction Systems

Authors: Abhinav Dahiya, Alexander M. Aroyo, Kerstin Dautenhahn, Stephen L. Smith

Abstract: This article presents a survey of literature in the area of Human-Robot Interaction (HRI), specifically on systems containing more than two agents (i.e., having multiple humans and/or multiple robots). We identify three core aspects of "Multi-agent" HRI systems that are useful for understanding how these systems differ from dyadic systems and from one another. These are the Team structure, Interaction style among agents, and the system's Computational characteristics. Under these core aspects, we present five attributes of HRI systems, namely Team size, Team composition, Interaction model, Communication modalities, and Robot control. These attributes are used to characterize and distinguish one system from another. We populate resulting categories with examples from recent literature along with a brief discussion of their applications and analyze how these attributes differ from the case of dyadic human-robot systems. We summarize key observations from the current literature, and identify challenges and promising areas for future research in this domain. In order to realize the vision of robots being part of society and interacting seamlessly with humans, there is a need to expand research on multi-human multi-robot systems. Not only do these systems require coordination among several agents, they also involve multi-agent and indirect interactions which are absent from dyadic HRI systems.
Adding multiple agents in HRI systems requires advanced interaction schemes, behavior understanding and control methods to allow natural interactions among humans and robots. In addition, research on human behavioral understanding in mixed human-robot teams requires more attention. This will help formulate and implement effective robot control policies in HRI systems with large numbers of heterogeneous robots and humans, a team composition reflecting many real-world scenarios.

Submitted 10 December, 2022; originally announced December 2022.

Comments: 23 pages, 7 figures

arXiv:2210.08107 [pdf, other]

Approximation Algorithms for Robot Tours in Random Fields with Guaranteed Estimation Accuracy

Authors: Shamak Dutta, Nils Wilde, Pratap Tokekar, Stephen L. Smith

Abstract: We study the sample placement and shortest tour problem for robots tasked with mapping environmental phenomena modeled as stationary random fields. The objective is to minimize the resources used (samples or tour length) while guaranteeing estimation accuracy. We give approximation algorithms for both problems in convex environments. These improve previously known results, both in terms of theoretical guarantees and in simulations. In addition, we disprove an existing claim in the literature on a lower bound for a solution to the sample placement problem.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2209.03458 [pdf, other]

Scheduling Operator Assistance for Shared Autonomy in Multi-Robot Teams

Authors: Yifan Cai, Abhinav Dahiya, Nils Wilde, Stephen L. Smith

Abstract: In this paper, we consider the problem of allocating human operator assistance in a system with multiple autonomous robots. Each robot is required to complete independent missions, each defined as a sequence of tasks. While executing a task, a robot can either operate autonomously or be teleoperated by the human operator to complete the task at a faster rate. We show that the problem of creating a teleoperation schedule that minimizes makespan of the system is NP-hard. We formulate our problem as a Mixed Integer Linear Program, which can be used to optimally solve small- to moderate-sized problem instances. We also develop an anytime algorithm that makes use of the problem structure to provide a fast and high-quality solution of the operator scheduling problem, even for larger problem instances. Our key insight is to identify blocking tasks in greedily-created schedules and iteratively remove those blocks to improve the quality of the solution. Through numerical simulations, we demonstrate the benefits of the proposed algorithm as an efficient and scalable approach that outperforms other greedy methods.

Submitted 7 September, 2022; originally announced September 2022.

arXiv:2206.00663 [pdf, other]

Error-Bounded Approximation of Pareto Fronts in Robot Planning Problems

Authors: Alexander Botros, Armin Sadeghi, Nils Wilde, Javier Alonso-Mora, Stephen L. Smith

Abstract: Many problems in robotics seek to simultaneously optimize several competing objectives under constraints. A conventional approach to solving such multi-objective optimization problems is to create a single cost function composed of the weighted sum of the individual objectives. Solutions to this scalarized optimization problem are Pareto optimal solutions to the original multi-objective problem. However, finding an accurate representation of a Pareto front remains an important challenge. Using uniformly spaced weight vectors is often inefficient and does not provide error bounds. Thus, we address the problem of computing a finite set of weight vectors such that for any other weight vector, there exists an element in the set whose error compared to optimal is minimized. To this end, we prove fundamental properties of the optimal cost as a function of the weight vector, including its continuity and concavity. Using these, we propose an algorithm that greedily adds the weight vector least-represented by the current set, and provide bounds on the error. Finally, we illustrate that the proposed approach significantly outperforms uniformly distributed weights for different robot planning problems with varying numbers of objective functions.

Submitted 1 June, 2022; originally announced June 2022.
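The concavity property mentioned above has a one-line explanation: the optimal scalarized cost J(w) = min over feasible solutions of w · c(solution) is a pointwise minimum of functions that are linear in w, and such a minimum is always concave. The sketch below illustrates this with a hypothetical feasible set of only three candidate plans and two objectives (say, length and risk); it does not implement the paper's greedy weight-selection algorithm or its error bounds.

```python
# Hypothetical candidate plans, each with a vector of objective costs (length, risk).
CANDIDATES = {"short": (1.0, 5.0), "balanced": (2.0, 2.0), "safe": (5.0, 1.0)}


def weighted_cost(weights, costs):
    return sum(w * c for w, c in zip(weights, costs))


def scalarized_optimum(weights, candidates=CANDIDATES):
    """Return J(w) = min over candidates of w . c, together with the minimising candidate."""
    best = min(candidates, key=lambda name: weighted_cost(weights, candidates[name]))
    return weighted_cost(weights, candidates[best]), best


# Sweep weights along the segment from (1, 0) to (0, 1).
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    w = (1.0 - t, t)
    print(w, scalarized_optimum(w))
```

At the endpoints J equals 1.0, while at the midpoint (0.5, 0.5) it equals 2.0 and is attained by "balanced", a Pareto-optimal plan that the two extreme weights never return. This is one reason a carefully chosen, rather than uniformly spaced, set of weight vectors matters.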

arXiv:2205.11513 [pdf, other]

Data-Driven Learning of Safety-Critical Control with Stochastic Control Barrier Functions

Authors: Chuanzheng Wang, Yiming Meng, Stephen L. Smith, Jun Liu

Abstract: Control barrier functions are widely used to synthesize safety-critical controls. The existence of Gaussian-type noise may lead to unsafe actions and result in severe consequences. While safety-critical control of stochastic systems has been widely studied, in many real-world applications the stochastic component of the dynamics is unknown. In this paper, we study safety-critical control of stochastic systems with an unknown diffusion part and propose a data-driven method to handle these scenarios. More specifically, we propose a data-driven stochastic control barrier function (DDSCBF) framework and use supervised learning to learn the unknown stochastic dynamics via the DDSCBF scheme. Under some reasonable assumptions, we provide guarantees that the DDSCBF scheme can approximate the Itô derivative of the stochastic control barrier function (SCBF) under partially unknown dynamics using the universal approximation theorem. We also show that we can achieve the same safety guarantee using the DDSCBF scheme as with SCBF in previous work without requiring the knowledge of stochastic dynamics. We use two non-linear stochastic systems to validate our theory in simulations.

Submitted 22 May, 2022; originally announced May 2022.

arXiv:2204.13650 [pdf, other]

Unlocking High-Accuracy Differentially Private Image Classification through Scale

Authors: Soham De, Leonard Berrada, Jamie Hayes, Samuel L. Smith, Borja Balle

Abstract: Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under $(8, 10^{-5})$-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under $(0.5, 8 \cdot 10^{-7})$-DP. Additionally, we also achieve 86.7% top-1 accuracy under $(8, 8 \cdot 10^{-7})$-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.

Submitted 16 June, 2022; v1 submitted 28 April, 2022; originally announced April 2022.
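The noise-injection step of DP-SGD follows the standard recipe of clipping each example's gradient and then adding Gaussian noise to the sum. A minimal pure-Python sketch is shown here; it is a generic illustration rather than the authors' training code, the function names are hypothetical, and the privacy accounting that maps the noise multiplier to an $(\varepsilon, \delta)$ guarantee is omitted.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Rescale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * (max_norm / norm) for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip_to_norm(grad, clip_norm)):
            summed[i] += g
    # Noise std scales with the clipping norm, i.e. with the per-example sensitivity.
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    return [p - lr * s / batch_size for p, s in zip(params, noisy)]


# Usage with noise switched off: the first gradient (norm 5) is clipped to norm 1.
rng = random.Random(0)
new_params = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]], lr=1.0,
                         clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

Clipping bounds how much any single example can move the update, which is what lets the added noise, whose norm grows with the number of parameters, translate into a privacy guarantee.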

arXiv:2204.09571 [pdf, other]

Informative Path Planning in Random Fields via Mixed Integer Programming

Authors: Shamak Dutta, Nils Wilde, Stephen L. Smith

Abstract: We present a new mixed integer formulation for the discrete informative path planning problem in random fields. The objective is to compute a budget-constrained path while collecting measurements whose linear estimate results in minimum error over a finite set of prediction locations. The problem is known to be NP-hard. However, we strive to compute optimal solutions by leveraging advances in mixed integer optimization. Our approach is based on expanding the search space so that we optimize not only over the collected measurement subset, but also over the class of all linear estimators. This allows us to formulate a mixed integer quadratic program that is convex in the continuous variables. The formulations are general and are not restricted to any covariance structure of the field. In simulations, we demonstrate the effectiveness of our approach over previous branch and bound algorithms.

Submitted 20 April, 2022; originally announced April 2022.

arXiv:2203.16423 [pdf, ps, other]

Data-Driven Model Predictive Control for Linear Time-Periodic Systems

Authors: Ruiqi Li, John W. Simpson-Porco, Stephen L. Smith

Abstract: We consider the problem of data-driven predictive control for an unknown discrete-time linear time-periodic (LTP) system of known period. Our proposed strategy generalizes both Data-enabled Predictive Control (DeePC) and Subspace Predictive Control (SPC), which are established data-driven control techniques for linear time-invariant (LTI) systems. The approach is supported by an extensive theoretical development of behavioral systems theory for LTP systems, culminating in a generalization of the fundamental lemma. Our algorithm produces results identical to standard Model Predictive Control (MPC) for deterministic LTP systems. Robustness of the algorithm to noisy data is illustrated via simulation of a regularized version of the algorithm applied to a stochastic multi-input multi-output LTP system.

Submitted 12 September, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: 12 pages, 7 figures, the extended version of a CDC 2022 paper
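Both DeePC and SPC start from a Hankel matrix built from recorded data. A minimal sketch of the standard LTI depth-$L$ Hankel construction for a scalar signal is given here; it does not include the paper's time-periodic generalization, and the function name is hypothetical.

```python
def hankel(signal, depth):
    """Depth-`depth` Hankel matrix: column j is the window signal[j : j + depth]."""
    num_cols = len(signal) - depth + 1
    if num_cols < 1:
        raise ValueError("signal is shorter than the requested depth")
    # Row i holds signal[i], signal[i + 1], ..., so anti-diagonals are constant.
    return [[signal[i + j] for j in range(num_cols)] for i in range(depth)]


H = hankel([1, 2, 3, 4], 2)  # [[1, 2, 3], [2, 3, 4]]
```

For vector-valued inputs and outputs each entry becomes a block. Under the fundamental lemma, with sufficiently exciting data, length-$L$ trajectories of an LTI system lie in the column span of such matrices, which is what DeePC exploits.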

arXiv:2203.16070 [pdf, other]

An Improved Greedy Algorithm for Subset Selection in Linear Estimation

Authors: Shamak Dutta, Nils Wilde, Stephen L. Smith

Abstract: In this paper, we consider a subset selection problem in a spatial field where we seek to find a set of k locations whose observations provide the best estimate of the field value at a finite set of prediction locations. The measurements can be taken at any location in the continuous field, and the covariance between the field values at different points is given by the widely used squared exponential covariance function. One approach for observation selection is to perform a grid discretization of the space and obtain an approximate solution using the greedy algorithm. The solution quality improves with a finer grid resolution but at the cost of increased computation. We propose a method to reduce the computational complexity, or conversely to increase solution quality, of the greedy algorithm by considering a search space consisting only of prediction locations and centroids of cliques formed by the prediction locations. We demonstrate the effectiveness of our proposed approach in simulation, both in terms of solution quality and runtime.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted for publication at European Control Conference, 2022
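The greedy selection over a reduced candidate set can be sketched as follows. Assumptions not taken from the paper: points are 2-D, the field has unit variance and a hypothetical length scale of 1, a small noise term keeps the covariance matrix invertible, the error is the summed variance of the best linear estimate, and "cliques" are simplified to pairs of prediction locations within a hypothetical radius.

```python
import itertools
import math


def se_cov(a, b, length_scale=1.0):
    """Squared exponential covariance between two 2-D points (unit variance)."""
    d2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def estimation_error(selected, targets, noise=1e-6):
    """Sum over targets of the error variance of the best linear estimate from `selected`."""
    K = [[se_cov(a, b) + (noise if i == j else 0.0) for j, b in enumerate(selected)]
         for i, a in enumerate(selected)]
    total = 0.0
    for p in targets:
        k = [se_cov(s, p) for s in selected]
        weights = solve(K, k) if selected else []
        total += se_cov(p, p) - sum(ki * wi for ki, wi in zip(k, weights))
    return total


def candidate_set(targets, radius):
    """Targets plus midpoints of close target pairs (hypothetical stand-in for clique centroids)."""
    cands = list(targets)
    for a, b in itertools.combinations(targets, 2):
        if math.dist(a, b) <= radius:
            cands.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
    return cands


def greedy_select(candidates, targets, k):
    """Pick k locations one at a time, each minimizing the resulting estimation error."""
    chosen = []
    for _ in range(k):
        best = min((c for c in candidates if c not in chosen),
                   key=lambda c: estimation_error(chosen + [c], targets))
        chosen.append(best)
    return chosen


targets = [(0.0, 0.0), (0.2, 0.0), (5.0, 0.0)]
picked = greedy_select(candidate_set(targets, 1.0), targets, 2)
```

In this toy instance the midpoint of the two nearby targets is chosen first because it serves both of them, which illustrates why centroids are worth adding to a search space made only of prediction locations.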

arXiv:2203.08278 [pdf, other]

A New Charged Lepton Flavor Violation Program at Fermilab

Authors: M. Aoki, R. B. Appleby, M. Aslaninejad, R. Barlow, R. H. Bernstein, C. Bloise, L. Calibbi, F. Cervelli, R. Culbertson, Andre Luiz de Gouvea, S. Di Falco, E. Diociaiuti, S. Donati, R. Donghia, B. Echenard, A. Gaponenko, S. Giovannella, C. Group, F. Happacher, M. T. Hedges, D. G. Hitlin, E. Hungerford, C. Johnstone, D. M. Kaplan, M. Kargiantoulakis , et al. (43 additional authors not shown)

Abstract: The muon has played a central role in establishing the Standard Model of particle physics, and continues to provide valuable information about the nature of new physics. A new complex at Fermilab, the Advanced Muon Facility, would provide the world's most intense positive and negative muon beams by exploiting the full potential of PIP-II and the Booster upgrade. This facility would enable a broad muon physics program, including studies of charged lepton flavor violation, muonium-antimuonium transitions, a storage ring muon EDM experiment, and muon spin rotation experiments. This document describes a staged realization of this complex, together with a series of next-generation experiments to search for charged lepton flavor violation.

Submitted 15 March, 2022; originally announced March 2022.

Comments: A Contributed Paper for Snowmass 2021

arXiv:2201.00724 [pdf, other]

Submodular Maximization with Limited Function Access

Authors: Andrew Downie, Bahman Gharesifard, Stephen L. Smith

Abstract: We consider a class of submodular maximization problems in which decision-makers have limited access to the objective function. We explore scenarios where the decision-maker can observe only pairwise information, i.e., can evaluate the objective function on sets of size two. We begin with a negative result: no algorithm using only $k$-wise information can guarantee performance better than $k/n$. We present two algorithms that utilize only pairwise information about the function and characterize their performance relative to the optimal, which depends on the curvature of the submodular function. Additionally, if the submodular function possesses a property called supermodularity of conditioning, then we can provide a method to bound the performance based purely on pairwise information. The proposed algorithms offer significant computational speedups over a traditional greedy strategy. A by-product of our study is the introduction of two new notions of curvature, the $k$-Marginal Curvature and the $k$-Cardinality Curvature. Finally, we present experiments highlighting the performance of our proposed algorithms in terms of approximation and time complexity.

Submitted 7 February, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

Comments: 14 pages, 8 figures
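For reference, the traditional greedy strategy the abstract compares against can be sketched as follows, using set coverage as a hypothetical example of a submodular objective. It queries the objective on sets of every size up to $k$, which is exactly the access the pairwise setting rules out; the paper's pairwise algorithms are not reproduced here.

```python
def coverage(sets, chosen):
    """Set-coverage objective: number of distinct items covered by the chosen sets."""
    covered = set()
    for idx in chosen:
        covered |= sets[idx]
    return len(covered)


def greedy_maximize(f, ground_set, k):
    """Classic greedy: repeatedly add the element with the largest marginal gain f(S + e) - f(S)."""
    chosen = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda e: f(chosen + [e]) - f(chosen))
        chosen.append(best)
    return chosen


sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1}}
picked = greedy_maximize(lambda s: coverage(sets, s), list(sets), 2)
# Elements 0 and 2 both add 3 new items first; max() keeps the earlier one, then 2 is added.
```

Each round evaluates the objective on a set one element larger than the last, so the cost grows with both $n$ and $k$; restricting queries to pairs is what yields the speedups the abstract reports.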

arXiv:2112.08000 [pdf, other]

doi 10.1109/LRA.2021.3135928

Learning Submodular Objectives for Team Environmental Monitoring

Authors: Nils Wilde, Armin Sadeghi, Stephen L. Smith

Abstract: In this paper, we study the well-known team orienteering problem where a fleet of robots collects rewards by visiting locations. Usually, the rewards are assumed to be known to the robots; however, in applications such as environmental monitoring or scene reconstruction, the rewards are often subjective and specifying them is challenging. We propose a framework to learn the user's unknown preferences by presenting alternative solutions, which the user then ranks. We consider two cases for the user: 1) a deterministic user who provides the optimal ranking of the alternative solutions, and 2) a noisy user who provides the optimal ranking according to an unknown probability distribution. For the deterministic user, we propose a framework to minimize a bound on the maximum deviation from the optimal solution, namely regret. We adapt the approach to capture the noisy user and minimize the expected regret. Finally, we demonstrate the importance of learning user preferences and the performance of the proposed methods in an extensive set of experimental results using real-world datasets for environmental monitoring problems.

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2111.06437 [pdf, other]

Scalable Operator Allocation for Multi-Robot Assistance: A Restless Bandit Approach

Authors: Abhinav Dahiya, Nima Akbarzadeh, Aditya Mahajan, Stephen L. Smith

Abstract: In this paper, we consider the problem of allocating human operators in a system with multiple semi-autonomous robots. Each robot is required to perform an independent sequence of tasks and, at every task, has a chance of failing and getting stuck in a fault state. If and when required, a human operator can assist or teleoperate a robot. Conventional MDP techniques used to solve such problems face scalability issues due to the exponential growth of state and action spaces with the number of robots and operators. In this paper, we derive conditions under which the operator allocation problem is indexable, enabling the use of the Whittle index heuristic. The conditions can be easily checked to verify indexability, and we show that they hold for a wide range of problems of interest. Our key insight is to leverage the structure of the value function of individual robots, resulting in conditions that can be verified separately for each state of each robot. We apply these conditions to two types of transitions commonly seen in remote robot supervision systems. Through numerical simulations, we demonstrate the efficacy of the Whittle index policy as a near-optimal and scalable approach that outperforms existing scalable methods.

Submitted 11 November, 2021; originally announced November 2021.

Comments: 11 pages + 4 page Appendix, 7 Figures

arXiv:2111.03844 [pdf, other]

doi 10.1051/epjconf/202225700016

Dust Emission in Galaxies at Millimeter Wavelengths: Cooling of star forming regions in NGC6946

Authors: G. Ejlali, R. Adam, P. Ade, H. Ajeddig, P. André, E. Artis, H. Ausse, A. Beelen, A. Benoît, S. Berta, L. Bing, O. Bourrion, M. Calvo, A. Catalano, I. de Looze, M. De Petris, F. -X. Désert, S. Doyle, E. F. C. Driessen, M. Galametz, F. Galliano, A. Gomez, J. Goupy, A. P. Jones, A. Hughes , et al. (32 additional authors not shown)

Abstract: Interstellar dust plays an important role in the formation of molecular gas and the heating and cooling of the interstellar medium. The spatial distribution of the mm-wavelength dust emission from galaxies is largely unexplored. The NIKA2 Guaranteed Time Project IMEGIN (Interpreting the Millimeter Emission of Galaxies with IRAM and NIKA2) has recently mapped the mm emission in the grand design spiral galaxy NGC6946. By subtracting the contributions from the free-free, synchrotron, and CO line emission, we map the distribution of the pure dust emission at 1.15mm and 2mm. Separating the arm and interarm regions, we find that the 2mm emission is dominated by the interarm regions, indicating the significant role of the general interstellar radiation field in heating the cold dust. Finally, we present maps of the dust mass, temperature, and emissivity index using Bayesian MCMC modeling of the spectral energy distribution in NGC6946.

Submitted 6 November, 2021; originally announced November 2021.

Comments: To appear in the Proceedings of the International Conference entitled "mm Universe @ NIKA2", Rome (Italy), June 2021, EPJ Web of conferences

arXiv:2110.00284 [pdf, other]

Learning Reward Functions from Scale Feedback

Authors: Nils Wilde, Erdem Bıyık, Dorsa Sadigh, Stephen L. Smith

Abstract: Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While this minimizes the user's effort, a strict choice does not yield any information on how much one trajectory is preferred. We propose scale feedback, where the user uses a slider to give more nuanced information. We introduce a probabilistic model of how users would provide feedback and derive a learning framework for the robot. We demonstrate the performance benefit of slider feedback in simulations, and validate our approach in two user studies suggesting that scale feedback enables more effective learning in practice.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 16 pages, 15 figures, 3 tables. Published at Conference on Robot Learning (CoRL) 2021

arXiv:2109.08185 [pdf, other]

Optimal Partitioning of Non-Convex Environments for Minimum Turn Coverage Planning

Authors: Megnath Ramesh, Frank Imeson, Baris Fidan, Stephen L. Smith

Abstract: In this paper, we tackle the problem of planning an optimal coverage path for a robot operating indoors. Many existing approaches attempt to discourage turns in the path by covering the environment along the least number of coverage lines, i.e., straight-line paths. This is because turning not only slows down the robot but also negatively affects the quality of coverage, e.g., tools like cameras and cleaning attachments commonly have poor performance around turns. The problem of minimizing coverage lines, however, is typically solved using heuristics that do not guarantee optimality. In this work, we propose a turn-minimizing coverage planning method that computes the optimal number of axis-parallel (horizontal/vertical) coverage lines for the environment in polynomial time. We do this by formulating a linear program (LP) that optimally partitions the environment into axis-parallel ranks (non-intersecting rectangles of width equal to the tool width). We then generate coverage paths for a set of real-world indoor environments and compare the results with state-of-the-art coverage approaches.

Submitted 26 May, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: 8 pages, 9 figures, submitted to RA-L with IROS 2022 option

arXiv:2107.11467 [pdf, other]

Spatio-Temporal Lattice Planning Using Optimal Motion Primitives

Authors: Alexander Botros, Stephen L. Smith

Abstract: Lattice-based planning techniques simplify the motion planning problem for autonomous vehicles by limiting available motions to a pre-computed set of primitives. These primitives are then combined online to generate more complex maneuvers. A set of motion primitives t-spans a lattice if, given a real number t ≥ 1, any configuration in the lattice can be reached via a sequence of motion primitives whose cost is no more than a factor of t from optimal. Computing a minimal t-spanning set balances a trade-off between computed motion quality and motion planning performance. In this work, we formulate this problem for an arbitrary lattice as a mixed integer linear program. We also propose an A*-based algorithm to solve the motion planning problem using these primitives. Finally, we present an algorithm that removes excessive oscillations from planned motions, a common problem in lattice-based planning. Our method is validated for autonomous driving in both parking lot and highway scenarios.

Submitted 17 July, 2023; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: 12 pages, 9 figures, 2 tables, accepted to IEEE Transactions on Intelligent Transportation Systems (preprint)
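To make the t-spanning definition concrete, here is a minimal brute-force check on a small 2-D integer grid. Simplifying assumptions not taken from the paper: configurations are grid positions without heading, the optimal cost of reaching (x, y) is its Euclidean distance, only cells within a hypothetical radius are checked, and the Dijkstra search is confined to a padded box around them. The paper instead computes a minimal spanning set with a mixed integer linear program.

```python
import heapq
import math


def primitive_costs(primitives, bound):
    """Dijkstra from the origin: cheapest primitive-sequence cost to each cell in the box."""
    dist = {(0, 0): 0.0}
    heap = [(0.0, (0, 0))]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if d > dist[(x, y)]:
            continue  # stale heap entry
        for dx, dy, cost in primitives:
            nxt = (x + dx, y + dy)
            if max(abs(nxt[0]), abs(nxt[1])) > bound:
                continue  # confine the search to the box (a simplification)
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def is_t_spanning(primitives, t, radius, optimal_cost=math.hypot):
    """True if every cell within `radius` is reachable at cost <= t * optimal_cost."""
    pad = max(max(abs(dx), abs(dy)) for dx, dy, _ in primitives)
    dist = primitive_costs(primitives, radius + pad)
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            if (x, y) != (0, 0) and dist.get((x, y), math.inf) > t * optimal_cost(x, y):
                return False
    return True


# Hypothetical primitive sets as (dx, dy, cost) triples.
four = [(1, 0, 1.0), (-1, 0, 1.0), (0, 1, 1.0), (0, -1, 1.0)]
eight = four + [(sx, sy, math.sqrt(2)) for sx in (1, -1) for sy in (1, -1)]
```

With only the four axis moves, reaching (1, 1) costs 2 against an optimal √2, so the set is not 1.3-spanning; adding diagonals lowers the worst ratio on this grid to about 1.08. Richer primitive sets tighten t but slow the planner, which is the trade-off the abstract describes.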

arXiv:2106.03836 [pdf, other]

Tunable Trajectory Planner Using G3 Curves

Authors: Alexander Botros, Stephen L. Smith

Abstract: Trajectory planning is commonly used as part of a local planner in autonomous driving. This paper considers the problem of planning a continuous-curvature-rate trajectory between fixed start and goal states that minimizes a tunable trade-off between passenger comfort and travel time. The problem is an instance of infinite dimensional optimization over two continuous functions: a path, and a velocity profile. We propose a simplification of this problem that facilitates the discretization of both functions. This paper also proposes a method to quickly generate minimal-length paths between start and goal states based on a single tuning parameter: the second derivative of curvature. Furthermore, we discretize the set of velocity profiles along a given path into a selection of acceleration way-points along the path. Gradient descent is then employed to minimize cost over feasible choices of the second derivative of curvature and acceleration way-points, resulting in a method that repeatedly solves the path and velocity profiles in an iterative fashion. Numerical examples are provided to illustrate the benefits of the proposed methods.

Submitted 7 June, 2021; originally announced June 2021.

Comments: 13 pages, 11 figures, submitted to IEEE Transactions on Intelligent Vehicles

arXiv:2105.13343 [pdf, other]

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

Authors: Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

Abstract: In computer vision, it is standard practice to draw a single sample from the data augmentation procedure for each unique image in the mini-batch. However, recent work has suggested that drawing multiple samples can achieve higher test accuracies. In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held out data when training deep ResNets. We demonstrate that drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training. Crucially, this benefit arises even if different numbers of augmentations per image perform the same number of parameter updates and gradient evaluations (requiring the same total compute). Although prior work has found that variance in the gradient estimate arising from subsampling the dataset has an implicit regularization benefit, our experiments suggest that variance which arises from the data augmentation process harms generalization. We apply these insights to the highly performant NFNet-F5, achieving 86.8% top-1 accuracy without extra data on ImageNet.

Submitted 24 February, 2022; v1 submitted 27 May, 2021; originally announced May 2021.
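A minimal sketch of assembling a mini-batch with multiple augmentation samples per image, assuming a fixed total batch size; the function names and the toy flip augmentation are hypothetical and not taken from the paper's code.

```python
import random


def augmented_batch(dataset, batch_size, num_augs, augment, rng):
    """Batch of batch_size samples: batch_size // num_augs unique images, each augmented num_augs times."""
    if batch_size % num_augs != 0:
        raise ValueError("batch_size must be divisible by num_augs")
    unique = rng.sample(dataset, batch_size // num_augs)
    # Each unique image contributes num_augs independently augmented views.
    return [(img, augment(img, rng)) for img in unique for _ in range(num_augs)]


def toy_augment(img, rng):
    """Hypothetical augmentation: random horizontal flip of a tuple-valued 'image'."""
    return img[::-1] if rng.random() < 0.5 else tuple(img)


dataset = [tuple(range(i, i + 3)) for i in range(10)]
batch = augmented_batch(dataset, 8, 4, toy_augment, random.Random(0))
```

Because the batch size stays fixed, raising `num_augs` leaves the number of gradient evaluations per step unchanged and only reduces how many distinct images each step sees, which matches the compute-matched comparison described in the abstract.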

arXiv:2104.02585 [pdf, other]

Safety-Critical Control of Stochastic Systems using Stochastic Control Barrier Functions

Authors: Chuanzheng Wang, Yiming Meng, Stephen L. Smith, Jun Liu

Abstract: Control barrier functions have been widely used for synthesizing safety-critical controls, often via solving quadratic programs. However, the existence of Gaussian-type noise may lead to unsafe actions and result in severe consequences. In this paper, we study systems modeled by stochastic differential equations (SDEs) driven by Brownian motions. We propose a notion of stochastic control barrier functions (SCBFs) and show that SCBFs can significantly reduce the control efforts, especially in the presence of noise, compared to stochastic reciprocal control barrier functions (SRCBFs), and offer a less conservative estimation of safety probability, compared to stochastic zeroing control barrier functions (SZCBFs). Based on this less conservative probabilistic estimation for the proposed notion of SCBFs, we further extend the results to handle high relative degree safety constraints using high-order SCBFs. We demonstrate that the proposed SCBFs achieve good trade-offs of performance and control efforts, both through theoretical analysis and numerical simulations.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2102.06171 [pdf, other]

High-Performance Large-Scale Image Recognition Without Normalization

Authors: Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan

Abstract: Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets

Submitted 11 February, 2021; originally announced February 2021.
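Adaptive gradient clipping rescales a gradient according to its size relative to the corresponding weights rather than against a fixed threshold. A minimal unit-wise sketch follows, treating each row of a weight matrix as one unit; the `clipping` threshold and the `eps` floor are illustrative hyperparameter values, and this plain-Python version is not the released NFNets code.

```python
import math


def row_norm(row):
    """L2 norm of one unit (row)."""
    return math.sqrt(sum(v * v for v in row))


def adaptive_grad_clip(weight, grad, clipping=0.01, eps=1e-3):
    """Unit-wise AGC: rescale a row's gradient if ||G_i|| / max(||W_i||, eps) > clipping."""
    clipped = []
    for w_row, g_row in zip(weight, grad):
        # eps keeps zero-initialized weights from having their gradients clipped to zero.
        max_norm = clipping * max(row_norm(w_row), eps)
        g_norm = row_norm(g_row)
        if g_norm > max_norm:
            clipped.append([g * (max_norm / g_norm) for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


out = adaptive_grad_clip([[3.0, 4.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]], clipping=0.1)
# Row 0: ||W|| = 5, so the gradient is capped at norm 0.5; row 1 falls back to eps.
```

Tying the cap to the weight norm bounds the relative size of each update, which is the property that lets large learning rates and strong augmentation train stably without batch normalization.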

arXiv:2101.12176 [pdf, other]

On the Origin of Implicit Regularization in Stochastic Gradient Descent

Authors: Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De

Abstract: For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of gradient flow on the full batch loss function. However, moderately large learning rates can achieve higher test accuracies, and this generalization benefit is not explained by convergence bounds, since the learning rate which maximizes test accuracy is often larger than the learning rate which minimizes training loss. To interpret this phenomenon we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small and finite, but on a modified loss. This modified loss is composed of the original loss function and an implicit regularizer, which penalizes the norms of the minibatch gradients. Under mild assumptions, when the batch size is small the scale of the implicit regularization term is proportional to the ratio of the learning rate to the batch size. We verify empirically that explicitly including the implicit regularizer in the loss can enhance the test accuracy when the learning rate is small.

Submitted 28 January, 2021; originally announced January 2021.

Comments: Accepted as a conference paper at ICLR 2021
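To illustrate the modified loss, here is a toy sketch for one-dimensional least squares that adds to the full-batch loss a penalty on the mean squared norm of the minibatch gradients. The abstract states only that the regularizer's scale is proportional to learning rate / batch size for small batches, so the coefficient `reg_coef` is left as an explicit parameter (its exact value should be taken from the paper); the data and function names are hypothetical.

```python
def minibatch_loss_and_grad(w, batch):
    """Mean loss 0.5 * (w * x - y)^2 over a minibatch, and its gradient with respect to w."""
    n = len(batch)
    loss = sum(0.5 * (w * x - y) ** 2 for x, y in batch) / n
    grad = sum(x * (w * x - y) for x, y in batch) / n
    return loss, grad


def modified_loss(w, minibatches, reg_coef):
    """Full-batch loss plus reg_coef times the mean squared minibatch-gradient norm.

    Assumes equal-sized minibatches, so the full-batch loss is the mean minibatch loss.
    """
    stats = [minibatch_loss_and_grad(w, b) for b in minibatches]
    full_loss = sum(loss for loss, _ in stats) / len(stats)
    penalty = sum(grad ** 2 for _, grad in stats) / len(stats)
    return full_loss + reg_coef * penalty


minibatches = [[(1.0, 1.0)], [(1.0, 3.0)]]
```

At w = 2 the full-batch gradient is zero, yet the two minibatch gradients are +1 and -1, so the penalty is still 1. The regularizer therefore favors regions where individual minibatch gradients are small, not merely where their average vanishes.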

Showing 1–50 of 106 results for author: Smith, S L