-
Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation
Authors:
McKell Woodland,
Nihil Patel,
Austin Castelo,
Mais Al Taie,
Mohamed Eltaher,
Joshua P. Yung,
Tucker J. Netherton,
Tiffany L. Calderone,
Jessica I. Sanchez,
Darrel W. Cleere,
Ahmed Elsaiey,
Nakul Gupta,
David Victor,
Laura Beretta,
Ankit B. Patel,
Kristy K. Brock
Abstract:
Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate automation bias. Therefore, detecting out-of-distribution images at inference is critical to warn the clinicians that the model likely failed. This work applied the Mahalanobis distance (MD) post hoc to the bottleneck features of four Swin UNETR and nnU-net models that segmented the liver on T1-weighted magnetic resonance imaging and computed tomography. By reducing the dimensions of the bottleneck features with either principal component analysis or uniform manifold approximation and projection, images the models failed on were detected with high performance and minimal computational load. In addition, this work explored a non-parametric alternative to the MD, a k-th nearest neighbors distance (KNN). KNN drastically improved scalability and performance over MD when both were applied to raw and average-pooled bottleneck features.
Submitted 5 August, 2024;
originally announced August 2024.
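The abstract above contrasts two post hoc out-of-distribution scores computed on reduced bottleneck features: the Mahalanobis distance (MD) and the distance to the k-th nearest training neighbor (KNN). The following is a minimal sketch of both scores, not the authors' implementation. It assumes the bottleneck features are already extracted as NumPy arrays, uses a plain SVD-based PCA for the reduction (UMAP is omitted), and all function names are hypothetical.

```python
import numpy as np


def fit_pca(train, n_components):
    """Return the training mean and the top principal directions (as rows)."""
    mean = train.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x, mean, components):
    """Project features onto the retained principal directions."""
    return (x - mean) @ components.T


def mahalanobis_scores(train, test):
    """Mahalanobis distance of each test row to the training distribution."""
    mu = train.mean(axis=0)
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = test - mu
    sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative rounding errors


def kth_nn_scores(train, test, k):
    """Euclidean distance from each test row to its k-th nearest training row."""
    dists = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, k - 1]


if __name__ == "__main__":
    # Synthetic stand-ins for bottleneck features (assumption: 32-dim vectors).
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 32))
    shifted = rng.normal(loc=4.0, size=(20, 32))  # simulated out-of-distribution set

    mean, comps = fit_pca(train, n_components=8)
    train_r, shifted_r = project(train, mean, comps), project(shifted, mean, comps)
    print("mean MD score :", mahalanobis_scores(train_r, shifted_r).mean())
    print("mean KNN score:", kth_nn_scores(train_r, shifted_r, k=5).mean())
```

A higher score marks an image as further from the training distribution. In practice a threshold would be chosen on held-out data. The KNN score makes no Gaussian assumption about the features, which is the sense in which the abstract calls it non-parametric.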
-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman,
et al. (172 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
Submitted 2 August, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Authors:
Aleksandar Botev,
Soham De,
Samuel L Smith,
Anushan Fernando,
George-Cristian Muraru,
Ruba Haroun,
Leonard Berrada,
Razvan Pascanu,
Pier Giuseppe Sessa,
Robert Dadashi,
Léonard Hussenot,
Johan Ferret,
Sertan Girgin,
Olivier Bachem,
Alek Andreev,
Kathleen Kenealy,
Thomas Mesnard,
Cassidy Hardin,
Surya Bhupatiraju,
Shreya Pathak,
Laurent Sifre,
Morgane Rivière,
Mihir Sanjay Kale,
Juliette Love,
Pouya Tafti,
et al. (37 additional authors not shown)
Abstract:
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
Submitted 28 August, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love,
et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Grayscale control of local magnetic properties with direct-write laser annealing
Authors:
Lauren J. Riddiford,
Jeffrey A. Brock,
Katarzyna Murawska,
Aleš Hrabec,
Laura J. Heyderman
Abstract:
Across the fields of magnetism, microelectronics, optics, and others, engineered local variations in physical properties can yield groundbreaking functionalities that play a crucial role in enabling future technologies. Beyond binary modifications, 1D lateral gradients in material properties (achieved by gradients in thickness, stoichiometry, temperature, or strain) give rise to a plethora of new effects in thin film magnetic systems. However, extending such gradient-induced behaviors to 2D is challenging to realize with existing methods, which are plagued by slow processing speeds, dose instabilities, or limitation to variation along one dimension. Here, we show for the first time how commonplace direct-write laser exposure techniques, initially developed for grayscale patterning of photoresist surfaces, can be repurposed to perform grayscale direct-write laser annealing. With this technique, we demonstrate the ease with which two-dimensional, continuous variations in magnetic properties can be created at the mesoscopic scale in numerous application-relevant materials, including ferromagnetic, ferrimagnetic, and synthetic antiferromagnetic thin-film systems. The speed, versatility, and new possibilities to create complex magnetic energy landscapes offered by direct-write laser annealing open the door to the lateral modification of the magnetic, electronic, and structural properties of a variety of thin films with an abundance of applications.
Submitted 17 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee,
et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Local control of a single nitrogen-vacancy center by nanoscale engineered magnetic domain wall motions
Authors:
Nathan J. McLaughlin,
Senlei Li,
Jeffrey A. Brock,
Shu Zhang,
Hanyi Lu,
Mengqi Huang,
Yuxuan Xiao,
Jingcheng Zhou,
Yaroslav Tserkovnyak,
Eric E. Fullerton,
Hailong Wang,
Chunhui Rita Du
Abstract:
Effective control and readout of qubits form the technical foundation of next-generation, transformative quantum information sciences and technologies. The nitrogen-vacancy (NV) center, an intrinsic three-level spin system, is naturally relevant in this context due to its excellent quantum coherence, high fidelity of operations, and remarkable functionality over a broad range of experimental conditions. It is an active contender for the development and implementation of cutting-edge quantum technologies. Here, we report magnetic domain wall motion driven local control and measurements of NV spin properties. By engineering the local magnetic field environment of an NV center via nanoscale reconfigurable domain wall motions, we show that NV photoluminescence, spin level energies, and coherence time can be reliably controlled and correlated to the magneto-transport response of a magnetic device. Our results highlight the electrically tunable dipole interaction between NV centers and nanoscale magnetic structures, providing an attractive platform to realize interactive information transfer between spin qubits and non-volatile magnetic memory in hybrid quantum spintronic systems.
Submitted 20 November, 2023;
originally announced November 2023.
-
ConvNets Match Vision Transformers at Scale
Authors:
Samuel L. Smith,
Andrew Brock,
Leonard Berrada,
Soham De
Abstract:
Many researchers believe that ConvNets perform well on small or moderately sized datasets, but are not competitive with Vision Transformers when given access to datasets on the web-scale. We challenge this belief by evaluating a performant ConvNet architecture pre-trained on JFT-4B, a large labelled dataset of images often used for training foundation models. We consider pre-training compute budgets between 0.4k and 110k TPU-v4 core compute hours, and train a series of networks of increasing depth and width from the NFNet model family. We observe a log-log scaling law between held out loss and compute budget. After fine-tuning on ImageNet, NFNets match the reported performance of Vision Transformers with comparable compute budgets. Our strongest fine-tuned model achieves a Top-1 accuracy of 90.4%.
Submitted 25 October, 2023;
originally announced October 2023.
-
Impacts of the half-skyrmion spin topology, spin-orbit torque, and dynamic symmetry breaking on the growth of magnetic stripe domains
Authors:
Jeffrey A. Brock,
Daan Swinkels,
Bert Koopmans,
Eric E. Fullerton
Abstract:
We have performed an experimental and modeling-based study of the spin-orbit torque-induced growth of magnetic stripe domains in heavy metal/ferromagnet thin-film heterostructures that possess chiral Néel-type domain walls due to an interfacial Dzyaloshinskii-Moriya interaction. In agreement with previous reports, the stripe domains stabilized in these systems exhibit a significant transverse growth velocity relative to the applied current axis. This behavior has previously been attributed to the Magnus force-like skyrmion Hall effect of the stripe domain spin topology, which is analogous to that of a half-skyrmion. However, through analytic modeling of the in-plane torques generated by spin-orbit torque, we find that a dynamical reconfiguration of the domain wall magnetization profile is expected to occur - promoting motion with similar directionality and symmetry as the skyrmion Hall effect. These results further highlight the sensitivity of spin-orbit torque to the local orientation of the domain wall magnetization profile and its contribution to domain growth directionality.
Submitted 3 May, 2023;
originally announced May 2023.
-
Evidence of extreme domain wall speeds under ultrafast optical excitation
Authors:
Rahul Jangid,
Nanna Zhou Hagström,
Meera Madhavi,
Kyle Rockwell,
Justin M. Shaw,
Jeffrey A. Brock,
Matteo Pancaldi,
Dario De Angelis,
Flavio Capotondi,
Emanuele Pedersoli,
Hans T. Nembach,
Mark W. Keller,
Stefano Bonetti,
Eric E. Fullerton,
Ezio Iacocca,
Roopali Kukreja,
Thomas J. Silva
Abstract:
Time-resolved ultrafast EUV magnetic scattering was used to test a recent prediction of >10 km/s domain wall speeds by optically exciting a magnetic sample with a nanoscale labyrinthine domain pattern. Ultrafast distortion of the diffraction pattern was observed at markedly different timescales compared to the magnetization quenching. The diffraction pattern distortion shows a threshold-dependence with laser fluence, not seen for magnetization quenching, consistent with a picture of domain wall motion with pinning sites. Supported by simulations, we show that a speed of $\approx$ 66 km/s for highly curved domain walls can explain the experimental data. While our data agree with the prediction of extreme, non-equilibrium wall speeds locally, they differ from the details of the theory, suggesting that additional mechanisms are required to fully understand these effects.
Submitted 27 April, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
Authors:
Bobby He,
James Martens,
Guodong Zhang,
Aleksandar Botev,
Andrew Brock,
Samuel L Smith,
Yee Whye Teh
Abstract:
Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which we define as networks without skips or normalisation). However, these approaches are incompatible with the self-attention layers present in transformers, whose kernels are intrinsically more complicated to analyse and control. And so the question remains: is it possible to train deep vanilla transformers? We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. Our methods address various intricacies specific to signal propagation in transformers, including the interaction with positional encoding and causal masking. In experiments on WikiText-103 and C4, our approaches enable deep transformers without normalisation to train at speeds matching their standard counterparts, and deep vanilla transformers to reach the same performance as standard ones after about 5 times more iterations.
Submitted 20 February, 2023;
originally announced February 2023.
-
Spatial Functa: Scaling Functa to ImageNet Classification and Generation
Authors:
Matthias Bauer,
Emilien Dupont,
Andy Brock,
Dan Rosenbaum,
Jonathan Richard Schwarz,
Hyunjik Kim
Abstract:
Neural fields, also known as implicit neural representations, have emerged as a powerful means to represent complex signals of various modalities. Based on this, Dupont et al. (2022) introduce a framework that views neural fields as data, termed *functa*, and propose to do deep learning directly on this dataset of neural fields. In this work, we show that the proposed framework faces limitations when scaling up to even moderately complex datasets such as CIFAR-10. We then propose *spatial functa*, which overcome these limitations by using spatially arranged latent representations of neural fields, thereby allowing us to scale up the approach to ImageNet-1k at 256x256 resolution. We demonstrate competitive performance to Vision Transformers (Steiner et al., 2022) on classification and Latent Diffusion (Rombach et al., 2022) on image generation respectively.
Submitted 9 February, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
Accessible Interactive Maps for Visually Impaired Users
Authors:
Julie Ducasse,
Anke Brock,
Christophe Jouffrais
Abstract:
Tactile maps are commonly used to give visually impaired users access to geographical representations. Although those relief maps are efficient tools for acquisition of spatial knowledge, they present several limitations and issues such as the need to read braille. Several research projects have been led during the past three decades in order to improve access to maps using interactive technologies. In this chapter, we present an exhaustive review of interactive map prototypes. We classified existing interactive maps into two categories: Digital Interactive Maps (DIMs) that are displayed on a flat surface such as a screen; and Hybrid Interactive Maps (HIMs) that include both a digital and a physical representation. In each family, we identified several subcategories depending on the technology being used. We compared the categories and subcategories according to cost, availability and technological limitations, but also in terms of content, comprehension and interactivity. Then we reviewed a number of studies showing that those maps can support spatial learning for visually impaired users. Finally, we identified new technologies and methods that could improve the accessibility of graphics for visually impaired users in the future.
Submitted 31 August, 2022;
originally announced August 2022.
-
Editorial: Special Issue on Collaborative Aspects of Open Data in Software Engineering
Authors:
Johan Linåker,
Per Runeson,
Anneke Zuiderwijk,
Amanda Brock
Abstract:
High-quality data has become increasingly important to software engineers in designing and implementing today's software, for example, as an input to machine-learning algorithms and visualisation- and analytics-based features. Open data - i.e., data shared under a licence that gives users the right to study, process, and distribute the data to anyone and for any purpose - offers a mechanism to address this need. Data may originate from multiple sources, whether crowdsourced, shared by government agencies, or shared between commercial entities, and is undoubtedly inherent to all business and revenue models across the public sector, business and industry today. In this guest editorial for the Special Issue on Collaborative Aspects of Open Data in Software Engineering, we explore the collaborative aspects of open data in software engineering. We highlight how these aspects can benefit organisations, what challenges may exist and how these may be addressed based on current practice, and introduce the four papers included in this special issue.
Submitted 31 July, 2022;
originally announced August 2022.
-
Parallel Compositing of Volumetric Depth Images for Interactive Visualization of Distributed Volumes at High Frame Rates
Authors:
Aryaman Gupta,
Pietro Incardona,
Anton Brock,
Guido Reina,
Steffen Frey,
Stefan Gumhold,
Ulrik Günther,
Ivo F. Sbalzarini
Abstract:
We present a parallel compositing algorithm for Volumetric Depth Images (VDIs) of large three-dimensional volume data. Large distributed volume data are routinely produced in both numerical simulations and experiments, yet it remains challenging to visualize them at smooth, interactive frame rates. VDIs are view-dependent piecewise constant representations of volume data that offer a potential solution. They are more compact and less expensive to render than the original data. So far, however, there is no method for generating VDIs from distributed data. We propose an algorithm that enables this by sort-last parallel generation and compositing of VDIs with automatically chosen content-adaptive parameters. The resulting composited VDI can then be streamed for remote display, providing responsive visualization of large, distributed volume data.
Submitted 31 July, 2024; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Flamingo: a Visual Language Model for Few-Shot Learning
Authors:
Jean-Baptiste Alayrac,
Jeff Donahue,
Pauline Luc,
Antoine Miech,
Iain Barr,
Yana Hasson,
Karel Lenc,
Arthur Mensch,
Katie Millican,
Malcolm Reynolds,
Roman Ring,
Eliza Rutherford,
Serkan Cabi,
Tengda Han,
Zhitao Gong,
Sina Samangooei,
Marianne Monteiro,
Jacob Menick,
Sebastian Borgeaud,
Andrew Brock,
Aida Nematzadeh,
Sahand Sharifzadeh,
Mikolaj Binkowski,
Ricardo Barreira,
Oriol Vinyals,
et al. (2 additional authors not shown)
Abstract:
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of our models, exploring and measuring their ability to rapidly adapt to a variety of image and video tasks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer; captioning tasks, which evaluate the ability to describe a scene or an event; and close-ended tasks such as multiple-choice visual question-answering. For tasks lying anywhere on this spectrum, a single Flamingo model can achieve a new state of the art with few-shot learning, simply by prompting the model with task-specific examples. On numerous benchmarks, Flamingo outperforms models fine-tuned on thousands of times more task-specific data.
Submitted 15 November, 2022; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Quantum sensing and imaging of spin-orbit-torque-driven spin dynamics in noncollinear antiferromagnet Mn3Sn
Authors:
Gerald Q. Yan,
Senlei Li,
Hanyi Lu,
Mengqi Huang,
Yuxuan Xiao,
Luke Wernert,
Jeffrey A. Brock,
Eric E. Fullerton,
Hua Chen,
Hailong Wang,
Chunhui Rita Du
Abstract:
Novel noncollinear antiferromagnets with spontaneous time-reversal symmetry breaking, nontrivial band topology, and unconventional transport properties have received immense research interest over the past decade due to their rich physics and enormous promise in technological applications. One of the central focuses in this emerging field is exploring the relationship between the microscopic magnetic structure and exotic material properties. Here, the nanoscale imaging of both spin-orbit-torque-induced deterministic magnetic switching and chiral spin rotation in noncollinear antiferromagnet Mn3Sn films using nitrogen-vacancy (NV) centers is reported. Direct evidence of the off-resonance dipole-dipole coupling between the spin dynamics in Mn3Sn and proximate NV centers is also demonstrated with NV relaxometry measurements. These results demonstrate the unique capabilities of NV centers in accessing the local information of the magnetic order and dynamics in these emergent quantum materials and suggest new opportunities for investigating the interplay between topology and magnetism in a broad range of topological magnets.
Submitted 22 March, 2022;
originally announced March 2022.
-
Improving language models by retrieving from trillions of tokens
Authors:
Sebastian Borgeaud,
Arthur Mensch,
Jordan Hoffmann,
Trevor Cai,
Eliza Rutherford,
Katie Millican,
George van den Driessche,
Jean-Baptiste Lespiau,
Bogdan Damoc,
Aidan Clark,
Diego de Las Casas,
Aurelia Guy,
Jacob Menick,
Roman Ring,
Tom Hennigan,
Saffron Huang,
Loren Maggiore,
Chris Jones,
Albin Cassirer,
Andy Brock,
Michela Paganini,
Geoffrey Irving,
Oriol Vinyals,
Simon Osindero,
Karen Simonyan
, et al. (3 additional authors not shown)
Abstract:
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
Submitted 7 February, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Towards Learning Universal Audio Representations
Authors:
Luyu Wang,
Pauline Luc,
Yan Wu,
Adria Recasens,
Lucas Smaira,
Andrew Brock,
Andrew Jaegle,
Jean-Baptiste Alayrac,
Sander Dieleman,
Joao Carreira,
Aaron van den Oord
Abstract:
The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation evaluation suite (HARES) spanning 12 downstream tasks across audio domains and provide a thorough empirical study of recent sound representation learning systems on that benchmark. We discover that previous sound event classification or speech models do not generalize outside of their domains. We observe that more robust audio representations can be learned with the SimCLR objective; however, the model's transferability depends heavily on the model architecture. We find the Slowfast architecture is good at learning rich representations required by different domains, but its performance is affected by the normalization scheme. Based on these findings, we propose a novel normalizer-free Slowfast NFNet and achieve state-of-the-art performance across all domains.
Submitted 23 June, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Skyrmion stabilization at the domain morphology transition in ferromagnet/heavy metal heterostructures with low exchange stiffness
Authors:
Jeffrey A. Brock,
Eric E. Fullerton
Abstract:
We report the experimental observation of micron-scale magnetic skyrmions at room temperature in several Pt/Co-based thin film heterostructures designed to possess a low exchange stiffness, perpendicular magnetic anisotropy, and a modest interfacial Dzyaloshinskii-Moriya interaction (iDMI). We find both experimentally and by micromagnetic and analytic modeling that the combined action of low exchange stiffness and modest iDMI eliminates the energetic penalty associated with forming domain walls in thin film heterostructures. When the domain wall energy density approaches negative values, the remanent domain morphology transitions from a uniform state to a labyrinthian stripe phase. A low exchange stiffness, indicated by a reduction in the Curie temperature below 400 K, is achieved in Pt/Co, Pt/Co/Ni, and Pt/Co/Ni/Re structures by reducing the Co thickness to the ultrathin limit (< 0.3 nm). A similar effect occurs in thicker Pt/Co/NixCu1-x structures when the Ni layer is alloyed with Cu. At this transition in domain morphology, skyrmion phases are stabilized when a small (< 1 mT) perpendicular magnetic field is applied and current-induced skyrmion motion including the skyrmion Hall effect is observed. The temperature and thickness-induced morphological phase transitions observed are similar to the well-studied spin reorientation transition that occurs in the ultrathin limit, but we find that the underlying energy balances are substantially modified by the presence of an iDMI.
Submitted 24 August, 2021;
originally announced August 2021.
-
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Authors:
Andrew Jaegle,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Carl Doersch,
Catalin Ionescu,
David Ding,
Skanda Koppula,
Daniel Zoran,
Andrew Brock,
Evan Shelhamer,
Olivier Hénaff,
Matthew M. Botvinick,
Andrew Zisserman,
Oriol Vinyals,
Joāo Carreira
Abstract:
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
Submitted 15 March, 2022; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
Authors:
Stanislav Fort,
Andrew Brock,
Razvan Pascanu,
Soham De,
Samuel L. Smith
Abstract:
In computer vision, it is standard practice to draw a single sample from the data augmentation procedure for each unique image in the mini-batch. However recent work has suggested drawing multiple samples can achieve higher test accuracies. In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held out data when training deep ResNets. We demonstrate drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training. Crucially, this benefit arises even if different numbers of augmentations per image perform the same number of parameter updates and gradient evaluations (requiring the same total compute). Although prior work has found variance in the gradient estimate arising from subsampling the dataset has an implicit regularization benefit, our experiments suggest variance which arises from the data augmentation process harms generalization. We apply these insights to the highly performant NFNet-F5, achieving 86.8$\%$ top-1 w/o extra data on ImageNet.
Submitted 24 February, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
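The fixed-compute comparison in the abstract above can be illustrated with a minimal sketch: the batch size stays constant, so drawing more augmentation samples per image means fewer unique images per batch. The names `build_batch` and `jitter` are hypothetical, and the toy augmentation is an assumption for illustration only, not the authors' pipeline.

```python
import random


def jitter(value, rng):
    """Toy stand-in for an image augmentation: add small random noise."""
    return value + rng.uniform(-0.1, 0.1)


def build_batch(dataset, batch_size, aug_multiplicity, augment, rng):
    """Build a batch of fixed size from fewer unique examples.

    Each of the batch_size // aug_multiplicity unique examples is
    augmented aug_multiplicity times, so the total number of gradient
    evaluations per step does not depend on aug_multiplicity.
    """
    if batch_size % aug_multiplicity != 0:
        raise ValueError("batch_size must be divisible by aug_multiplicity")
    indices = rng.sample(range(len(dataset)), batch_size // aug_multiplicity)
    return [
        (i, augment(dataset[i], rng))
        for i in indices
        for _ in range(aug_multiplicity)
    ]


rng = random.Random(0)
dataset = [float(i) for i in range(100)]
batch = build_batch(dataset, batch_size=16, aug_multiplicity=4, augment=jitter, rng=rng)
unique_sources = {index for index, _ in batch}
```

Here the batch still holds 16 examples, but they come from only 4 unique source items.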
-
Skillful Precipitation Nowcasting using Deep Generative Models of Radar
Authors:
Suman Ravuri,
Karel Lenc,
Matthew Willson,
Dmitry Kangin,
Remi Lam,
Piotr Mirowski,
Megan Fitzsimons,
Maria Athanassiadou,
Sheleem Kashem,
Sam Madge,
Rachel Prudden,
Amol Mandhane,
Aidan Clark,
Andrew Brock,
Karen Simonyan,
Raia Hadsell,
Niall Robinson,
Ellen Clancy,
Alberto Arribas,
Shakir Mohamed
Abstract:
Precipitation nowcasting, the high-resolution forecasting of precipitation up to two hours ahead, supports the real-world socio-economic needs of many sectors reliant on weather-dependent decision-making. State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates, and struggle to capture important non-linear events such as convective initiations. Recently introduced deep learning methods use radar to directly predict future rain rates, free of physical constraints. While they accurately predict low-intensity rainfall, their operational utility is limited because their lack of constraints produces blurry nowcasts at longer lead times, yielding poor performance on more rare medium-to-heavy rain events. To address these challenges, we present a Deep Generative Model for the probabilistic nowcasting of precipitation from radar. Our model produces realistic and spatio-temporally consistent predictions over regions up to 1536 km x 1280 km and with lead times from 5-90 min ahead. In a systematic evaluation by more than fifty expert forecasters from the Met Office, our generative model ranked first for its accuracy and usefulness in 88% of cases against two competitive methods, demonstrating its decision-making value and ability to provide physical insight to real-world experts. When verified quantitatively, these nowcasts are skillful without resorting to blurring. We show that generative nowcasting can provide probabilistic predictions that improve forecast value and support operational utility, and at resolutions and lead times where alternative methods struggle.
Submitted 2 April, 2021;
originally announced April 2021.
-
Perceiver: General Perception with Iterative Attention
Authors:
Andrew Jaegle,
Felix Gimeno,
Andrew Brock,
Andrew Zisserman,
Oriol Vinyals,
Joao Carreira
Abstract:
Biological systems perceive the world by simultaneously processing high-dimensional inputs from modalities as diverse as vision, audition, touch, proprioception, etc. The perception models used in deep learning on the other hand are designed for individual modalities, often relying on domain-specific assumptions such as the local grid structures exploited by virtually all existing vision models. These priors introduce helpful inductive biases, but also lock models to individual modalities. In this paper we introduce the Perceiver - a model that builds upon Transformers and hence makes few architectural assumptions about the relationship between its inputs, but that also scales to hundreds of thousands of inputs, like ConvNets. The model leverages an asymmetric attention mechanism to iteratively distill inputs into a tight latent bottleneck, allowing it to scale to handle very large inputs. We show that this architecture is competitive with or outperforms strong, specialized models on classification tasks across various modalities: images, point clouds, audio, video, and video+audio. The Perceiver obtains performance comparable to ResNet-50 and ViT on ImageNet without 2D convolutions by directly attending to 50,000 pixels. It is also competitive in all modalities in AudioSet.
Submitted 22 June, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Dynamic symmetry breaking in chiral magnetic systems
Authors:
Jeffrey A. Brock,
Michael D. Kitcher,
Pierre Vallobra,
Rajasekhar Medapalli,
Maxwell P. Li,
Marc De Graef,
Stéphane Mangin,
Vincent Sokalski,
Eric E. Fullerton
Abstract:
The Dzyaloshinskii-Moriya interaction (DMI) in magnetic systems stabilizes spin textures with preferred chirality, applicable to next-generation memory and computing architectures. In perpendicularly magnetized heavy-metal/ferromagnet films, the interfacial DMI originating from structural inversion asymmetry and strong spin-orbit coupling favors chiral Néel-type domain walls (DWs) whose energetics and mobility remain at issue. Here, we characterize a new effect in which domains expand unidirectionally in response to a combination of out-of-plane and in-plane magnetic fields, with the growth direction controlled by the in-plane field strength. These growth directionalities and symmetries with applied fields cannot be understood from static treatments alone. We theoretically demonstrate that perpendicular field torques stabilize steady-state magnetization profiles highly asymmetric in elastic energy, resulting in a dynamic symmetry breaking consistent with the experimental findings. This phenomenon sheds light on the mechanisms governing the dynamics of Néel-type DWs and expands the utility of field-driven DW motion to probe and control chiral DWs.
Submitted 24 May, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
High-Performance Large-Scale Image Recognition Without Normalization
Authors:
Andrew Brock,
Soham De,
Samuel L. Smith,
Karen Simonyan
Abstract:
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets. Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, and our largest models attain a new state-of-the-art top-1 accuracy of 86.5%. In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%. Our code is available at https://github.com/deepmind/deepmind-research/tree/master/nfnets
Submitted 11 February, 2021;
originally announced February 2021.
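The abstract above names adaptive gradient clipping without describing it. A minimal sketch of the unit-wise idea follows: each row of a gradient matrix is rescaled when its norm grows too large relative to the norm of the matching weight row. The function names and the default values of `clip` and `eps` are assumptions chosen for illustration; the linked repository holds the reference implementation.

```python
import math


def unitwise_norm(rows):
    """L2 norm of each row (one row per output unit)."""
    return [math.sqrt(sum(v * v for v in row)) for row in rows]


def adaptive_grad_clip(weights, grads, clip=0.01, eps=1e-3):
    """Rescale each gradient row so that ||g|| / max(||w||, eps) <= clip.

    Rows whose ratio is already within the limit are returned unchanged.
    """
    w_norms = unitwise_norm(weights)
    g_norms = unitwise_norm(grads)
    clipped = []
    for g_row, w_norm, g_norm in zip(grads, w_norms, g_norms):
        max_norm = clip * max(w_norm, eps)  # eps keeps zero-initialized rows trainable
        if g_norm > max_norm:
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [1.0, 0.0]]
grads = [[0.3, 0.4], [0.001, 0.0]]
clipped = adaptive_grad_clip(weights, grads, clip=0.01)
```

In this toy case the first row is scaled down to norm 0.05 (0.01 times the weight norm of 5), while the second row is already within its limit and passes through untouched.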
-
Characterizing signal propagation to close the performance gap in unnormalized ResNets
Authors:
Andrew Brock,
Soham De,
Samuel L. Smith
Abstract:
Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with the state-of-the-art EfficientNets on ImageNet.
Submitted 27 January, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
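For readers unfamiliar with Weight Standardization, the sketch below shows the plain form of the technique: each output unit's weights are shifted to zero mean and scaled to unit variance over the fan-in. This is not the adapted version the abstract refers to, and the function name and `eps` value are assumptions for illustration.

```python
import math


def standardize_weights(weight, eps=1e-5):
    """Standardize each row (one output unit) to zero mean and unit variance."""
    standardized = []
    for row in weight:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        standardized.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return standardized


weight = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 0.0]]
w_hat = standardize_weights(weight)
row_means = [sum(row) / len(row) for row in w_hat]
row_vars = [sum(v * v for v in row) / len(row) for row in w_hat]
```

Because the standardization is recomputed from the raw weights on every forward pass, the statistics of each unit stay fixed however the underlying parameters drift during training.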
-
Training Generative Adversarial Networks by Solving Ordinary Differential Equations
Authors:
Chongli Qin,
Yan Wu,
Jost Tobias Springenberg,
Andrew Brock,
Jeff Donahue,
Timothy P. Lillicrap,
Pushmeet Kohli
Abstract:
The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly stable. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training - when combined with a regulariser that controls the integration error. Our approach represents a radical departure from previous methods which typically use adaptive optimisation and stabilisation techniques that constrain the functional space (e.g. Spectral Normalisation). Evaluation on CIFAR-10 and ImageNet shows that our method outperforms several strong baselines, demonstrating its efficacy.
Submitted 28 November, 2020; v1 submitted 28 October, 2020;
originally announced October 2020.
-
BYOL works even without batch statistics
Authors:
Pierre H. Richemond,
Jean-Bastien Grill,
Florent Altché,
Corentin Tallec,
Florian Strub,
Andrew Brock,
Samuel Smith,
Soham De,
Razvan Pascanu,
Bilal Piot,
Michal Valko
Abstract:
Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.
Submitted 20 October, 2020;
originally announced October 2020.
-
Energy-efficient generation of skyrmion phases in Co/Ni/Pt-based multilayers using Joule heating
Authors:
Jeffrey A. Brock,
Sergio A. Montoya,
Mi-Young Im,
Eric E. Fullerton
Abstract:
We have studied the effects of electrical current pulses on skyrmion formation in a series of Co/Ni/Pt-based multilayers. Transmission X-ray microscopy reveals that by applying electrical current pulses of duration and current density on the order of $\tau = 50$ $\mu$s and $j = 1.7 \times 10^{10}$ A/m$^2$, respectively, in an applied magnetic field of $\mu_0 H_z = 50$ mT, stripe-to-skyrmion transformations are attained. The skyrmions remain stable across a wide range of magnetic fields, including zero field. We primarily attribute the transformation to current-induced Joule heating on the order of ~125 K. Reducing the magnetic moment and perpendicular anisotropy using thin rare-earth spacers dramatically reduces the pulse duration, current density, and magnetic field necessary to 25 $\mu$s, $2.4 \times 10^{9}$ A/m$^2$, and 27 mT, respectively. These findings show that energetic inputs allow for the formation of skyrmion phases in a broad class of materials and that material properties can be tuned to yield more energy-efficient access to skyrmion phases.
Submitted 24 August, 2020; v1 submitted 12 July, 2020;
originally announced July 2020.
-
Evolving Normalization-Activation Layers
Authors:
Hanxiao Liu,
Andrew Brock,
Karen Simonyan,
Quoc V. Le
Abstract:
Normalization layers and activation functions are fundamental components in deep networks and typically co-locate with each other. Here we propose to design them using an automated approach. Instead of designing them separately, we unify them into a single tensor-to-tensor computation graph, and evolve its structure starting from basic mathematical functions. Examples of such mathematical functions are addition, multiplication and statistical moments. The use of low-level mathematical functions, in contrast to the use of high-level modules in mainstream NAS, leads to a highly sparse and large search space which can be challenging for search methods. To address the challenge, we develop efficient rejection protocols to quickly filter out candidate layers that do not work well. We also use multi-objective evolution to optimize each layer's performance across many architectures to prevent overfitting. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures that go beyond existing design patterns. For example, some EvoNorms do not assume that normalization and activation functions must be applied sequentially, nor need to center the feature maps, nor require explicit activation functions. Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets but also transfer well to Mask R-CNN with FPN/SpineNet for instance segmentation and to BigGAN for image synthesis, outperforming BatchNorm and GroupNorm based layers in many cases.
Submitted 17 July, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
TF-Replicator: Distributed Machine Learning for Researchers
Authors:
Peter Buchlovsky,
David Budden,
Dominik Grewe,
Chris Jones,
John Aslanides,
Frederic Besse,
Andy Brock,
Aidan Clark,
Sergio Gómez Colmenarejo,
Aedan Pope,
Fabio Viola,
Dan Belov
Abstract:
We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25).
Submitted 1 February, 2019;
originally announced February 2019.
-
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Authors:
Andrew Brock,
Jeff Donahue,
Karen Simonyan
Abstract:
Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.65.
Submitted 25 February, 2019; v1 submitted 28 September, 2018;
originally announced September 2018.
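The "truncation trick" mentioned in the abstract above reduces the variance of the generator's input. One way to sketch it is to draw latent values from a standard normal and resample any value whose magnitude exceeds a threshold. The function names and the threshold of 0.5 are assumptions for illustration, and the generator itself is omitted.

```python
import random


def truncated_normal(n, threshold, rng):
    """Draw n standard-normal values, resampling any with |z| > threshold.

    threshold must be positive, otherwise the loop cannot terminate.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    samples = []
    while len(samples) < n:
        z = rng.gauss(0.0, 1.0)
        if abs(z) <= threshold:
            samples.append(z)
    return samples


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
z_full = [rng.gauss(0.0, 1.0) for _ in range(10000)]
z_trunc = truncated_normal(10000, threshold=0.5, rng=rng)
```

A smaller threshold keeps latent values closer to the mode of the prior, which is the mechanism the abstract describes as trading sample variety for fidelity.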
-
Implicit Weight Uncertainty in Neural Networks
Authors:
Nick Pawlowski,
Andrew Brock,
Matthew C. H. Lee,
Martin Rajchl,
Ben Glocker
Abstract:
Modern neural networks tend to be overconfident on unseen, noisy or incorrectly labelled data and do not produce meaningful uncertainty measures. Bayesian deep learning aims to address this shortcoming with variational approximations (such as Bayes by Backprop or Multiplicative Normalising Flows). However, current approaches have limitations regarding flexibility and scalability. We introduce Bayes by Hypernet (BbH), a new method of variational approximation that interprets hypernetworks as implicit distributions. It naturally uses neural networks to model arbitrarily complex distributions and scales to modern deep learning architectures. In our experiments, we demonstrate that our method achieves competitive accuracies and predictive uncertainties on MNIST and a CIFAR5 task, while being the most robust against adversarial attacks.
Submitted 25 May, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
-
SMASH: One-Shot Model Architecture Search through HyperNetworks
Authors:
Andrew Brock,
Theodore Lim,
J. M. Ritchie,
Nick Weston
Abstract:
Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed networks. Our code is available at https://github.com/ajbrock/SMASH
Submitted 17 August, 2017;
originally announced August 2017.
-
FreezeOut: Accelerate Training by Progressively Freezing Layers
Authors:
Andrew Brock,
Theodore Lim,
J. M. Ritchie,
Nick Weston
Abstract:
The early layers of a deep neural net have the fewest parameters, but take up the most computation. In this extended abstract, we propose to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass. Through experiments on CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20% wall-clock time during training with 3% loss in accuracy for DenseNets, a 20% speedup without loss of accuracy for ResNets, and no improvement for VGG networks. Our code is publicly available at https://github.com/ajbrock/FreezeOut
Submitted 18 June, 2017; v1 submitted 15 June, 2017;
originally announced June 2017.
-
Rotational Dynamics and Star Formation in the Nearby Dwarf Galaxy NGC 5238
Authors:
John M. Cannon,
Andrew T. McNichols,
Yaron G. Teich,
Catherine Ball,
John Banovetz,
Annika Brock,
Brian A. Eisner,
Kathleen Fitzgibbon,
Masao Miazzo,
Asra Nizami,
Bridget Reilly,
Elizabeth Ruvolo,
Quinton Singer
Abstract:
We present new HI spectral line images of the nearby low-mass galaxy NGC 5238, acquired with the Karl G. Jansky Very Large Array (VLA). Located at a distance of 4.51+/-0.04 Mpc, NGC 5238 is an actively star-forming galaxy with widespread H-alpha and UV continuum emission. The source is included in many ongoing and recent nearby galaxy surveys, but until this work the spatially resolved qualities of its neutral interstellar medium have remained unstudied. Our HI images resolve the disk on physical scales of ~400 pc, allowing us to undertake a detailed comparative study of the gaseous and stellar components. The HI disk is asymmetric in the outer regions, and the areas of high HI mass surface density display a crescent-shaped morphology that is slightly offset from the center of the stellar populations. The HI column density exceeds 10^21 cm^-2 in much of the disk. We quantify the degree of co-spatiality of dense HI gas and sites of ongoing star formation as traced by far-UV and H-alpha emission. The neutral gas kinematics are complex; using a spatially-resolved position-velocity analysis, we infer a rotational velocity of 31+/-5 km/s. We place NGC 5238 on the baryonic Tully-Fisher relation and contextualize the system amongst other low-mass galaxies.
Submitted 3 October, 2016;
originally announced October 2016.
-
Neural Photo Editing with Introspective Adversarial Networks
Authors:
Andrew Brock,
Theodore Lim,
J. M. Ritchie,
Nick Weston
Abstract:
The increasingly photorealistic sample quality of generative image models suggests their feasibility in applications beyond image generation. We present the Neural Photo Editor, an interface that leverages the power of generative neural networks to make large, semantically coherent changes to existing images. To tackle the challenge of achieving accurate reconstructions without loss of feature quality, we introduce the Introspective Adversarial Network, a novel hybridization of the VAE and GAN. Our model efficiently captures long-range dependencies through use of a computational block based on weight-shared dilated convolutions, and improves generalization performance with Orthogonal Regularization, a novel weight regularization method. We validate our contributions on CelebA, SVHN, and CIFAR-100, and produce samples and reconstructions with high visual fidelity.
Submitted 6 February, 2017; v1 submitted 22 September, 2016;
originally announced September 2016.
-
Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
Authors:
Andrew Brock,
Theodore Lim,
J. M. Ritchie,
Nick Weston
Abstract:
When working with three-dimensional data, choice of representation is key. We explore voxel-based models, and present evidence for the viability of voxellated representations in applications including shape modeling and object classification. Our key contributions are methods for training voxel-based variational autoencoders, a user interface for exploring the latent space learned by the autoencoder, and a deep convolutional neural network architecture for object classification. We address challenges unique to voxel-based representations, and empirically evaluate our models on the ModelNet benchmark, where we demonstrate a 51.5% relative improvement in the state of the art for object classification.
Submitted 16 August, 2016; v1 submitted 15 August, 2016;
originally announced August 2016.
-
Interactive audio-tactile maps for visually impaired people
Authors:
Anke Brock,
Christophe Jouffrais
Abstract:
Visually impaired people face important challenges related to orientation and mobility. Indeed, 56% of visually impaired people in France declared having problems concerning autonomous mobility. These problems often mean that visually impaired people travel less, which influences their personal and professional life and can lead to exclusion from society. Therefore this issue presents a social challenge as well as an important research area. Accessible geographic maps are helpful for acquiring knowledge about a city's or neighborhood's configuration, as well as selecting a route to reach a destination. Traditionally, raised-line paper maps with braille text have been used. These maps have proved to be efficient for the acquisition of spatial knowledge by visually impaired people. Yet, these maps possess significant limitations. For instance, due to the specificities of the tactile sense only a limited amount of information can be displayed on a single map, which dramatically increases the number of maps that are needed. For the same reason, it is difficult to represent specific information such as distances. Finally, braille labels are used for textual descriptions but only a small percentage of the visually impaired population reads braille. In France 15% of blind people are braille readers and only 10% can read and write. In the United States, fewer than 10% of the legally blind people are braille readers and only 10% of blind children actually learn braille. Recent technological advances have enabled the design of interactive maps with the aim to overcome these limitations. Indeed, interactive maps have the potential to provide a broad spectrum of the population with spatial knowledge, irrespective of age, impairment, skill level, or other factors. To this regard, they might be an efficient means for providing visually impaired people with access to geospatial information. 
In this paper we give an overview of our research on making geographic maps accessible to visually impaired people.
Submitted 3 December, 2015;
originally announced December 2015.
-
Design and User Satisfaction of Interactive Maps for Visually Impaired People
Authors:
Anke Brock,
Philippe Truillet,
Bernard Oriola,
Delphine Picard,
Christophe Jouffrais
Abstract:
Multimodal interactive maps are a solution for presenting spatial information to visually impaired people. In this paper, we present an interactive multimodal map prototype that is based on a tactile paper map, a multi-touch screen and audio output. We first describe the different steps for designing an interactive map: drawing and printing the tactile paper map, choice of multi-touch technology, interaction technologies and the software architecture. Then we describe the method used to assess user satisfaction. We provide data showing that an interactive map - although based on a unique, elementary, double tap interaction - has been met with a high level of user satisfaction. Interestingly, satisfaction is independent of a user's age, previous visual experience or Braille experience. This prototype will be used as a platform to design advanced interactions for spatial learning.
Submitted 19 July, 2012;
originally announced July 2012.