-
Guided Evolution with Binary Discriminators for ML Program Search
Authors:
John D. Co-Reyes,
Yingjie Miao,
George Tucker,
Aleksandra Faust,
Esteban Real
Abstract:
How to automatically design better machine learning programs is an open problem within AutoML. While evolution has been a popular tool to search for better ML programs, using learning itself to guide the search has been less successful and less understood on harder problems but has the promise to dramatically increase the speed and final performance of the optimization process. We propose guiding…
▽ More
How to automatically design better machine learning programs is an open problem within AutoML. While evolution has been a popular tool to search for better ML programs, using learning itself to guide the search has been less successful and less understood on harder problems but has the promise to dramatically increase the speed and final performance of the optimization process. We propose guiding evolution with a binary discriminator, trained online to distinguish which program is better given a pair of programs. The discriminator selects better programs without having to perform a costly evaluation and thus speed up the convergence of evolution. Our method can encode a wide variety of ML components including symbolic optimizers, neural architectures, RL loss functions, and symbolic regression equations with the same directed acyclic graph representation. By combining this representation with modern GNNs and an adaptive mutation strategy, we demonstrate our method can speed up evolution across a set of diverse problems including a 3.7x speedup on the symbolic search for ML optimizers and a 4x speedup for RL loss functions.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
AutoNumerics-Zero: Automated Discovery of State-of-the-Art Mathematical Functions
Authors:
Esteban Real,
Yao Chen,
Mirko Rossini,
Connal de Souza,
Manav Garg,
Akhil Verghese,
Moritz Firsching,
Quoc V. Le,
Ekin Dogus Cubuk,
David H. Park
Abstract:
Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on few limited precision types, su…
▽ More
Computers calculate transcendental functions by approximating them through the composition of a few limited-precision instructions. For example, an exponential can be calculated with a Taylor series. These approximation methods were developed over the centuries by mathematicians, who emphasized the attainability of arbitrary precision. Computers, however, operate on few limited precision types, such as the popular float32. In this study, we show that when aiming for limited precision, existing approximation methods can be outperformed by programs automatically discovered from scratch by a simple evolutionary algorithm. In particular, over real numbers, our method can approximate the exponential function reaching orders of magnitude more precision for a given number of operations when compared to previous approaches. More practically, over float32 numbers and constrained to less than 1 ULP of error, the same method attains a speedup over baselines by generating code that triggers better XLA/LLVM compilation paths. In other words, in both cases, evolution searched a vast space of possible programs, without knowledge of mathematics, to discover previously unknown optimized approximations to high precision, for the first time. We also give evidence that these results extend beyond the exponential. The ubiquity of transcendental functions suggests that our method has the potential to reduce the cost of scientific computing applications.
△ Less
Submitted 13 December, 2023;
originally announced December 2023.
-
Discovering Adaptable Symbolic Algorithms from Scratch
Authors:
Stephen Kelly,
Daniel S. Park,
Xingyou Song,
Mitchell McIntire,
Pranav Nashikkar,
Ritam Guha,
Wolfgang Banzhaf,
Kalyanmoy Deb,
Vishnu Naresh Boddeti,
Jie Tan,
Esteban Real
Abstract:
Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaptation policies, where only model parameters are optimized, ARZ can build control algorithms with the full expre…
▽ More
Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaptation policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.
△ Less
Submitted 13 October, 2023; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Symbolic Discovery of Optimization Algorithms
Authors:
Xiangning Chen,
Chen Liang,
Da Huang,
Esteban Real,
Kaiyuan Wang,
Yao Liu,
Hieu Pham,
Xuanyi Dong,
Thang Luong,
Cho-Jui Hsieh,
Yifeng Lu,
Quoc V. Le
Abstract:
We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discove…
▽ More
We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, $\textbf{Lion}$ ($\textit{Evo$\textbf{L}$ved S$\textbf{i}$gn M$\textbf{o}$me$\textbf{n}$tum}$). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% $\textit{zero-shot}$ and 91.1% $\textit{fine-tuning}$ accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits a similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. Lion is also successfully deployed in production systems such as Google search ads CTR model.
△ Less
Submitted 8 May, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Unified Functional Hashing in Automatic Machine Learning
Authors:
Ryan Gillard,
Stephen Jonany,
Yingjie Miao,
Michael Munn,
Connal de Souza,
Jonathan Dungay,
Chen Liang,
David R. So,
Quoc V. Le,
Esteban Real
Abstract:
The field of Automatic Machine Learning (AutoML) has recently attained impressive results, including the discovery of state-of-the-art machine learning solutions, such as neural image classifiers. This is often done by applying an evolutionary search method, which samples multiple candidate solutions from a large space and evaluates the quality of each candidate through a long training process. As…
▽ More
The field of Automatic Machine Learning (AutoML) has recently attained impressive results, including the discovery of state-of-the-art machine learning solutions, such as neural image classifiers. This is often done by applying an evolutionary search method, which samples multiple candidate solutions from a large space and evaluates the quality of each candidate through a long training process. As a result, the search tends to be slow. In this paper, we show that large efficiency gains can be obtained by employing a fast unified functional hash, especially through the functional equivalence caching technique, which we also present. The central idea is to detect by hashing when the search method produces equivalent candidates, which occurs very frequently, and this way avoid their costly re-evaluation. Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently, and it is "unified" in that the same algorithm can hash arbitrary representations; e.g. compute graphs, imperative code, or lambda functions. As evidence, we show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery. Finally, we consider the effect of hash collisions, evaluation noise, and search distribution through empirical analysis. Altogether, we hope this paper may serve as a guide to hashing techniques in AutoML.
△ Less
Submitted 10 February, 2023;
originally announced February 2023.
-
PyGlove: Efficiently Exchanging ML Ideas as Code
Authors:
Daiyi Peng,
Xuanyi Dong,
Esteban Real,
Yifeng Lu,
Quoc V. Le
Abstract:
The increasing complexity and scale of machine learning (ML) has led to the need for more efficient collaboration among multiple teams. For example, when a research team invents a new architecture like "ResNet," it is desirable for multiple engineering teams to adopt it. However, the effort required for each team to study and understand the invention does not scale well with the number of teams or…
▽ More
The increasing complexity and scale of machine learning (ML) has led to the need for more efficient collaboration among multiple teams. For example, when a research team invents a new architecture like "ResNet," it is desirable for multiple engineering teams to adopt it. However, the effort required for each team to study and understand the invention does not scale well with the number of teams or inventions. In this paper, we present an extension of our PyGlove library to easily and scalably share ML ideas. PyGlove represents ideas as symbolic rule-based patches, enabling researchers to write down the rules for models they have not seen. For example, an inventor can write rules that will "add skip-connections." This permits a network effect among teams: at once, any team can issue patches to all other teams. Such a network effect allows users to quickly surmount the cost of adopting PyGlove by writing less code quicker, providing a benefit that scales with time. We describe the new paradigm of organizing ML through symbolic patches and compare it to existing approaches. We also perform a case study of a large codebase where PyGlove led to an 80% reduction in the number of lines of code.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability
Authors:
Juan Jose Garau-Luis,
Yingjie Miao,
John D. Co-Reyes,
Aaron Parisi,
Jie Tan,
Esteban Real,
Aleksandra Faust
Abstract:
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world. Designing RL algorithms that optimize these objectives can be a costly and painstaking process. This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions. MetaPG explicitly optimizes for generalizability and performance, and implicit…
▽ More
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world. Designing RL algorithms that optimize these objectives can be a costly and painstaking process. This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions. MetaPG explicitly optimizes for generalizability and performance, and implicitly optimizes the stability of both metrics. We initialize our loss function population with Soft Actor-Critic (SAC) and perform multi-objective optimization using fitness metrics encoding single-task performance, zero-shot generalizability to unseen environment configurations, and stability across independent runs with different random seeds. On a set of continuous control tasks from the Real-World RL Benchmark Suite, we find that our method, using a single environment during evolution, evolves algorithms that improve upon SAC's performance and generalizability by 4% and 20%, respectively, and reduce instability up to 67%. Then, we scale up to more complex environments from the Brax physics simulator and replicate generalizability tests encountered in practical settings, such as different friction coefficients. MetaPG evolves algorithms that can obtain 10% better generalizability without loss of performance within the same meta-training environment and obtain similar results to SAC when doing cross-domain evaluations in other Brax environments. The evolution results are interpretable; by analyzing the structure of the best algorithms we identify elements that help optimizing certain objectives, such as regularization terms for the critic loss.
△ Less
Submitted 24 April, 2023; v1 submitted 8 April, 2022;
originally announced April 2022.
-
PyGlove: Symbolic Programming for Automated Machine Learning
Authors:
Daiyi Peng,
Xuanyi Dong,
Esteban Real,
Mingxing Tan,
Yifeng Lu,
Hanxiao Liu,
Gabriel Bender,
Adam Kraft,
Chen Liang,
Quoc V. Le
Abstract:
Neural networks are sensitive to hyper-parameter and architecture choices. Automated Machine Learning (AutoML) is a promising paradigm for automating these choices. Current ML software libraries, however, are quite limited in handling the dynamic interactions among the components of AutoML. For example, efficientNAS algorithms, such as ENAS and DARTS, typically require an implementation coupling b…
▽ More
Neural networks are sensitive to hyper-parameter and architecture choices. Automated Machine Learning (AutoML) is a promising paradigm for automating these choices. Current ML software libraries, however, are quite limited in handling the dynamic interactions among the components of AutoML. For example, efficientNAS algorithms, such as ENAS and DARTS, typically require an implementation coupling between the search space and search algorithm, the two key components in AutoML. Furthermore, implementing a complex search flow, such as searching architectures within a loop of searching hardware configurations, is difficult. To summarize, changing the search space, search algorithm, or search flow in current ML libraries usually requires a significant change in the program logic. In this paper, we introduce a new way of programming AutoML based on symbolic programming. Under this paradigm, ML programs are mutable, thus can be manipulated easily by another program. As a result, AutoML can be reformulated as an automated process of symbolic manipulation. With this formulation, we decouple the triangle of the search algorithm, the search space and the child program. This decoupling makes it easy to change the search space and search algorithm (without and with weight sharing), as well as to add search capabilities to existing code and implement complex search flows. We then introduce PyGlove, a new Python library that implements this paradigm. Through case studies on ImageNet and NAS-Bench-101, we show that with PyGlove users can easily convert a static program into a search space, quickly iterate on the search spaces and search algorithms, and craft complex search flows to achieve better results.
△ Less
Submitted 21 January, 2021;
originally announced January 2021.
-
Evolving Reinforcement Learning Algorithms
Authors:
John D. Co-Reyes,
Yingjie Miao,
Daiyi Peng,
Esteban Real,
Sergey Levine,
Quoc V. Le,
Honglak Lee,
Aleksandra Faust
Abstract:
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, l…
▽ More
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
△ Less
Submitted 10 November, 2022; v1 submitted 8 January, 2021;
originally announced January 2021.
-
Statistical analysis of the material, geometrical and imperfection characteristics of structural stainless steels and members
Authors:
Itsaso Arrayago,
Kim J. R. Rasmussen,
Esther Real
Abstract:
Traditional member-based two-step design approaches included in current structural codes for steel structures, as well as more recent system-based direct-design alternatives, require building rigorous structural reliability frameworks for the calibration of partial coefficients (resistance factors) to achieve specified target reliability requirements. Key design parameters affecting the strength o…
▽ More
Traditional member-based two-step design approaches included in current structural codes for steel structures, as well as more recent system-based direct-design alternatives, require building rigorous structural reliability frameworks for the calibration of partial coefficients (resistance factors) to achieve specified target reliability requirements. Key design parameters affecting the strength of structures and their random variations are generally modelled by nominal or characteristic values in design standards, which are combined with partial coefficients that need to be calibrated from measurements on real samples. While the statistical characterization of material and geometric properties of structural steels has been consolidated over the last decades, information about the characterization of structural stainless steels is virtually non-existent due to the limited pool of available data. Thus, this paper presents the basic ingredient for developing reliability frameworks for stainless steel structures and components by statistically characterizing the main random parameters affecting their strength through a comprehensive database collected from the literature. Based on the collected data, appropriate probabilistic models are proposed for geometric properties, material properties, imperfections and residual stresses of different stainless steel alloys and cross-section or product types. The data is equally applicable to member-based reliability analyses as described in existing codes and system-based analyses targeted at the direct-design of stainless steel structures by advanced analysis.
△ Less
Submitted 26 October, 2021; v1 submitted 28 October, 2020;
originally announced October 2020.
-
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Authors:
Esteban Real,
Chen Liang,
David R. So,
Quoc V. Le
Abstract:
Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spac…
▽ More
Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks---or similarly restrictive search spaces. Our goal is to show that AutoML can go further: it is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. Moreover, evolution adapts algorithms to different task types: e.g., dropout-like techniques appear when little data is available. We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction for the field.
△ Less
Submitted 30 June, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.
-
NAS-Bench-101: Towards Reproducible Neural Architecture Search
Authors:
Chris Ying,
Aaron Klein,
Esteban Real,
Eric Christiansen,
Kevin Murphy,
Frank Hutter
Abstract:
Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constru…
▽ More
Recent advances in neural architecture search (NAS) demand tremendous computational resources, which makes it difficult to reproduce experiments and imposes a barrier-to-entry to researchers without access to large-scale computation. We aim to ameliorate these problems by introducing NAS-Bench-101, the first public architecture dataset for NAS research. To build NAS-Bench-101, we carefully constructed a compact, yet expressive, search space, exploiting graph isomorphisms to identify 423k unique convolutional architectures. We trained and evaluated all of these architectures multiple times on CIFAR-10 and compiled the results into a large dataset of over 5 million trained models. This allows researchers to evaluate the quality of a diverse range of models in milliseconds by querying the pre-computed dataset. We demonstrate its utility by analyzing the dataset as a whole and by benchmarking a range of architecture optimization algorithms.
△ Less
Submitted 14 May, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Regularized Evolution for Image Classifier Architecture Search
Authors:
Esteban Real,
Alok Aggarwal,
Yanping Huang,
Quoc V Le
Abstract:
The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier---AmoebaNet-A---that surpasses hand-design…
▽ More
The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier---AmoebaNet-A---that surpasses hand-designs for the first time. To do this, we modify the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes. Matching size, AmoebaNet-A has comparable accuracy to current state-of-the-art ImageNet models discovered with more complex architecture-search methods. Scaled to larger size, AmoebaNet-A sets a new state-of-the-art 83.9% / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures.
△ Less
Submitted 16 February, 2019; v1 submitted 5 February, 2018;
originally announced February 2018.
-
Large-Scale Evolution of Image Classifiers
Authors:
Esteban Real,
Sherry Moore,
Andrew Selle,
Saurabh Saxena,
Yutaka Leon Suematsu,
Jie Tan,
Quoc Le,
Alex Kurakin
Abstract:
Neural networks have proven effective at solving difficult problems but designing their architectures can be challenging, even for image classification problems alone. Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. Despite significant computational requirements, we show that it is now possible to evolve models with accurac…
▽ More
Neural networks have proven effective at solving difficult problems but designing their architectures can be challenging, even for image classification problems alone. Our goal is to minimize human participation, so we employ evolutionary algorithms to discover such networks automatically. Despite significant computational requirements, we show that it is now possible to evolve models with accuracies within the range of those published in the last year. Specifically, we employ simple evolutionary techniques at unprecedented scales to discover models for the CIFAR-10 and CIFAR-100 datasets, starting from trivial initial conditions and reaching accuracies of 94.6% (95.6% for ensemble) and 77.0%, respectively. To do this, we use novel and intuitive mutation operators that navigate large search spaces; we stress that no human participation is required once evolution starts and that the output is a fully-trained model. Throughout this work, we place special emphasis on the repeatability of results, the variability in the outcomes and the computational requirements.
△ Less
Submitted 11 June, 2017; v1 submitted 3 March, 2017;
originally announced March 2017.
-
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
Authors:
Esteban Real,
Jonathon Shlens,
Stefano Mazzocchi,
Xin Pan,
Vincent Vanhoucke
Abstract:
We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. T…
▽ More
We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the MS COCO label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second. The use of a cascade of increasingly precise human annotations ensures a label accuracy above 95% for every class and tight bounding boxes. Finally, we train and evaluate well-known deep network architectures and report baseline figures for per-frame classification and localization to provide a point of comparison for future work. We also demonstrate how the temporal contiguity of video can potentially be used to improve such inferences. Please see the PDF file to find the URL to download the data. We hope the availability of such large curated corpus will spur new advances in video object detection and tracking.
△ Less
Submitted 24 March, 2017; v1 submitted 2 February, 2017;
originally announced February 2017.
-
Attention for Fine-Grained Categorization
Authors:
Pierre Sermanet,
Andrea Frome,
Esteban Real
Abstract:
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs data set. In this work we use an RNN of the same structure but substitute a more powerful visual network and perform large-scale pre-training of the visual network outside of the atte…
▽ More
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs data set. In this work we use an RNN of the same structure but substitute a more powerful visual network and perform large-scale pre-training of the visual network outside of the attention RNN. Most work in attention models to date focuses on tasks with toy or more constrained visual environments, whereas we present results for fine-grained categorization better than the state-of-the-art GoogLeNet classification model. We show that our model learns to direct high resolution attention to the most discriminative regions without any spatial supervision such as bounding boxes, and it is able to discriminate fine-grained dog breeds moderately well even when given only an initial low-resolution context image and narrow, inexpensive glimpses at faces and fur patterns. This and similar attention models have the major advantage of being trained end-to-end, as opposed to other current detection and recognition pipelines with hand-engineered components where information is lost. While our model is state-of-the-art, further work is needed to fully leverage the sequential input.
△ Less
Submitted 10 April, 2015; v1 submitted 22 December, 2014;
originally announced December 2014.