
Characterization of AlGaAs/GeSn heterojunction band alignment via X-ray photoelectron spectroscopy

Authors: Yang Liu, Jiarui Gong, Sudip Acharya, Yiran Lia, Alireza Abrand, Justin M. Rudie, Jie Zhou, Yi Lu, Haris Naeem Abbasi, Daniel Vincent, Samuel Haessly, Tsung-Han Tsai, Parsian K. Mohseni, Shui-Qing Yu, Zhenqiang Ma

Abstract: GeSn-based SWIR lasers for imaging, sensing, and communications have developed rapidly in recent years. However, the existing SiGeSn/GeSn double heterostructure lacks adequate electron confinement and is insufficient for room-temperature lasing. The recently demonstrated semiconductor grafting technique offers a viable route to AlGaAs/GeSn p-i-n heterojunctions with better electron confinement and high-quality interfaces, which is promising for room-temperature electrically pumped GeSn laser devices. Understanding and quantitatively characterizing the band alignment in this grafted heterojunction is therefore crucial. In this study, we explore the band alignment in the grafted monocrystalline Al0.3Ga0.7As/Ge0.853Sn0.147 p-i-n heterojunction. Using photoluminescence measurements, we determined the bandgaps of AlGaAs and GeSn to be 1.81 eV and 0.434 eV, respectively. We further conducted X-ray photoelectron spectroscopy measurements and extracted a valence band offset of 0.19 eV and a conduction band offset of 1.186 eV. These values confirm a Type-I band alignment that effectively confines electrons at the AlGaAs/GeSn interface. This study improves our understanding of the interfacial band structure in the grafted AlGaAs/GeSn heterostructure, provides experimental evidence of Type-I band alignment between AlGaAs and GeSn, and paves the way for their application in laser technologies.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 18 pages, 4 figures
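The two offsets quoted above are tied together by the bandgaps: for a straddling (Type-I) alignment, $E_g^{\mathrm{AlGaAs}} - E_g^{\mathrm{GeSn}} = \Delta E_C + \Delta E_V$. The short Python check below is a sketch that assumes this sign convention (both offsets positive for Type-I). It recovers the reported conduction band offset from the photoluminescence bandgaps and the XPS valence band offset. The function names are illustrative.

```python
def conduction_band_offset(eg_wide: float, eg_narrow: float, vbo: float) -> float:
    """Conduction band offset (eV) from the two bandgaps and the valence band offset.

    Assumes Eg_wide - Eg_narrow = dEc + dEv, with both offsets taken positive when
    the narrow-gap band edges sit inside the wide gap (nested, Type-I alignment).
    """
    return eg_wide - eg_narrow - vbo


def is_type_i(cbo: float, vbo: float) -> bool:
    # Type-I (straddling): both electrons and holes are confined in the narrow-gap layer.
    return cbo > 0 and vbo > 0


# Values quoted in the abstract, in eV.
EG_ALGAAS, EG_GESN, VBO = 1.81, 0.434, 0.19
cbo = conduction_band_offset(EG_ALGAAS, EG_GESN, VBO)
print(f"CBO = {cbo:.3f} eV, Type-I: {is_type_i(cbo, VBO)}")
```

The arithmetic is consistent with the abstract: 1.81 - 0.434 - 0.19 = 1.186 eV.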

arXiv:2408.08451 [pdf]

AlGaAs/GeSn p-i-n diode interfaced with ultrathin Al$_2$O$_3$

Authors: Yang Liu, Yiran Li, Sudip Acharya, Jie Zhou, Jiarui Gong, Alireza Abrand, Yi Lu, Daniel Vincent, Samuel Haessly, Parsian K. Mohseni, Shui-Qing Yu, Zhenqiang Ma

Abstract: This study presents the fabrication and characterization of an Al$_{0.3}$Ga$_{0.7}$As/Ge$_{0.87}$Sn$_{0.13}$/GeSn p-i-n double heterostructure (DHS) diode made with the grafting approach for enhanced optoelectronic applications. By integrating ultra-thin Al$_2$O$_3$ as a quantum tunneling layer and enhancing interfacial double-side passivation, we achieved a heterostructure with a substantial 1.186 eV conduction band barrier between AlGaAs and GeSn, along with a low interfacial density of states. The diodes showed highly uniform electrical characteristics, including a mean ideality factor of 1.47 and a mean rectification ratio of $2.95\times10^{3}$ at +/-2 V across 326 devices, indicating high-quality device fabrication. Comprehensive electrical characterization, including C-V and I-V profiling, confirms the diode's capability to provide robust electrical confinement and efficient carrier injection. These properties make the Al$_{0.3}$Ga$_{0.7}$As/Ge$_{0.87}$Sn$_{0.13}$/GeSn DHS a promising candidate for next-generation electrically pumped GeSn lasers, potentially operable at higher temperatures. Our results provide a viable pathway for further advancements in various GeSn-based devices.

Submitted 15 August, 2024; originally announced August 2024.

Comments: 5 pages, 4 figures
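The ideality factor and rectification ratio quoted above are standard figures of merit read from I-V curves. The minimal Python sketch below estimates $n$ from the slope of $\ln I$ versus $V$. It assumes the ideal Shockley form $I = I_s[\exp(qV/nkT) - 1]$ and a fit window where the exponential term dominates, and it ignores series resistance. The data are synthetic, generated with $n = 1.47$ only to exercise the fit. They are not the paper's measurements, and the helper names are hypothetical.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / elementary charge, in V/K


def ideality_factor(voltages, currents, temperature=300.0):
    """Estimate n from a least-squares fit of ln(I) vs V in forward bias.

    For I = I_s * (exp(V / (n*kT/q)) - 1) with V >> kT/q, ln(I) is linear in V
    with slope 1 / (n*kT/q), so n = 1 / (slope * kT/q).
    """
    xs = list(voltages)
    ys = [math.log(i) for i in currents]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return 1.0 / (slope * K_B_OVER_Q * temperature)


def rectification_ratio(i_forward, i_reverse):
    # Ratio of current magnitudes at +V and -V (e.g. +/-2 V in the abstract).
    return abs(i_forward) / abs(i_reverse)


def diode_current(v, i_s=1e-9, n=1.47, t=300.0):
    # Synthetic ideal-diode data (hypothetical I_s), NOT measured values.
    return i_s * (math.exp(v / (n * K_B_OVER_Q * t)) - 1.0)


fit_v = [0.2 + 0.02 * k for k in range(11)]  # 0.2 V to 0.4 V, exponential regime
n_est = ideality_factor(fit_v, [diode_current(v) for v in fit_v])
print(f"estimated n = {n_est:.3f}")
```

On real devices the choice of fit window matters. At high forward bias, series resistance bends the curve and inflates the apparent $n$.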

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.04229 [pdf, other]

MusicRL: Aligning Music Generation to Human Preferences

Authors: Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

Abstract: We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only does this make supervised training of such models challenging, but it also calls for integrating continuous human feedback into their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text-adherence and audio quality with the help of selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality account for only part of it.
This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.

Submitted 6 February, 2024; originally announced February 2024.
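The abstract does not spell out how MusicRL-U's reward is learned from the 300,000 pairwise preferences. A common RLHF choice is a Bradley-Terry model, $P(a \succ b) = \sigma(r_a - r_b)$. The Python sketch below fits one scalar score per item by gradient ascent on that likelihood. It is a generic illustration under that assumption, not MusicRL's reward model, which would in practice be a network scoring audio tokens. The toy preference data are hypothetical.

```python
import math


def fit_bradley_terry(n_items, comparisons, lr=0.1, epochs=500):
    """Fit scores r so that P(winner beats loser) = sigmoid(r_winner - r_loser).

    comparisons: list of (winner, loser) index pairs (pairwise preferences).
    Returns one scalar score per item, a stand-in for a learned reward.
    """
    r = [0.0] * n_items
    for _ in range(epochs):
        grad = [0.0] * n_items
        for w, l in comparisons:
            # d/dr_w log sigmoid(r_w - r_l) = 1 - sigmoid(r_w - r_l); opposite sign for r_l.
            p = 1.0 / (1.0 + math.exp(-(r[w] - r[l])))
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        r = [ri + lr * gi / len(comparisons) for ri, gi in zip(r, grad)]
    mean = sum(r) / n_items
    return [ri - mean for ri in r]  # scores are only defined up to an additive constant


# Hypothetical toy data: item 2 is usually preferred, item 0 never is.
prefs = [(2, 0), (2, 1), (1, 0), (2, 0), (1, 2), (2, 1), (1, 0)]
scores = fit_bradley_terry(3, prefs)
print([round(s, 3) for s in scores])
```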

arXiv:2306.12925 [pdf, other]

AudioPaLM: A Large Language Model That Can Speak and Listen

Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats , et al. (5 additional authors not shown)

Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2. We demonstrate that initializing AudioPaLM with the weights of a text-only large language model improves speech processing, successfully leveraging the larger quantity of text training data used in pretraining to assist with the speech tasks. The resulting model significantly outperforms existing systems for speech translation tasks and has the ability to perform zero-shot speech-to-text translation for many languages for which input/target language combinations were not seen in training. AudioPaLM also demonstrates features of audio language models, such as transferring a voice across languages based on a short spoken prompt. We release examples of our method at https://google-research.github.io/seanet/audiopalm/examples

Submitted 22 June, 2023; originally announced June 2023.

Comments: Technical report

arXiv:2305.09636 [pdf, other]

SoundStorm: Efficient Parallel Audio Generation

Authors: Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi

Abstract: We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding to generate the tokens of a neural audio codec. Compared to the autoregressive generation approach of AudioLM, our model produces audio of the same quality and with higher consistency in voice and acoustic conditions, while being two orders of magnitude faster. SoundStorm generates 30 seconds of audio in 0.5 seconds on a TPU-v4. We demonstrate the ability of our model to scale audio generation to longer sequences by synthesizing high-quality, natural dialogue segments, given a transcript annotated with speaker turns and a short prompt with the speakers' voices.

Submitted 16 May, 2023; originally announced May 2023.
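SoundStorm's speedup comes from confidence-based parallel decoding. Instead of making one model call per token, each step predicts all masked positions at once and keeps only the most confident predictions. The Python sketch below follows a MaskGIT-style recipe with a cosine masking schedule, which is an assumption here. It also treats a single flat token sequence, whereas SoundStorm works over the hierarchical tokens of a neural audio codec. `toy_predict` is a hypothetical stand-in for the model.

```python
import math

MASK = None  # placeholder for a not-yet-decoded position


def parallel_decode(predict, length, n_steps):
    """Confidence-based parallel decoding (MaskGIT-style, assumed cosine schedule).

    predict(tokens) -> list of (token, confidence) for every position.
    Starts fully masked; each step commits the most confident predictions
    among the still-masked positions.
    """
    tokens = [MASK] * length
    for step in range(1, n_steps + 1):
        # Positions that should remain masked after this step; reaches 0 at the last step.
        still_masked = math.floor(length * math.cos(math.pi / 2 * step / n_steps))
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        n_commit = len(masked) - still_masked
        preds = predict(tokens)  # one model call per step, for all positions at once
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens


calls = []  # records each model invocation


def toy_predict(tokens):
    # Hypothetical model: token 2*i at position i, confidence decreasing along the sequence.
    calls.append(1)
    return [(2 * i, 1.0 / (1 + i)) for i in range(len(tokens))]


decoded = parallel_decode(toy_predict, length=8, n_steps=4)
print(decoded, "model calls:", len(calls))
```

The number of model calls is set by `n_steps` rather than by the sequence length. This is what lets parallel decoding outpace token-by-token autoregressive sampling on long sequences.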

arXiv:2304.02217 [pdf]

XPS analysis of molecular contamination and sp2 amorphous carbon on oxidized (100) diamond

Authors: Ricardo Vidrio, Daniel Vincent, Benjamin Bachman, Cesar Saucedo, Maryam Zahedian, Zihong Xu, Junyu Lai, Timothy A. Grotjohn, Shimon Kolkowitz, Jung-Hun Seo, Robert J. Hamers, Keith G. Ray, Zhenqiang Ma, Jennifer T. Choy

Abstract: The efficacy of oxygen (O) surface terminations on diamond is an important factor for the performance and stability of diamond-based quantum sensors and electronics. Given the wide breadth of O-termination techniques, it can be difficult to discern which method yields the highest and most consistent O coverage. Furthermore, the interpretation of surface characterization techniques is complicated by surface morphology and purity, which, if not accounted for, lead to inconsistent determinations of the oxygen coverage. We present a comprehensive approach to consistently prepare and analyze oxygen termination of surfaces on (100) single-crystalline diamond. We report on X-ray Photoelectron Spectroscopy (XPS) characterization of diamond surfaces treated with six oxidation methods that include various wet chemical oxidation techniques, photochemical oxidation with UV illumination, and steam oxidation using atomic layer deposition (ALD). Our analysis entails a rigorous XPS peak-fitting procedure for measuring the functionalization of O-terminated diamond. The findings provide molecular-level insights into oxidized (100) diamond surfaces, including a clear correlation between the measured oxygen atomic percentage and the presence of molecular contaminants containing nitrogen, silicon, and sulfur.
We also compare the sp2 carbon content with the O1s atomic percentage and find a correlation for the diamond samples treated with dry oxidation, which eventually tapers off at a maximum O1s atomic percentage of 7.09 +/- 0.40%. Given these results, we conclude that the dry oxidation methods yield some of the highest oxygen amounts, with the ALD water vapor technique proving to be the cleanest of all the oxidation methods explored in this work.

Submitted 8 May, 2024; v1 submitted 5 April, 2023; originally announced April 2023.
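The oxygen coverage figures above are XPS atomic percentages. The usual quantification divides each core-level peak area by a relative sensitivity factor (RSF) and then normalizes: $\text{at.\%}_i = 100\,(A_i/S_i)/\sum_j (A_j/S_j)$. The Python sketch below implements only that step. The peak areas and RSF values are hypothetical, and the paper's own peak-fitting procedure (background subtraction, component fitting) is not reproduced.

```python
def atomic_percentages(peak_areas, sensitivity_factors):
    """Atomic % per element: 100 * (A_i / S_i) / sum_j (A_j / S_j)."""
    normalized = {el: peak_areas[el] / sensitivity_factors[el] for el in peak_areas}
    total = sum(normalized.values())
    return {el: 100.0 * value / total for el, value in normalized.items()}


# Hypothetical background-subtracted peak areas and RSFs; real RSFs depend on the
# instrument, analyzer transmission, and the sensitivity-factor library used.
areas = {"C1s": 900.0, "O1s": 200.0}
rsf = {"C1s": 1.0, "O1s": 2.0}
pct = atomic_percentages(areas, rsf)
print({el: round(v, 2) for el, v in pct.items()})
```

Contaminant peaks (for example N1s, Si2p, S2p) enter the same sum, so their presence lowers every other element's atomic percentage.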

arXiv:2302.03540 [pdf, other]

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

Authors: Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, Neil Zeghidour

Abstract: We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens (akin to "reading") and from semantic tokens to low-level acoustic tokens ("speaking"). Decoupling these two tasks enables training of the "speaking" module using abundant audio-only data, and unlocks the highly efficient combination of pretraining and backtranslation to reduce the need for parallel data when training the "reading" component. To control the speaker identity, we adopt example prompting, which allows SPEAR-TTS to generalize to unseen speakers using only a short sample of 3 seconds, without any explicit speaker representation or speaker-id labels. Our experiments demonstrate that SPEAR-TTS achieves a character error rate that is competitive with state-of-the-art methods using only 15 minutes of parallel data, while matching ground-truth speech in terms of naturalness and acoustic quality, as measured in subjective tests.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2209.03143 [pdf, other]

AudioLM: a Language Modeling Approach to Audio Generation

Authors: Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour

Abstract: We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.

Submitted 25 July, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

arXiv:2202.10255 [pdf, other]

Length partition of random multicurves on large genus hyperbolic surfaces

Authors: Delecroix Vincent, Liu Mingkun

Abstract: We study the length statistics of the components of a random multicurve on a surface of genus $g \geq 2$. For each fixed genus, the existence of such statistics follows from the work of M.~Mirzakhani, F.~Arana-Herrera and M.~Liu. We prove that as the genus $g$ tends to infinity the statistics converge in law to the Poisson--Dirichlet distribution of parameter $\theta=1/2$. In particular, as the genus tends to infinity, the mean lengths of the three longest components converge respectively to $75.8\%$, $17.1\%$ and $4.9\%$ of the total length.

Submitted 21 February, 2022; originally announced February 2022.
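The Poisson-Dirichlet limit can be checked numerically. Sorting the weights of a GEM($\theta$) stick-breaking sequence yields a Poisson-Dirichlet($\theta$) sample. A Monte Carlo estimate of the mean three largest weights for $\theta = 1/2$ should therefore land near the 75.8%, 17.1% and 4.9% quoted above. This Python sketch truncates each sequence after 40 sticks (the expected leftover mass is $3^{-40}$). It illustrates only the limiting distribution, not the multicurve model itself.

```python
import random


def gem_sample(theta, n_sticks, rng):
    """First n_sticks weights of a GEM(theta) stick-breaking sequence."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks):
        # V ~ Beta(1, theta) by inverse CDF: V = 1 - U**(1/theta).
        v = 1.0 - rng.random() ** (1.0 / theta)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def mean_top_k(theta=0.5, k=3, n_samples=10_000, n_sticks=40, seed=0):
    """Monte Carlo means of the k largest Poisson-Dirichlet(theta) weights."""
    rng = random.Random(seed)
    totals = [0.0] * k
    for _ in range(n_samples):
        # Ranking the GEM weights gives a Poisson-Dirichlet sample.
        top = sorted(gem_sample(theta, n_sticks, rng), reverse=True)[:k]
        for i, w in enumerate(top):
            totals[i] += w
    return [t / n_samples for t in totals]


means = mean_top_k()
print([round(m, 3) for m in means])
```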

arXiv:2111.02767 [pdf, other]

RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning

Authors: Sabela Ramos, Sertan Girgin, Léonard Hussenot, Damien Vincent, Hanna Yakubovich, Daniel Toyama, Anita Gergely, Piotr Stanczyk, Raphael Marinier, Jeremiah Harmsen, Olivier Pietquin, Nikola Momchev

Abstract: We introduce RLDS (Reinforcement Learning Datasets), an ecosystem for recording, replaying, manipulating, annotating and sharing data in the context of Sequential Decision Making (SDM) including Reinforcement Learning (RL), Learning from Demonstrations, Offline RL or Imitation Learning. RLDS not only enables reproducibility of existing research and easy generation of new datasets, but also accelerates novel research. By providing a standard and lossless dataset format, it makes it possible to quickly test new algorithms on a wider range of tasks. The RLDS ecosystem makes it easy to share datasets without any loss of information and to be agnostic to the underlying original format when applying various data processing pipelines to large collections of datasets. In addition, RLDS provides tools for collecting data generated by either synthetic agents or humans, as well as for inspecting and manipulating the collected data. Ultimately, integration with TFDS facilitates the sharing of RL datasets with the research community.

Submitted 4 November, 2021; originally announced November 2021.

Comments: https://github.com/google-research/rlds
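The abstract describes RLDS as a standard, lossless episode format. As a rough illustration only, the Python sketch below models an episode as a sequence of steps that carry the per-step fields listed in the RLDS documentation (`observation`, `action`, `reward`, `discount`, `is_first`, `is_last`, `is_terminal`). It then checks that a round trip through a plain record loses nothing. This is plain Python, not the RLDS or TFDS API, and `to_record`, `from_record` and `is_well_formed` are hypothetical helpers.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Step:
    observation: Any
    action: Any
    reward: float
    discount: float
    is_first: bool
    is_last: bool
    is_terminal: bool


@dataclass
class Episode:
    steps: List[Step] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


def is_well_formed(episode: Episode) -> bool:
    """Exactly one first step (at index 0) and one last step (at the end)."""
    steps = episode.steps
    if not steps:
        return False
    firsts = [i for i, s in enumerate(steps) if s.is_first]
    lasts = [i for i, s in enumerate(steps) if s.is_last]
    return firsts == [0] and lasts == [len(steps) - 1]


def to_record(episode: Episode) -> Dict[str, Any]:
    # Nested plain dict: a stand-in for a serialized on-disk representation.
    return asdict(episode)


def from_record(record: Dict[str, Any]) -> Episode:
    return Episode(steps=[Step(**s) for s in record["steps"]], metadata=dict(record["metadata"]))


# Hypothetical three-step episode from a synthetic agent.
ep = Episode(
    steps=[
        Step([0.0], 1, 0.0, 1.0, True, False, False),
        Step([0.5], 0, 1.0, 1.0, False, False, False),
        Step([1.0], 0, 0.0, 0.0, False, True, True),
    ],
    metadata={"agent": "synthetic"},
)
print(is_well_formed(ep), from_record(to_record(ep)) == ep)
```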

arXiv:2110.10149 [pdf, other]

Continuous Control with Action Quantization from Demonstrations

Authors: Robert Dadashi, Léonard Hussenot, Damien Vincent, Sertan Girgin, Anton Raichuk, Matthieu Geist, Olivier Pietquin

Abstract: In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). The proposed approach consists of learning a discretization of continuous action spaces from human demonstrations. This discretization returns a set of plausible actions (in light of the demonstrations) for each input state, thus capturing the priors of the demonstrator and their multimodal behavior. By discretizing the action space, any discrete action deep RL technique can be readily applied to the continuous control problem. Experiments show that the proposed approach outperforms state-of-the-art methods such as SAC in the RL setup, and GAIL in the Imitation Learning setup. We provide a website with interactive videos: https://google-research.github.io/aquadem/ and make the code available: https://github.com/google-research/google-research/tree/master/aquadem.

Submitted 3 June, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: Accepted to ICML 2022
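AQuaDem turns a continuous control problem into a discrete one. For each state, the learned discretization proposes K plausible actions, and a discrete-action agent only has to choose an index. The Python sketch below shows that selection step with greedy Q-values. `toy_candidates` and `toy_q` are hypothetical stand-ins for the learned candidate function and the Q-network. How the candidates are learned from demonstrations is not shown.

```python
from typing import Callable, List, Tuple

Action = List[float]


def select_action(
    state: float,
    candidates_fn: Callable[[float], List[Action]],
    q_fn: Callable[[float, int], float],
) -> Tuple[int, Action]:
    """Greedy choice among K state-conditioned candidate actions.

    The agent reasons over the discrete index k; the environment receives
    the corresponding continuous action.
    """
    candidates = candidates_fn(state)
    best_k = max(range(len(candidates)), key=lambda k: q_fn(state, k))
    return best_k, candidates[best_k]


def toy_candidates(state: float) -> List[Action]:
    # Hypothetical: three plausible 1-D actions (state-independent for simplicity).
    return [[-1.0], [0.0], [1.0]]


def toy_q(state: float, k: int) -> float:
    # Hypothetical value: prefer actions that bring state + action closest to 0.
    return -abs(state + toy_candidates(state)[k][0])


print(select_action(0.8, toy_candidates, toy_q))
```

Because the agent only ever sees indices 0..K-1, any discrete-action method (DQN-style Q-learning, for instance) can be trained on top of it unchanged.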

arXiv:2110.05407 [pdf, other]

A new orbit closure in genus 8?

Authors: Delecroix Vincent, Rüth Julian

Abstract: We provide numerical evidence that the orbit closure of the unfolding of the $(3,4,13)$-triangle is a previously unknown 4-dimensional variety.

Submitted 19 September, 2023; v1 submitted 11 October, 2021; originally announced October 2021.

arXiv:2106.00672 [pdf, other]

What Matters for Adversarial Imitation Learning?

Authors: Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

Abstract: Adversarial imitation learning has become a popular framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand which choices, among the high-level algorithmic options as well as low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. While many of our findings confirm common practices, some of them are surprising or even contradict prior work. In particular, our results suggest that artificial demonstrations are not a good proxy for human data and that the very common practice of evaluating imitation algorithms only with synthetic demonstrations may lead to algorithms which perform poorly in the more realistic scenarios with human demonstrations.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.12034 [pdf, other]

Hyperparameter Selection for Imitation Learning

Authors: Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos, Manu Orsini, Olivier Bachem, Matthieu Geist, Olivier Pietquin

Abstract: We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in continuous control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly assumes this reward function to be available for HP selection, but this is not a realistic setting. Indeed, if this reward function were available, it could be used directly for policy training and imitation would not be necessary. To tackle this largely ignored problem, we propose a number of possible proxies to the external reward. We evaluate them in an extensive empirical study (more than 10,000 agents across 9 environments) and make practical recommendations for selecting HPs. Our results show that while imitation learning algorithms are sensitive to HP choices, it is often possible to select good enough HPs through a proxy to the reward function.

Submitted 25 May, 2021; originally announced May 2021.

Comments: ICML 2021

arXiv:2102.09566 [pdf, other]

doi 10.1093/mnras/stab1665

The Origins of Off-Centre Massive Black Holes in Dwarf Galaxies

Authors: Jillian M. Bellovary, Sarra Hayoune, Katheryn Chafla, Donovan Vincent, Alyson Brooks, Charlotte Christensen, Ferah Munshi, Michael Tremmel, Thomas R. Quinn, Jordan Van Nest, Serena K. Sligh, Michelle Luzuriaga

Abstract: Massive black holes often exist within dwarf galaxies, and both simulations and observations have shown that a substantial fraction of these may be off-center with respect to their hosts. We trace the evolution of off-center massive black holes (MBHs) in dwarf galaxies using cosmological hydrodynamical simulations, and show that their off-center locations are mainly due to galaxy-galaxy mergers. We calculate dynamical timescales and show that off-center MBHs are unlikely to sink to their galaxies' centers within a Hubble time, due to the shape of the hosts' potential wells and low stellar densities. These wandering MBHs are unlikely to be detected electromagnetically, nor is there a measurable dynamical effect on the galaxy's stellar population. We conclude that off-center MBHs may be common in dwarfs, especially if the mass of the MBH is small or the stellar mass of the host galaxy is large. However, detecting them is extremely challenging, because their accretion luminosities are very low and they do not measurably alter the dynamics of their host galaxies.

Submitted 22 October, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: updated to reflect MNRAS accepted version. Includes carbon footprint estimate due to supercomputer use
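For intuition about why small MBHs may never reach the center, here is the textbook Chandrasekhar dynamical-friction estimate for a point mass on a circular orbit in a singular isothermal sphere (as given in Binney & Tremaine's *Galactic Dynamics*): $t_{\mathrm{fric}} = 1.17\, r^2 v_c / (G M \ln\Lambda)$. It is not the authors' calculation. The abstract attributes the long sinking times partly to the potential shape and low stellar densities, which this idealized formula does not capture. The Python sketch converts the formula to Gyr, and the dwarf-galaxy parameters in the example are hypothetical.

```python
import math

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30  # solar mass, kg
KPC = 3.0857e19   # kiloparsec, m
GYR = 3.156e16    # gigayear, s


def dynamical_friction_time_gyr(r_kpc, v_c_kms, m_bh_msun, ln_lambda):
    """Chandrasekhar sinking time t = 1.17 r^2 v_c / (G M ln Lambda), in Gyr.

    Assumes a point mass on a circular orbit of radius r in a singular
    isothermal sphere with circular speed v_c.
    """
    r = r_kpc * KPC
    v = v_c_kms * 1e3
    m = m_bh_msun * M_SUN
    return 1.17 * r**2 * v / (G * m * ln_lambda) / GYR


# Hypothetical dwarf-galaxy numbers: 1e5 Msun MBH at 1 kpc, v_c = 30 km/s, ln Lambda = 5.
t_dwarf = dynamical_friction_time_gyr(1.0, 30.0, 1e5, 5.0)
print(f"sinking time ~ {t_dwarf:.1f} Gyr")
```

The $t \propto r^2 v_c / M$ scaling is the key point. Lighter black holes and wider orbits sink more slowly, which matches the abstract's remark that off-center MBHs are more likely when the MBH mass is small.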

arXiv:2006.00979 [pdf, other]

Acme: A Research Framework for Distributed Reinforcement Learning

Authors: Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang , et al. (14 additional authors not shown)

Abstract: Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns, this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments.
This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme.

Submitted 20 September, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme

arXiv:1907.11180 [pdf, other]

Google Research Football: A Novel Reinforcement Learning Environment

Authors: Karol Kurach, Anton Raichuk, Piotr Stańczyk, Michał Zając, Olivier Bachem, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

Abstract: Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research Football Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the Football Benchmarks and report baseline results for three commonly used reinforcement learning algorithms (IMPALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the Football Academy and showcase several promising research directions.

Submitted 14 April, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

arXiv:1907.00868 [pdf, other]

MULEX: Disentangling Exploitation from Exploration in Deep RL

Authors: Lucas Beyer, Damien Vincent, Olivier Teboul, Sylvain Gelly, Matthieu Geist, Olivier Pietquin

Abstract: An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour. This trade-off is usually obtained by perturbing either the agent's actions (e.g., ε-greedy or Gibbs sampling) or the agent's parameters (e.g., NoisyNet), or by modifying the reward it receives (e.g., exploration bonus, intrinsic motivation, or hand-shaped rewards). Here, we adopt a disruptive but simple and generic perspective, where we explicitly disentangle exploration and exploitation. Different losses are optimized in parallel, one of them coming from the true objective (maximizing cumulative rewards from the environment) and others being related to exploration. Every loss is used in turn to learn a policy that generates transitions, all shared in a single replay buffer. Off-policy methods are then applied to these transitions to optimize each loss. We showcase our approach on a hard-exploration environment, show its sample-efficiency and robustness, and discuss further implications.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1906.07987 [pdf, other]

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Authors: Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

Abstract: We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors. We address this by proposing an algorithm that adaptively switches between TD and MC in each state, thus mitigating the propagation of errors. Our method is based on learned confidence intervals that detect biases of TD estimates. We demonstrate in a variety of policy evaluation tasks that this simple adaptive algorithm performs competitively with the best approach in hindsight, suggesting that learned confidence intervals are a powerful technique for adapting policy evaluation to use TD or MC returns in a data-driven way.

Submitted 19 June, 2019; originally announced June 2019.
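The per-state switching rule described in the adaptive TD abstract can be illustrated with a minimal tabular sketch. It assumes a simplification not taken from the paper: the learned confidence intervals are replaced by a normal-approximation interval computed from sampled Monte Carlo returns for one state. If the TD(0) target falls inside that interval it is kept; otherwise the state falls back to the MC estimate. The function names `mc_interval` and `adaptive_target` are illustrative only.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def mc_interval(returns: List[float], z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the mean MC return of one state."""
    m = mean(returns)
    if len(returns) < 2:
        return m, m  # a single sample gives a degenerate interval
    half_width = z * stdev(returns) / math.sqrt(len(returns))
    return m - half_width, m + half_width


def adaptive_target(reward: float, next_value: float, gamma: float,
                    returns: List[float]) -> Tuple[float, str]:
    """Use the TD(0) target if it is consistent with the MC interval, else the MC mean."""
    td_target = reward + gamma * next_value
    low, high = mc_interval(returns)
    if low <= td_target <= high:
        return td_target, "TD"
    # The TD target looks biased (e.g. leaked approximation error), so trust MC here.
    return mean(returns), "MC"


sampled_returns = [1.0, 1.2, 0.8, 1.0]
print(adaptive_target(0.0, 1.0, 0.99, sampled_returns))  # TD target inside the interval
print(adaptive_target(0.0, 0.5, 0.99, sampled_returns))  # TD target rejected, MC mean used
```

The interval here shrinks with more sampled returns, so a state with little data tolerates a wider range of TD targets before switching to MC.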

arXiv:1810.02274 [pdf, other]

Episodic Curiosity through Reachability

Authors: Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly

Abstract: Rewards are sparse in the real world and most of today's reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself - thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such a bonus is summed up with the real task reward - making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus. To determine the bonus, the current observation is compared with the observations in memory. Crucially, the comparison is done based on how many environment steps it takes to reach the current observation from those in memory - which incorporates rich information about environment dynamics. This allows us to overcome the known "couch-potato" issues of prior work - when the agent finds a way to instantly gratify itself by exploiting actions which lead to hardly predictable consequences. We test our approach in visually rich 3D environments in ViZDoom, DMLab and MuJoCo. In navigational tasks from ViZDoom and DMLab, our agent outperforms the state-of-the-art curiosity method ICM. In MuJoCo, an ant equipped with our curiosity module learns locomotion out of the first-person-view curiosity only.

Submitted 6 August, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

Comments: Accepted to ICLR 2019. Code at https://github.com/google-research/episodic-curiosity/. Videos at https://sites.google.com/view/episodic-curiosity/
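The reachability-based novelty bonus described in the Episodic Curiosity abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the released implementation: `reachable(a, b)` is a hypothetical stand-in for the learned comparator (close to 1 when `b` is a few steps from `a`), the per-memory scores are aggregated with a high percentile, and the bonus takes the form `alpha * (beta - similarity)`. The constants `alpha`, `beta`, the percentile `q`, and the novelty threshold are illustrative choices.

```python
import math
from typing import Any, Callable, List


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[min(len(ordered), max(1, rank)) - 1]


class EpisodicCuriosity:
    """Episodic-memory novelty bonus driven by a (hypothetical) reachability score."""

    def __init__(self, reachable: Callable[[Any, Any], float], alpha: float = 1.0,
                 beta: float = 0.5, novelty_threshold: float = 0.0, q: float = 90.0):
        self.reachable = reachable
        self.alpha = alpha
        self.beta = beta
        self.novelty_threshold = novelty_threshold
        self.q = q
        self.memory: List[Any] = []

    def bonus(self, obs: Any) -> float:
        # With an empty memory nothing is reachable yet, so similarity is taken as 0.
        if self.memory:
            scores = [self.reachable(m, obs) for m in self.memory]
            similarity = percentile(scores, self.q)
        else:
            similarity = 0.0
        b = self.alpha * (self.beta - similarity)
        # Only observations judged novel enough are added to episodic memory.
        if b > self.novelty_threshold:
            self.memory.append(obs)
        return b

    def reset(self) -> None:
        """Clear the memory at the start of each episode."""
        self.memory.clear()


# Toy usage: observations are positions on a line, "reachable" within 2 steps.
cur = EpisodicCuriosity(lambda a, b: 1.0 if abs(a - b) <= 2 else 0.0)
print(cur.bonus(0), cur.bonus(1), cur.bonus(10))  # 0.5 -0.5 0.5
```

Because the score depends on step distance rather than prediction error, an unpredictable but nearby observation (the "couch-potato" case) still yields a low bonus.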

arXiv:1807.03064 [pdf, other]

Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem

Authors: Hugo Penedones, Damien Vincent, Hartmut Maennel, Sylvain Gelly, Timothy Mann, Andre Barreto

Abstract: Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation. To increase our understanding of the problem, we investigate the issue of approximation errors in areas of sharp discontinuities of the value function being further propagated by bootstrap updates. We show empirical evidence of this leakage propagation, and show analytically that it must occur, in a simple Markov chain, when function approximation errors are present. For reversible policies, the result can be interpreted as the tension between two terms of the loss function that TD minimises, as recently described by [Ollivier, 2018]. We show that the upper bounds from [Tsitsiklis and Van Roy, 1997] hold, but that they indicate neither whether leakage propagation occurs nor under what conditions. Finally, we test whether the problem could be mitigated with a better state representation, and whether it can be learned in an unsupervised manner, without rewards or privileged information.

Submitted 9 July, 2018; originally announced July 2018.

arXiv:1804.11130 [pdf, other]

Competitive Training of Mixtures of Independent Deep Generative Models

Authors: Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf

Abstract: A common assumption in causal modeling posits that the data is generated by a set of independent mechanisms, and algorithms should aim to recover this structure. Standard unsupervised learning, however, is often concerned with training a single model to capture the overall distribution or aspects thereof. Inspired by clustering approaches, we consider mixtures of implicit generative models that "disentangle" the independent generative mechanisms underlying the data. Relying on an additional set of discriminators, we propose a competitive training procedure in which the models only need to capture the portion of the data distribution from which they can produce realistic samples. As a by-product, each model is simpler and faster to train. We empirically show that our approach splits the training distribution in a sensible way and increases the quality of the generated samples.

Submitted 3 March, 2019; v1 submitted 30 April, 2018; originally announced April 2018.

arXiv:1802.02629 [pdf, other]

Spatially adaptive image compression using a tiled deep network

Authors: David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, Saurabh Singh

Abstract: Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images. Existing image compression algorithms based on neural networks learn quantized representations with a constant spatial bit rate across each image. While entropy coding introduces some spatial variation, traditional codecs have benefited significantly by explicitly adapting the bit rate based on local image complexity and visual saliency. This paper introduces an algorithm that combines deep neural networks with quality-sensitive bit rate adaptation using a tiled network. We demonstrate the importance of spatial context prediction and show improved quantitative (PSNR) and qualitative (subjective rater assessment) results compared to a non-adaptive baseline and a recently published image compression model based on fully-convolutional neural networks.

Submitted 7 February, 2018; originally announced February 2018.

Journal ref: International Conference on Image Processing 2017

arXiv:1706.03200 [pdf, other]

Critical Hyper-Parameters: No Random, No Cry

Authors: Olivier Bousquet, Sylvain Gelly, Karol Kurach, Olivier Teytaud, Damien Vincent

Abstract: The selection of hyper-parameters is critical in Deep Learning. Because of the long training time of complex models and the availability of compute resources in the cloud, "one-shot" optimization schemes - where the sets of hyper-parameters are selected in advance (e.g. on a grid or in a random manner) and the training is executed in parallel - are commonly used. It is known that grid search is sub-optimal, especially when only a few critical parameters matter, and random search has been suggested instead. Yet, random search can be "unlucky" and produce sets of values that leave some part of the domain unexplored. Quasi-random methods, such as Low Discrepancy Sequences (LDS), avoid these issues. We show that such methods have theoretical properties that make them appealing for performing hyperparameter search, and demonstrate that, when applied to the selection of hyperparameters of complex Deep Learning models (such as state-of-the-art LSTM language models and image classification models), they yield suitable hyperparameter values with far fewer runs than random search. We propose a particularly simple LDS method which can be used as a drop-in replacement for grid or random search in any Deep Learning pipeline, both as a fully one-shot hyperparameter search or as an initializer in iterative batch optimization.

Submitted 10 June, 2017; originally announced June 2017.
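To make the one-shot LDS idea from the abstract above concrete, here is a sketch using the Halton sequence, a standard low-discrepancy sequence. It is not claimed to be the "particularly simple LDS method" the paper proposes. The search space in `to_hyperparams` (a log-uniform learning rate in [1e-5, 1e-1] and dropout in [0, 0.5]) is a hypothetical example; the whole batch of configurations is generated up front so it can be trained in parallel, as in one-shot search.

```python
from typing import Dict, List

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of index about the point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points: int, dims: int) -> List[List[float]]:
    """First n_points of the Halton sequence in [0, 1)^dims (one prime base per dimension)."""
    if dims > len(PRIMES):
        raise ValueError("add more prime bases for higher dimensions")
    # Start at index 1: index 0 maps to the origin in every dimension.
    return [[radical_inverse(i, PRIMES[d]) for d in range(dims)]
            for i in range(1, n_points + 1)]


def to_hyperparams(point: List[float]) -> Dict[str, float]:
    """Map a unit-cube point to a hypothetical search space."""
    return {
        "learning_rate": 10 ** (-5 + 4 * point[0]),  # log-uniform in [1e-5, 1e-1]
        "dropout": 0.5 * point[1],                   # uniform in [0, 0.5]
    }


# One-shot batch: all 8 configurations are fixed before any training starts.
for cfg in map(to_hyperparams, halton(8, 2)):
    print(cfg)
```

Unlike independent random draws, successive Halton points fill the gaps left by earlier ones, which is what prevents the "unlucky" unexplored regions the abstract mentions.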

arXiv:1706.03199 [pdf, other]

Toward Optimal Run Racing: Application to Deep Learning Calibration

Authors: Olivier Bousquet, Sylvain Gelly, Karol Kurach, Marc Schoenauer, Michele Sebag, Olivier Teytaud, Damien Vincent

Abstract: This paper aims at one-shot learning of deep neural nets, where a highly parallel setting is considered to address the algorithm calibration problem - selecting the best neural architecture and learning hyper-parameter values depending on the dataset at hand. The notoriously expensive calibration problem is optimally reduced by detecting and early stopping non-optimal runs. The theoretical contribution regards the optimality guarantees within the multiple hypothesis testing framework. Experiments on the CIFAR-10, PTB and Wiki benchmarks demonstrate the relevance of the approach with a principled and consistent improvement on the state of the art with no extra hyper-parameter.

Submitted 20 June, 2017; v1 submitted 10 June, 2017; originally announced June 2017.
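The idea of early-stopping non-optimal runs under a multiple-testing guarantee, from the run-racing abstract above, can be illustrated with a generic successive-elimination race. This swaps in a textbook technique (Hoeffding confidence radii with a union bound over runs and rounds) and is not the paper's procedure. It assumes each run reports noisy scores in [0, 1] through a hypothetical `sample(run_id)` callable.

```python
import math
from typing import Callable, List


def race(sample: Callable[[int], float], n_runs: int, max_rounds: int,
         delta: float = 0.05) -> List[int]:
    """Return the ids of runs still alive after racing.

    A run is stopped once its upper confidence bound falls below the best
    lower confidence bound; with probability >= 1 - delta the best run survives.
    """
    alive = list(range(n_runs))
    totals = {r: 0.0 for r in alive}
    for t in range(1, max_rounds + 1):
        for r in alive:
            totals[r] += sample(r)  # one more evaluation of every surviving run
        # Hoeffding radius, union-bounded over all runs and rounds (Bonferroni-style).
        radius = math.sqrt(math.log(2 * n_runs * max_rounds / delta) / (2 * t))
        means = {r: totals[r] / t for r in alive}
        best_lower = max(means[r] - radius for r in alive)
        alive = [r for r in alive if means[r] + radius >= best_lower]
        if len(alive) == 1:
            break
    return alive


# Noise-free toy runs with true scores 0.9, 0.5 and 0.1.
print(race(lambda r: [0.9, 0.5, 0.1][r], n_runs=3, max_rounds=1000))  # [0]
```

The union bound is what makes stopping decisions safe when many runs are compared simultaneously; with too few rounds the intervals stay wide and no run is stopped.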

arXiv:1705.08403 [pdf]

Atomic Layer deposition of 2D and 3D standards for quantitative synchrotron-based composition and structural analysis methods

Authors: Nicholas G. Becker, Anna Butterworth, Andrey Sokolov, Muriel Salome, Steven Sutton, De Andrade Vincent, Andrew Westphal, Thomas Proslier

Abstract: The use of Standard Reference Materials (SRM) from the National Institute of Standards and Technology (NIST) for quantitative analysis of chemical composition using Synchrotron-based X-Ray Fluorescence (SR-XRF) and Scanning Transmission X-Ray Microscopy (STXM) is common. These standards, however, can suffer from inhomogeneity in chemical composition and thickness and often require further calculations, based on sample mounting and detector geometry, to obtain quantitative results. These inhomogeneities negatively impact the reproducibility of the measurements and the quantitative measure itself. Atomic Layer Deposition (ALD) is an inexpensive, scalable deposition technique known for producing uniform, conformal films of a wide range of compounds on nearly any substrate material. These traits make it an ideal deposition method for producing films to replace the NIST standards and create SRM on a wide range of relevant 2D and 3D substrates. Utilizing Rutherford Backscattering, X-ray Reflectivity, Quartz crystal microbalance, STXM, and SR-XRF, we show that ALD is capable of producing films that are homogeneous over scales ranging from hundreds of microns to nanometers.

Submitted 23 May, 2017; originally announced May 2017.

arXiv:1705.08386 [pdf, other]

Better Text Understanding Through Image-To-Text Transfer

Authors: Karol Kurach, Sylvain Gelly, Michal Jastrzebski, Philip Haeusser, Olivier Teytaud, Damien Vincent, Olivier Bousquet

Abstract: Generic text embeddings are successfully used in a variety of tasks. However, they are often learnt by capturing the co-occurrence structure from pure text corpora, resulting in limitations of their ability to generalize. In this paper, we explore models that incorporate visual information into the text representation. Based on comprehensive ablation studies, we propose a conceptually simple, yet well-performing architecture. It outperforms previous multimodal approaches on a set of well-established benchmarks. We also improve the state-of-the-art results for image-related text datasets, using orders of magnitude less data.

Submitted 26 May, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

arXiv:1705.06687 [pdf, other]

Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

Authors: Michele Covell, Nick Johnston, David Minnen, Sung Jin Hwang, Joel Shor, Saurabh Singh, Damien Vincent, George Toderici

Abstract: We introduce a stop-code tolerant (SCT) approach to training recurrent convolutional neural networks for lossy image compression. Our methods introduce a multi-pass training method to combine the training goals of high-quality reconstructions in areas around stop-code masking as well as in highly-detailed areas. These methods lead to lower true bitrates for a given recursion count, both pre- and post-entropy coding, even using unstructured LZ77 code compression. The pre-LZ77 gains are achieved by trimming stop codes. The post-LZ77 gains are due to the highly unequal distributions of 0/1 codes from the SCT architectures. With these code compressions, the SCT architecture maintains or exceeds the image quality at all compression rates compared to JPEG and to RNN auto-encoders across the Kodak dataset. In addition, the SCT coding results in lower variance in image quality across the extent of the image, a characteristic that has been shown to be important in human ratings of image quality.

Submitted 18 May, 2017; originally announced May 2017.

arXiv:1703.10114 [pdf, other]

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

Authors: Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici

Abstract: We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM. We introduce three improvements over previous research that lead to this state-of-the-art result. First, we show that training with a pixel-wise loss weighted by SSIM increases reconstruction quality according to several metrics. Second, we modify the recurrent architecture to improve spatial diffusion, which allows the network to more effectively capture and propagate image information through the network's hidden state. Finally, in addition to lossless entropy coding, we use a spatially adaptive bit allocation algorithm to more efficiently use the limited number of bits to encode visually complex image regions. We evaluate our method on the Kodak and Tecnick image sets and compare against standard codecs as well as recently published methods based on deep neural networks.

Submitted 29 March, 2017; originally announced March 2017.
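A spatially adaptive bit allocation of the kind mentioned in the abstract above can be sketched as a per-tile search for the smallest number of encoder iterations that meets a quality target. This is an illustrative sketch, not the paper's algorithm: `error(tile, k)` is a hypothetical callable giving a tile's reconstruction error after `k` iterations, and the toy error model (error halving per iteration, scaled by a per-tile complexity) is an assumption chosen only to show that complex tiles receive more iterations.

```python
from typing import Callable, List


def allocate_iterations(n_tiles: int, error: Callable[[int, int], float],
                        target: float, max_iters: int) -> List[int]:
    """For each tile, the smallest iteration count (>= 1) whose error meets `target`.

    Tiles that never reach the target are capped at `max_iters`.
    """
    allocation = []
    for tile in range(n_tiles):
        k = 1
        while k < max_iters and error(tile, k) > target:
            k += 1  # spend one more iteration (a fixed bit budget) on this tile
        allocation.append(k)
    return allocation


# Toy error model: error halves each iteration and scales with tile complexity,
# so a flat tile stops early while a textured tile keeps receiving bits.
complexity = [0.01, 0.1, 1.0]
alloc = allocate_iterations(len(complexity),
                            lambda t, k: complexity[t] * 0.5 ** k,
                            target=0.01, max_iters=16)
print(alloc)  # [1, 4, 7]
```

If each iteration costs the same number of bits per tile, the total bit count is proportional to `sum(alloc)`, compared with `len(alloc) * max(alloc)` for a spatially uniform rate that reaches the same worst-case quality.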

arXiv:1608.05148 [pdf, other]

Full Resolution Image Compression with Recurrent Neural Networks

Authors: George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell

Abstract: This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study "one-shot" versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.

Submitted 7 July, 2017; v1 submitted 17 August, 2016; originally announced August 2016.

Comments: Updated with content for CVPR and removed supplemental material to an external link for size limitations
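The progressive, additive reconstruction structure described in the abstract above (a binarizer producing codes at every iteration, with the output formed by summing each iteration's contribution) can be shown with a toy coder. This is not a neural network: as an assumption for illustration, the learned encoder/decoder is replaced by a fixed rule that emits one sign bit per value per iteration and halves the step size each pass. For inputs in [-1, 1], the error after `k` iterations is at most `2**-k`, so decoding any prefix of the codes gives a progressively better image.

```python
from typing import List


def encode_progressive(x: List[float], iterations: int,
                       scale: float = 0.5) -> List[List[int]]:
    """Toy additive residual coder for values in [-1, 1]: one sign bit per value per pass."""
    recon = [0.0] * len(x)
    codes: List[List[int]] = []
    for _ in range(iterations):
        residual = [xi - ri for xi, ri in zip(x, recon)]
        bits = [1 if r >= 0 else -1 for r in residual]  # binarized code in {-1, +1}
        recon = [ri + scale * b for ri, b in zip(recon, bits)]
        codes.append(bits)
        scale /= 2  # each pass refines the residual at half the previous step
    return codes


def decode_progressive(codes: List[List[int]], scale: float = 0.5) -> List[float]:
    """Additive reconstruction: sum the contributions of all received passes."""
    if not codes:
        return []
    recon = [0.0] * len(codes[0])
    for bits in codes:
        recon = [ri + scale * b for ri, b in zip(recon, bits)]
        scale /= 2
    return recon


x = [0.3, -0.7, 0.9, 0.0]
codes = encode_progressive(x, iterations=8)
for k in (1, 2, 4, 8):
    recon = decode_progressive(codes[:k])  # decode only the first k passes
    print(k, max(abs(a - b) for a, b in zip(x, recon)))
```

A single encoding serves every bit rate: the receiver simply stops after fewer passes, which mirrors the abstract's point that each network is trained once yet supports variable compression rates.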

arXiv:1511.06085 [pdf, other]

Variable Rate Image Compression with Recurrent Neural Networks

Authors: George Toderici, Sean M. O'Malley, Sung Jin Hwang, Damien Vincent, David Minnen, Shumeet Baluja, Michele Covell, Rahul Sukthankar

Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32×32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more.

Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

Comments: Under review as a conference paper at ICLR 2016

arXiv:1310.5877 [pdf, other]

doi 10.1016/j.nima.2014.05.093

The camera of the fifth H.E.S.S. telescope. Part I: System description

Authors: J. Bolmont, P. Corona, P. Gauron, P. Ghislain, C. Goffin, L. Guevara Riveros, J. -F. Huppert, O. Martineau-Huynh, P. Nayman, J. -M. Parraud, J. -P. Tavernet, F. Toussenel, D. Vincent, P. Vincent, W. Bertoli, P. Espigat, M. Punch, D. Besin, E. Delagnes, J. -F. Glicenstein, Y. Moudden, P. Venault, H. Zaghia, L. Brunetti, P. -Y. David, et al. (32 additional authors not shown)

Abstract: In July 2012, as the four ground-based gamma-ray telescopes of the H.E.S.S. (High Energy Stereoscopic System) array reached their tenth year of operation in Khomas Highlands, Namibia, a fifth telescope took its first data as part of the system. This new Cherenkov detector, comprising a 614.5 m² reflector with a highly pixelized camera in its focal plane, improves the sensitivity of the current array by a factor of two and extends its energy domain down to a few tens of GeV. The present part I of the paper gives a detailed description of the fifth H.E.S.S. telescope's camera, presenting the details of both the hardware and the software, emphasizing the main improvements as compared to previous H.E.S.S. camera technology.

Submitted 26 May, 2014; v1 submitted 22 October, 2013; originally announced October 2013.

Comments: 16 pages, 13 figures, accepted for publication in NIM A

arXiv:0911.0203 [pdf, ps, other]

A quantum trampoline for ultra-cold atoms

Authors: Martin Robert De Saint Vincent, Jean-Philippe Brantut, Christian J. Bordé, Alain Aspect, Thomas Bourdel, Philippe Bouyer

Abstract: We have observed the interferometric suspension of a free-falling Bose-Einstein condensate periodically submitted to multiple-order diffraction by a vertical 1D standing wave. The various diffracted matter waves recombine coherently, resulting in high contrast interference in the number of atoms detected at constant height. For long suspension times, multiple-wave interference is revealed through a sharpening of the fringes. We use this scheme to measure the acceleration of gravity.

Submitted 1 November, 2009; originally announced November 2009.

Journal ref: Europhysics Letters 89, 1 (2010) 5

arXiv:0903.2745 [pdf, ps, other]

doi 10.1103/PhysRevA.79.061406

All-optical runaway evaporation to Bose-Einstein condensation

Authors: Jean-Francois Clément, Jean-Philippe Brantut, Martin Robert De Saint Vincent, Robert A. Nyman, A. Aspect, Thomas Bourdel, Philippe Bouyer

Abstract: We demonstrate runaway evaporative cooling directly with a tightly confining optical dipole trap and achieve fast production of condensates of 1.5×10^5 87Rb atoms. Our scheme is characterized by an independent control of the optical trap confinement and depth, permitting forced evaporative cooling without reducing the trap stiffness. Although our configuration is particularly well suited to the case of 87Rb atoms in a 1565 nm optical trap, where an efficient initial loading is possible, our scheme is general and should allow all-optical evaporative cooling at constant stiffness for most species.

Submitted 16 March, 2009; originally announced March 2009.

Journal ref: Physical Review A 79, 6 (2009) 061406(R)

arXiv:0807.3672

doi 10.1103/PhysRevA.78.031401

Light-shift tomography in an optical-dipole trap for neutral atoms

Authors: Jean-Philippe Brantut, Jean-François Clément, Martin Robert De Saint Vincent, Gael Varoquaux, Robert A. Nyman, Alain Aspect, Thomas Bourdel, Philippe Bouyer

Abstract: We report on light-shift tomography of a cloud of 87Rb atoms in a far-detuned optical-dipole trap at 1565 nm. Our method is based on standard absorption imaging but takes advantage of the strong light shift of the excited state of the imaging transition, which arises from a quasi-resonance of the trapping laser with a higher excited level. We use this method to (i) map the equipotentials of a crossed optical-dipole trap, and (ii) study the thermalisation of an atomic cloud by following the evolution of the atoms' potential energy during free evaporation.

Submitted 23 July, 2008; originally announced July 2008.

arXiv:astro-ph/0405232

Supernova / Acceleration Probe: A Satellite Experiment to Study the Nature of the Dark Energy

Authors: SNAP Collaboration, G. Aldering, W. Althouse, R. Amanullah, J. Annis, P. Astier, C. Baltay, E. Barrelet, S. Basa, C. Bebek, L. Bergstrom, G. Bernstein, M. Bester, B. Bigelow, R. Blandford, R. Bohlin, A. Bonissent, C. Bower, M. Brown, M. Campbell, W. Carithers, E. Commins, W. Craig, C. Day, F. DeJongh, et al. (87 additional authors not shown)

Abstract: The Supernova / Acceleration Probe (SNAP) is a proposed space-based experiment designed to study dark energy and alternative explanations of the accelerating expansion of the Universe by performing a series of complementary, systematics-controlled measurements. We describe a self-consistent reference mission design for building a Type Ia supernova Hubble diagram and for performing a wide-area weak gravitational lensing study. A 2-m wide-field telescope feeds a focal plane consisting of a 0.7 square-degree imager, tiled with equal areas of optical CCDs and near-infrared sensors, and a high-efficiency, low-resolution integral field spectrograph. The SNAP mission will obtain high signal-to-noise, calibrated light curves and spectra for several thousand supernovae at redshifts between z = 0.1 and 1.7. A wide-field survey covering one thousand square degrees resolves ~100 galaxies per square arcminute. If we assume a cosmological-constant-dominated Universe, the matter density, dark energy density, and flatness of space can all be measured with SNAP supernova and weak-lensing measurements to a systematics-limited accuracy of 1%. For a flat universe, the pressure-to-density ratio of dark energy can be similarly measured to 5% for the present value w0 and to ~0.1 for the time variation w'. The large survey area, depth, spatial resolution, time sampling, and nine-band optical-to-NIR photometry will support additional independent and/or complementary dark-energy measurement approaches as well as a broad range of auxiliary science programs. (Abridged)

Submitted 12 May, 2004; originally announced May 2004.

Comments: 40 pages, 18 figures, submitted to PASP, http://snap.lbl.gov

Showing 1–37 of 37 results for author: Vincent, D