Search | arXiv e-print repository

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2311.07264

Danish Foundation Models

Authors: Kenneth Enevoldsen, Lasse Hansen, Dan S. Nielsen, Rasmus A. F. Egebæk, Søren V. Holm, Martin C. Nielsen, Martin Bernstorff, Rasmus Larsen, Peter B. Jørgensen, Malte Højmark-Bertelsen, Peter B. Vahlstrup, Per Møldrup-Dalum, Kristoffer Nielbo

Abstract: Large language models, sometimes referred to as foundation models, have transformed multiple fields of research. However, smaller languages risk falling behind due to high training costs and small incentives for large companies to train these models. To combat this, the Danish Foundation Models project seeks to provide and maintain open, well-documented, and high-quality foundation models for the Danish language. This is achieved through broad cooperation with public and private institutions, to ensure high data quality and applicability of the trained models. We present the motivation of the project, the current status, and future perspectives.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 4 pages, 2 tables

arXiv:2201.06863

Programmatic Policy Extraction by Iterative Local Search

Authors: Rasmus Larsen, Mikkel Nørgaard Schmidt

Abstract: Reinforcement learning policies are often represented by neural networks, but programmatic policies are preferred in some cases because they are more interpretable, amenable to formal verification, or generalize better. While efficient algorithms for learning neural policies exist, learning programmatic policies is challenging. Combining imitation-projection and dataset aggregation with a local search heuristic, we present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy. After examining our local search heuristic on a programming by example problem, we demonstrate our programmatic policy extraction method on a pendulum swing-up problem. Both when trained using a hand crafted expert policy and a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2010.15745

Reinforcement Learning of Causal Variables Using Mediation Analysis

Authors: Tue Herlau, Rasmus Larsen

Abstract: Many open problems in machine learning are intrinsically related to causality; however, the use of causal analysis in machine learning is still in its early stage. Within a general reinforcement learning setting, we consider the problem of building a general reinforcement learning agent which uses experience to construct a causal graph of the environment, and uses this graph to inform its policy. Our approach has three characteristics: First, we learn a simple, coarse-grained causal graph, in which the variables reflect states at many time instances, and the interventions happen at the level of policies, rather than individual actions. Secondly, we use mediation analysis to obtain an optimization target. By minimizing this target, we define the causal variables. Thirdly, our approach relies on estimating conditional expectations rather than the familiar expected return from reinforcement learning, and we therefore apply a generalization of Bellman's equations. We show the method can learn a plausible causal graph in a grid-world environment, and the agent obtains an improvement in performance when using the causally informed policy. To our knowledge, this is the first attempt to apply causal analysis in a reinforcement learning setting without strict restrictions on the number of states. We have observed that mediation analysis provides a promising avenue for transforming the problem of causal acquisition into one of cost-function minimization, but importantly one which involves estimating conditional expectations. This is a new challenge, and we think that causal reinforcement learning will involve developing methods suited for online estimation of such conditional expectations. Finally, a benefit of our approach is the use of very simple causal models, which are arguably a more natural model of human causal understanding.

Submitted 17 May, 2022; v1 submitted 29 October, 2020; originally announced October 2020.

Comments: As accepted at proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI, 2022

MSC Class: 68T05; ACM Class: I.2.6

arXiv:1905.04114

Fast delta evaluation for the Vehicle Routing Problem with Multiple Time Windows

Authors: Rune Larsen, Dario Pacino

Abstract: In many applications of vehicle routing, a set of time windows is feasible for each visit, giving rise to the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). We argue that such disjunctions are problematic for many solution methods, and exemplify this using a state of the art Adaptive Large Neighbourhood Search heuristic. VRPMTW comes in two variants depending on whether the time used en route must be minimised. A more compact and corrected mathematical formulation for both variants of the problem is presented, and new best solutions for all but six benchmark instances of VRPMTW without time minimisation are found. A new solution representation for VRPMTW with time minimisation is presented, its importance is demonstrated and it is used to find new best solutions for all but one benchmark instance of VRPMTW with time minimisation.

Submitted 10 May, 2019; originally announced May 2019.

arXiv:1811.03974

doi 10.1145/3264888.3264897

Science Hackathons for Cyberphysical System Security Research: Putting CPS testbed platforms to good use

Authors: Simon N. Foley, Fabien Autrel, Edwin Bourget, Thomas Cledel, Stephane Grunenwald, Jose Rubio Hernan, Alexandre Kabil, Raphael Larsen, Vivien M. Rooney, Kirsten Vanhulst

Abstract: A challenge is to develop cyber-physical system scenarios that reflect the diversity and complexity of real-life cyber-physical systems in the research questions that they address. Time-bounded collaborative events, such as hackathons, jams and sprints, are increasingly used as a means of bringing groups of individuals together, in order to explore challenges and develop solutions. This paper describes our experiences, using a science hackathon to bring individual researchers together, in order to develop a common use-case implemented on a shared CPS testbed platform that embodies the diversity in their own security research questions. A qualitative study of the event was conducted, in order to evaluate the success of the process, with a view to improving future similar events.

Submitted 9 November, 2018; originally announced November 2018.

Journal ref: Proceedings of the 2018 ACM Workshop on Cyber-Physical Systems Security and PrivaCy (CPS-SPC@CCS 2018)

arXiv:1807.10363

doi 10.1063/1.5099132

Message-passing neural networks for high-throughput polymer screening

Authors: Peter C. St. John, Caleb Phillips, Travis W. Kemper, A. Nolan Wilson, Michael F. Crowley, Mark R. Nimlos, Ross E. Larsen

Abstract: Machine learning methods have shown promise in predicting molecular properties, and given sufficient training data machine learning approaches can enable rapid high-throughput virtual screening of large libraries of compounds. Graph-based neural network architectures have emerged in recent years as the most successful approach for predictions based on molecular structure, and have consistently achieved the best performance on benchmark quantum chemical datasets. However, these models have typically required optimized 3D structural information for the molecule to achieve the highest accuracy. These 3D geometries are costly to compute for high levels of theory, limiting the applicability and practicality of machine learning methods in high-throughput screening applications. In this study, we present a new database of candidate molecules for organic photovoltaic applications, comprising approximately 91,000 unique chemical structures. Compared to existing datasets, this dataset contains substantially larger molecules (up to 200 atoms) as well as extrapolated properties for long polymer chains. We show that message-passing neural networks trained with and without 3D structural information for these molecules achieve similar accuracy, comparable to state-of-the-art methods on existing benchmark datasets. These results therefore emphasize that for larger molecules with practical applications, near-optimal prediction results can be obtained without using optimized 3D geometry as an input. We further show that learned molecular representations can be leveraged to reduce the training data required to transfer predictions to a new DFT functional.

Submitted 5 April, 2019; v1 submitted 26 July, 2018; originally announced July 2018.

Comments: 7 pages, 3 figures

arXiv:1706.04972

Device Placement Optimization with Reinforcement Learning

Authors: Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

Abstract: The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.

Submitted 25 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

Comments: To appear at ICML 2017

arXiv:1012.4691

Solving a real-life large-scale energy management problem

Authors: Steffen Godskesen, Thomas Sejr Jensen, Niels Kjeldsen, Rune Larsen

Abstract: This paper introduces a three-phase heuristic approach for a large-scale energy management and maintenance scheduling problem. The problem is concerned with scheduling maintenance and refueling for nuclear power plants up to five years into the future, while handling a number of scenarios for future demand and prices. The goal is to minimize the expected total production costs. The first phase of the heuristic solves a simplified constraint programming model of the problem, the second performs a local search, and the third handles overproduction in a greedy fashion. This work was initiated in the context of the ROADEF/EURO Challenge 2010, a competition organized jointly by the French Operational Research and Decision Support Society, the European Operational Research Society, and the European utility company Electricite de France. In the concluding phase of the competition our team ranked second in the junior category and sixth overall. After correcting an implementation bug in the program that was submitted for evaluation, our heuristic solves all ten real-life instances, and the solutions obtained are all within 2.45% of the currently best known solutions. The results given here would have ranked first in the original competition.

Submitted 21 December, 2010; originally announced December 2010.

Showing 1–9 of 9 results for author: Larsen, R