
Showing 1–14 of 14 results for author: Strouse, D

Searching in archive cs.
  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.14903  [pdf, other]

    cs.CL cs.LG

    Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs

    Authors: Aaditya K. Singh, DJ Strouse

    Abstract: Tokenization, the division of input text into input tokens, is an often overlooked aspect of the large language model (LLM) pipeline and could be the source of useful or harmful inductive biases. Historically, LLMs have relied on byte pair encoding, without care to specific input domains. With the increased use of LLMs for reasoning, various number-specific tokenization schemes have been adopted,…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 21 pages, 18 figures

  3. arXiv:2310.04373  [pdf, other]

    cs.LG cs.AI

    Confronting Reward Model Overoptimization with Constrained RLHF

    Authors: Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

    Abstract: Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback. However, human preferences are multi-faceted, and it is increasingly common to derive reward from a composition of simpler reward models which each capture a different aspect of language quality. This itself presents a challenge, as it is difficult to appropriat…

    Submitted 10 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  4. arXiv:2211.13746  [pdf, other]

    cs.MA cs.AI cs.GT cs.NE

    Melting Pot 2.0

    Authors: John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo

    Abstract: Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures ge…

    Submitted 30 October, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

    Comments: 69 pages, 54 figures. arXiv admin note: text overlap with arXiv:2107.06857

  5. arXiv:2210.14215  [pdf, other]

    cs.LG cs.AI

    In-context Reinforcement Learning with Algorithm Distillation

    Authors: Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

    Abstract: We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transf…

    Submitted 25 October, 2022; originally announced October 2022.

  6. arXiv:2204.05080  [pdf, other]

    cs.LG cs.AI

    Semantic Exploration from Language Abstractions and Pretrained Representations

    Authors: Allison C. Tam, Neil C. Rabinowitz, Andrew K. Lampinen, Nicholas A. Roy, Stephanie C. Y. Chan, DJ Strouse, Jane X. Wang, Andrea Banino, Felix Hill

    Abstract: Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous partially-observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluat…

    Submitted 26 April, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022

  7. arXiv:2110.08176  [pdf, other]

    cs.LG cs.HC cs.MA

    Collaborating with Humans without Human Data

    Authors: DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard Everett

    Abstract: Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model…

    Submitted 7 January, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021 (spotlight)

  8. arXiv:2107.14226  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning more skills through optimistic exploration

    Authors: DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen

    Abstract: Unsupervised skill learning objectives (Gregor et al., 2016, Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agen…

    Submitted 12 May, 2022; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: Accepted at ICLR 2022 (spotlight)

  9. arXiv:1907.05181  [pdf, other]

    cs.MA cs.LG

    Learning Truthful, Efficient, and Welfare Maximizing Auction Rules

    Authors: Andrea Tacchetti, DJ Strouse, Marta Garnelo, Thore Graepel, Yoram Bachrach

    Abstract: From social networks to supply chains, more and more aspects of how humans, firms and organizations interact is mediated by artificial learning agents. As the influence of machine learning systems grows, it is paramount that we study how to imbue our modern institutions with our own values and principles. Here we consider the problem of allocating goods to buyers who have preferences over them in…

    Submitted 1 November, 2022; v1 submitted 11 July, 2019; originally announced July 2019.

  10. arXiv:1901.10902  [pdf, other]

    stat.ML cs.LG

    InfoBot: Transfer and Exploration via the Information Bottleneck

    Authors: Anirudh Goyal, Riashat Islam, Daniel Strouse, Zafarali Ahmed, Matthew Botvinick, Hugo Larochelle, Yoshua Bengio, Sergey Levine

    Abstract: A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out {\it decision states}. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We p…

    Submitted 5 December, 2023; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: Accepted at ICLR'19

  11. arXiv:1810.08647  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

    Authors: Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

    Abstract: We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agen…

    Submitted 18 June, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

  12. arXiv:1808.02093  [pdf, other]

    cs.AI cs.IT cs.LG cs.MA stat.ML

    Learning to Share and Hide Intentions using Information Regularization

    Authors: DJ Strouse, Max Kleiman-Weiner, Josh Tenenbaum, Matt Botvinick, David Schwab

    Abstract: Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agen…

    Submitted 1 January, 2019; v1 submitted 6 August, 2018; originally announced August 2018.

    Comments: Presented at the 32nd Conference on Neural Information Processing Systems (NIPS 2018)

  13. arXiv:1712.09657  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    The information bottleneck and geometric clustering

    Authors: DJ Strouse, David J Schwab

    Abstract: The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions $P\!\left(Y\mid X\right)$. This is in contrast to classic "g…

    Submitted 31 May, 2020; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: Updated to final published version with more detailed relationship to GMMs/k-means

    Journal ref: Neural Computation 31 (2019) 596-612

  14. arXiv:1604.00268  [pdf, other]

    q-bio.NC cond-mat.stat-mech cs.IT q-bio.QM stat.ML

    The deterministic information bottleneck

    Authors: DJ Strouse, David J Schwab

    Abstract: Lossy compression and clustering fundamentally involve a decision about what features are relevant and which are not. The information bottleneck method (IB) by Tishby, Pereira, and Bialek formalized this notion as an information-theoretic optimization problem and proposed an optimal tradeoff between throwing away as many bits as possible, and selectively keeping those that are most important. In t…

    Submitted 19 December, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: 15 pages, 4 figures