Showing 1–24 of 24 results for author: Sessa, P G

  1. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  2. arXiv:2407.14622  [pdf, other]

    cs.LG cs.AI cs.CL

    BOND: Aligning LLMs with Best-of-N Distillation

    Authors: Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Nino Vieillard, Alexandre Ramé, Bobak Shahriari, Sarah Perrin, Abe Friesen, Geoffrey Cideron, Sertan Girgin, Piotr Stanczyk, Andrea Michi, Danila Sinopalnikov, Sabela Ramos, Amélie Héliou, Aliaksei Severyn, Matt Hoffman, Nikola Momchev, Olivier Bachem

    Abstract: Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet, a surprisingly simple and strong inference-time strategy is Best-of-N sampling that selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its sign…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2406.16768  [pdf, other]

    cs.LG cs.AI

    WARP: On the Benefits of Weight Averaged Rewarded Policies

    Authors: Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem

    Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to have high rewards, using a reward model trained on human preferences. To prevent the forgetting of pre-trained knowledge, RLHF usually incorporates a KL regularization; this forces the policy to remain close to its supervised fine-tuned initialization, though it hinders the rew…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 11 main pages (34 pages with Appendix)

  4. arXiv:2405.20304  [pdf, other]

    cs.CL cs.LG

    Group Robust Preference Optimization in Reward-free RLHF

    Authors: Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou Ammar, Ilija Bogunovic

    Abstract: Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse labelers' groups (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimiz…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Preprint

  5. arXiv:2404.07839  [pdf, other]

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-tr…

    Submitted 28 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  6. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  7. arXiv:2310.06177  [pdf, other]

    cs.LG

    DockGame: Cooperative Games for Multimeric Rigid Protein Docking

    Authors: Vignesh Ram Somnath, Pier Giuseppe Sessa, Maria Rodriguez Martinez, Andreas Krause

    Abstract: Protein interactions and assembly formation are fundamental to most biological processes. Predicting the assembly structure from constituent proteins -- referred to as the protein docking task -- is thus a crucial step in protein design applications. Most traditional and deep learning methods for docking have focused mainly on binary docking, following either a search-based, regression-based, or g…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Under Review

  8. arXiv:2309.02236  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

    Authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic

    Abstract: Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and…

    Submitted 5 September, 2023; originally announced September 2023.

    Journal ref: AISTATS 2024

  9. arXiv:2308.01744  [pdf, other]

    cs.LG

    Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning

    Authors: Pier Giuseppe Sessa, Pierre Laforgue, Nicolò Cesa-Bianchi, Andreas Krause

    Abstract: Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel multitask confidence intervals in the challenging agnostic setting, i.e., when neith…

    Submitted 3 August, 2023; originally announced August 2023.

  10. arXiv:2307.16625  [pdf, other]

    cs.LG stat.ML

    Adversarial Causal Bayesian Optimization

    Authors: Scott Sussex, Pier Giuseppe Sessa, Anastasiia Makarova, Andreas Krause

    Abstract: In Causal Bayesian Optimization (CBO), an agent intervenes on an unknown structural causal model to maximize a downstream reward variable. In this paper, we consider the generalization where other agents or external events also intervene on the system, which is key for enabling adaptiveness to non-stationarities such as weather changes, market forces, or adversaries. We formalize this generalizati…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 21 pages, 8 figures

  11. arXiv:2210.13064  [pdf, other]

    cs.RO cs.AI cs.GT

    How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in Urban Driving Games

    Authors: Alessandro Zanardi, Pier Giuseppe Sessa, Nando Käslin, Saverio Bolognani, Andrea Censi, Emilio Frazzoli

    Abstract: We consider the interaction among agents engaging in a driving task and we model it as a general-sum game. This class of games exhibits a plurality of different equilibria, posing the issue of equilibrium selection. While selecting the most efficient equilibrium (in terms of social cost) is often impractical from a computational standpoint, in this work we study the (in)efficiency of any equilibrium p…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Under review

  12. arXiv:2210.08087  [pdf, other]

    stat.ML cs.LG

    Movement Penalized Bayesian Optimization with Application to Wind Energy Systems

    Authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Andreas Krause, Ilija Bogunovic

    Abstract: Contextual Bayesian optimization (CBO) is a powerful framework for sequential decision-making given side information, with important applications, e.g., in wind energy systems. In this setting, the learner receives context (e.g., weather conditions) at each round, and has to choose an action (e.g., turbine parameters). Standard algorithms assume no cost for switching their decisions at every round…

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  13. arXiv:2203.07322  [pdf, other]

    cs.LG cs.MA

    Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

    Authors: Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause

    Abstract: We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieve g…

    Submitted 10 July, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

  14. arXiv:2109.00527  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Boosting Search Engines with Interactive Agents

    Authors: Leonard Adolphs, Benjamin Boerschinger, Christian Buck, Michelle Chen Huebscher, Massimiliano Ciaramita, Lasse Espeholt, Thomas Hofmann, Yannic Kilcher, Sascha Rothe, Pier Giuseppe Sessa, Lierni Sestorain Saralegui

    Abstract: This paper presents first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and s…

    Submitted 7 June, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: Published in Transactions on Machine Learning Research (06/2022)

  15. arXiv:2107.06327  [pdf, other]

    cs.GT cs.LG

    Contextual Games: Multi-Agent Learning with Side Information

    Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour

    Abstract: We formulate the novel class of contextual games, a type of repeated game driven by contextual information at each round. By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize the contextual regret of individual players. We define game-theoretic…

    Submitted 13 July, 2021; originally announced July 2021.

    Journal ref: Proc. of Neural Information Processing Systems (NeurIPS), 2020

  16. arXiv:2007.05271  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to Play Sequential Games versus Unknown Opponents

    Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

    Abstract: We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action. We seek to design strategies for the learner to successfully interact with the opponent. While most previous approaches consider known opponent models, we focus on the setting in which the opponent's model is unknown. To this end, we use kernel-based regularity assumptions…

    Submitted 10 July, 2020; originally announced July 2020.

  17. arXiv:2002.12613  [pdf, other]

    cs.LG stat.ML

    Mixed Strategies for Robust Optimization of Unknown Objectives

    Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

    Abstract: We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter. For this setting, we design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations. GP-MRO seeks to discover a robust and randomized mixed strategy, that maximizes t…

    Submitted 2 March, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

  18. No-Regret Learning from Partially Observed Data in Repeated Auctions

    Authors: Orcun Karaca, Pier Giuseppe Sessa, Anna Leidi, Maryam Kamgarpour

    Abstract: We study a general class of repeated auctions, such as the ones found in electricity markets, as multi-agent games between the bidders. In such a repeated setting, bidders can adapt their strategies online based on the data observed in the previous auction rounds. Moreover, if no-regret algorithms are employed by the bidders to update their strategies, the game is known to converge to a coarse-cor…

    Submitted 20 December, 2019; originally announced December 2019.

    Journal ref: IFAC-PapersOnLine, 53(2), 14-19, 2020

  19. arXiv:1909.08540  [pdf, other]

    cs.LG cs.GT cs.MA stat.ML

    No-Regret Learning in Unknown Games with Correlated Payoffs

    Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

    Abstract: We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performanc…

    Submitted 28 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

  20. arXiv:1903.00950  [pdf, ps, other]

    cs.GT

    Bounding Inefficiency of Equilibria in Continuous Actions Games using Submodularity and Curvature

    Authors: Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause

    Abstract: Games with continuous strategy sets arise in several machine learning problems (e.g. adversarial learning). For such games, simple no-regret learning algorithms exist in several cases and ensure convergence to coarse correlated equilibria (CCE). The efficiency of such equilibria with respect to a social function, however, is not well understood. In this paper, we define the class of valid utility…

    Submitted 3 March, 2019; originally announced March 2019.

  21. From Uncertainty Data to Robust Policies for Temporal Logic Planning

    Authors: Pier Giuseppe Sessa, Damian Frick, Tony A. Wood, Maryam Kamgarpour

    Abstract: We consider the problem of synthesizing robust disturbance feedback policies for systems performing complex tasks. We formulate the tasks as linear temporal logic specifications and encode them into an optimization framework via mixed-integer constraints. Both the system dynamics and the specifications are known but affected by uncertainty. The distribution of the uncertainty is unknown, however r…

    Submitted 27 August, 2018; v1 submitted 11 January, 2018; originally announced January 2018.

    MSC Class: 90C15

    Journal ref: Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (part of CPS Week) (HSCC '18). 2018, 157-166

  22. Exploiting structure of chance constrained programs via submodularity

    Authors: Damian Frick, Pier Giuseppe Sessa, Tony A. Wood, Maryam Kamgarpour

    Abstract: We introduce a novel approach to reduce the computational effort of solving mixed-integer convex chance constrained programs through the scenario approach. Instead of reducing the number of required scenarios, we directly minimize the computational cost of the scenario program. We exploit the problem structure by efficiently partitioning the constraint function and considering a multiple chance co…

    Submitted 18 September, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

    MSC Class: 90C15

    Journal ref: Automatica Volume 105, July 2019, Pages 89-95

  23. Designing Coalition-Proof Reverse Auctions over Continuous Goods

    Authors: Orcun Karaca, Pier Giuseppe Sessa, Neil Walton, Maryam Kamgarpour

    Abstract: This paper investigates reverse auctions that involve continuous values of different types of goods, general nonconvex constraints, and second stage costs. We seek to design the payment rules and conditions under which coalitions of participants cannot influence the auction outcome in order to obtain higher collective utility. Under the incentive-compatible Vickrey-Clarke-Groves mechanism, we show…

    Submitted 31 December, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

    Journal ref: IEEE Transactions on Automatic Control, 64(11), 4803-4810, 2019

  24. arXiv:1611.03044  [pdf, other]

    cs.GT

    Exploring Vickrey-Clarke-Groves Mechanism for Electricity Markets

    Authors: Pier Giuseppe Sessa, Neil Walton, Maryam Kamgarpour

    Abstract: Control reserves are power generation or consumption entities that ensure balance of supply and demand of electricity in real-time. In many countries, they are operated through a market mechanism in which entities provide bids. The system operator determines the accepted bids based on an optimization algorithm. We develop the Vickrey-Clarke-Groves (VCG) mechanism for these electricity markets. We…

    Submitted 21 November, 2016; v1 submitted 9 November, 2016; originally announced November 2016.