Showing 1–8 of 8 results for author: Spangher, L

  1. arXiv:2405.19107  [pdf, ps, other]

    cs.LG cs.AI

    Offline Regularised Reinforcement Learning for Large Language Models Alignment

    Authors: Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Avila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Remi Munos, Bilal Piot

    Abstract: The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses…

    Submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2312.10289  [pdf, other]

    cs.LG cs.AI eess.SY

    Active Reinforcement Learning for Robust Building Control

    Authors: Doseok Jang, Larry Yan, Lucas Spangher, Costas Spanos

    Abstract: Reinforcement learning (RL) is a powerful tool for optimal control that has found great success in Atari games, the game of Go, robotic control, and building optimization. RL is also very brittle; agents often overfit to their training environment and fail to generalize to new settings. Unsupervised environment design (UED) has been proposed as a solution to this problem, in which the agent trains…

    Submitted 15 December, 2023; originally announced December 2023.

  3. arXiv:2312.01286  [pdf, other]

    cs.LG physics.plasm-ph

    Continuous Convolutional Neural Networks for Disruption Prediction in Nuclear Fusion Plasmas

    Authors: William F Arnold, Lucas Spangher, Christina Rea

    Abstract: Grid decarbonization for climate change requires dispatchable carbon-free energy like nuclear fusion. The tokamak concept offers a promising path for fusion, but one of the foremost challenges in implementation is the occurrence of energetic plasma disruptions. In this study, we delve into Machine Learning approaches to predict plasma state outcomes. Our contributions are twofold: (1) We present a…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted at CCAI NeurIPS 2023

  4. arXiv:2211.14889  [pdf, other]

    cs.LG eess.SY

    Machine Learning for Smart and Energy-Efficient Buildings

    Authors: Hari Prasanna Das, Yu-Wen Lin, Utkarsha Agwan, Lucas Spangher, Alex Devonport, Yu Yang, Jan Drgona, Adrian Chong, Stefano Schiavon, Costas J. Spanos

    Abstract: Energy consumption in buildings, both residential and commercial, accounts for approximately 40% of all energy usage in the U.S., and similar numbers are being reported from countries around the world. This significant amount of energy is used to maintain a comfortable, secure, and productive environment for the occupants. So, it is crucial that the energy consumption in buildings must be optimize…

    Submitted 27 November, 2022; originally announced November 2022.

  5. arXiv:2210.06820  [pdf, other]

    cs.LG cs.AI eess.SY

    Personalized Federated Hypernetworks for Privacy Preservation in Multi-Task Reinforcement Learning

    Authors: Doseok Jang, Larry Yan, Lucas Spangher, Costas J. Spanos

    Abstract: Multi-Agent Reinforcement Learning currently focuses on implementations where all data and training can be centralized to one machine. But what if local agents are split across multiple tasks, and need to keep data private between each? We develop the first application of Personalized Federated Hypernetworks (PFH) to Reinforcement Learning (RL). We then present a novel application of PFH to few-sh…

    Submitted 19 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

  6. arXiv:2111.06025  [pdf, other]

    cs.LG

    Adapting Surprise Minimizing Reinforcement Learning Techniques for Transactive Control

    Authors: William Arnold, Tarang Srivastava, Lucas Spangher, Utkarsha Agwan, Costas Spanos

    Abstract: Optimizing prices for energy demand response requires a flexible controller with the ability to navigate complex environments. We propose a reinforcement learning controller with surprise minimizing modifications in its architecture. We suggest that surprise minimization can be used to improve learning speed, taking advantage of predictability in people's energy usage. Our architecture performs well i…

    Submitted 10 November, 2021; originally announced November 2021.

  7. arXiv:2108.06594  [pdf, other]

    cs.LG cs.AI

    Offline-Online Reinforcement Learning for Energy Pricing in Office Demand Response: Lowering Energy and Data Costs

    Authors: Doseok Jang, Lucas Spangher, Manan Khattar, Utkarsha Agwan, Selvaprabuh Nadarajah, Costas Spanos

    Abstract: Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we examine how offline training can be leveraged to minimize data costs (accelerate convergence) and program impl…

    Submitted 14 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: text overlap with arXiv:2104.14670

  8. Using Meta Reinforcement Learning to Bridge the Gap between Simulation and Experiment in Energy Demand Response

    Authors: Doseok Jang, Lucas Spangher, Manan Khattar, Utkarsha Agwan, Costas Spanos

    Abstract: Our team is proposing to run a full-scale energy demand response experiment in an office building. Although this is an exciting endeavor which will provide value to the community, collecting training data for the reinforcement learning agent is costly and will be limited. In this work, we apply a meta-learning architecture to warm start the experiment with simulated tasks, to increase sample effic…

    Submitted 17 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.