
Showing 1–7 of 7 results for author: Sel, B

  1. arXiv:2405.16601  [pdf, other]

    cs.LG

    A CMDP-within-online framework for Meta-Safe Reinforcement Learning

    Authors: Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin

    Abstract: Meta-reinforcement learning has widely been used as a learning-to-learn framework to solve unseen tasks with limited experience. However, the aspect of constraint violations has not been adequately addressed in the existing works, making their application restricted in real-world settings. In this paper, we study the problem of meta-safe reinforcement learning (Meta-SRL) through the CMDP-within-on…

    Submitted 26 May, 2024; originally announced May 2024.

    Journal ref: ICLR 2023

  2. arXiv:2405.16390  [pdf, other]

    cs.AI cs.LG

    Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning

    Authors: Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Alois Knoll, Ming Jin

    Abstract: In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gr…

    Submitted 25 May, 2024; originally announced May 2024.

  3. arXiv:2405.12933  [pdf, other]

    cs.CL cs.AI cs.LG

    Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs

    Authors: Bilgehan Sel, Priya Shanmugasundaram, Mohammad Kachuee, Kun Zhou, Ruoxi Jia, Ming Jin

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities in tasks such as summarization, arithmetic reasoning, and question answering. However, they encounter significant challenges in the domain of moral reasoning and ethical decision-making, especially in complex scenarios with multiple stakeholders. This paper introduces the Skin-in-the-Game (SKIG) framework, aimed at enhancing moral rea…

    Submitted 2 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: ACL 2024, long paper

  4. arXiv:2405.01677  [pdf, other]

    cs.LG cs.AI

    Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

    Authors: Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, Alois Knoll

    Abstract: Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the t…

    Submitted 7 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2308.10380  [pdf, other]

    cs.AI cs.CL

    A Human-on-the-Loop Optimization Autoformalism Approach for Sustainability

    Authors: Ming Jin, Bilgehan Sel, Fnu Hardeep, Wotao Yin

    Abstract: This paper outlines a natural conversational approach to solving personalized energy-related problems using large language models (LLMs). We focus on customizable optimization problems that necessitate repeated solving with slight variations in modeling and are user-specific, hence posing a challenge to devising a one-size-fits-all model. We put forward a strategy that augments an LLM with an opti…

    Submitted 22 August, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

  6. arXiv:2308.10379  [pdf, other]

    cs.CL cs.AI

    Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models

    Authors: Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Ruoxi Jia, Ming Jin

    Abstract: Current literature, aiming to surpass the "Chain-of-Thought" approach, often resorts to external modi operandi involving halting, modifying, and then resuming the generation process to boost Large Language Models' (LLMs) reasoning capacities. Due to their myopic perspective, they escalate the number of query requests, leading to increased costs, memory, and computational overheads. Addressing this…

    Submitted 2 June, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: ICML 2024

  7. arXiv:2212.01314  [pdf, other]

    cs.LG math.OC

    On Solution Functions of Optimization: Universal Approximation and Covering Number Bounds

    Authors: Ming Jin, Vanshaj Khattar, Harshal Kaushik, Bilgehan Sel, Ruoxi Jia

    Abstract: We study the expressibility and learnability of convex optimization solution functions and their multi-layer architectural extension. The main results are: \emph{(1)} the class of solution functions of linear programming (LP) and quadratic programming (QP) is a universal approximant for the $C^k$ smooth model class or some restricted Sobolev space, and we characterize the rate-distortion, \emph{(2…

    Submitted 2 December, 2022; originally announced December 2022.