Showing 1–3 of 3 results for author: RRV, A

Searching in archive cs.
  1. arXiv:2407.14790  [pdf, other]

    cs.CL cs.AI

    Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?

    Authors: Nemika Tyagi, Mihir Parmar, Mohith Kulkarni, Aswin RRV, Nisarg Patel, Mutsumi Nakamura, Arindam Mitra, Chitta Baral

    Abstract: Solving grid puzzles involves a significant amount of logical reasoning. Hence, it is a good domain to evaluate the reasoning capability of a model which can then guide us to improve the reasoning ability of models. However, most existing works evaluate only the final predicted answer of a puzzle, without delving into an in-depth analysis of the LLMs' reasoning chains (such as where they falter) o…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 16 Pages

  2. arXiv:2406.03827  [pdf, other]

    cs.CL

    Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies

    Authors: Aswin RRV, Nemika Tyagi, Md Nayem Uddin, Neeraj Varshney, Chitta Baral

    Abstract: This study explores the sycophantic tendencies of Large Language Models (LLMs), where these models tend to provide answers that match what users want to hear, even if they are not entirely correct. The motivation behind this exploration stems from the common behavior observed in individuals searching the internet for facts with partial or misleading knowledge. Similar to using web search engines,…

    Submitted 24 August, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  3. arXiv:2405.16681  [pdf, other]

    cs.CL

    Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization

    Authors: Amir Saeidi, Shivanshu Verma, Aswin RRV, Chitta Baral

    Abstract: Large Language Models (LLMs) perform well across diverse tasks, but aligning them with human demonstrations is challenging. Recently, Reinforcement Learning (RL)-free methods like Direct Preference Optimization (DPO) have emerged, offering improved stability and scalability while retaining competitive performance relative to RL-based methods. However, while RL-free methods deliver satisfactory per…

    Submitted 26 May, 2024; originally announced May 2024.