Showing 1–50 of 160 results for author: Joty, S

  1. arXiv:2408.08656  [pdf, other]

    cs.CL

    LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

    Authors: Do Xuan Long, Hai Nguyen Ngoc, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F. Chen, Min-Yen Kan

    Abstract: We present the first systematic evaluation examining format bias in performance of large language models (LLMs). Our approach distinguishes between two categories of an evaluation metric under format constraints to reliably and accurately assess performance: one measures performance when format constraints are adhered to, while the other evaluates performance regardless of constraint adherence. We…

    Submitted 16 August, 2024; originally announced August 2024.

  2. arXiv:2408.05346  [pdf, other]

    cs.CL

    DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts

    Authors: Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty

    Abstract: Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. These stories integrate visual aids, such as highlighted bars and lines in charts, along with textual annotations explaining insights. However, creating such stories requires a deep understanding of the data and meticulous narrative planning, often necessitating huma…

    Submitted 13 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

  3. arXiv:2407.21794  [pdf, other]

    cs.CV cs.AI cs.LG

    Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

    Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa

    Abstract: Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework w…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: survey paper. We welcome questions, issues, and paper requests via https://github.com/AtsuMiyai/Awesome-OOD-VLM

  4. arXiv:2407.04172  [pdf, other]

    cs.AI cs.CL cs.CV

    ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

    Authors: Ahmed Masry, Megh Thakkar, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque, Shafiq Joty

    Abstract: Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer crucial drawbacks across two critical axes affecting the performance of chart…

    Submitted 4 July, 2024; originally announced July 2024.

  5. arXiv:2407.04069  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

    Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

    Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the comple…

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2406.03776  [pdf, other]

    cs.CL cs.AI cs.CV cs.IR

    XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags

    Authors: Faisal Tareque Shohan, Mir Tafseer Nayeem, Samsul Islam, Abu Ubaida Akash, Shafiq Joty

    Abstract: Millions of news articles published online daily can overwhelm readers. Headlines and entity (topic) tags are essential for guiding readers to decide if the content is worth their time. While headline generation has been extensively studied, tag generation remains largely unexplored, yet it offers readers better access to topics of interest. The need for conciseness in capturing readers' attention…

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024 camera ready. The first two authors contributed equally

  7. arXiv:2405.15329  [pdf, other]

    cs.CL

    Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework

    Authors: Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

    Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final ev…

    Submitted 14 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  8. arXiv:2404.16251  [pdf, other]

    cs.CR cs.AI cs.CL

    Prompt Leakage effect and defense strategies for multi-turn LLM interactions

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, Chien-Sheng Wu

    Abstract: Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities a…

    Submitted 29 July, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  9. arXiv:2404.12728  [pdf, other]

    cs.CL

    Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

    Authors: Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty

    Abstract: Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks. Coincidentally, the NLP community has also recently found that self-generating relevant examples in the context…

    Submitted 23 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  10. arXiv:2404.02507  [pdf, other]

    cs.CL

    Lifelong Event Detection with Embedding Space Separation and Compaction

    Authors: Chengwei Qin, Ruirui Chen, Ruochen Zhao, Wenhan Xia, Shafiq Joty

    Abstract: To mitigate forgetting, existing lifelong event detection methods typically maintain a memory module and replay the stored memory data during the learning of a new task. However, the simple combination of memory data and new-task samples can still result in substantial forgetting of previously acquired knowledge, which may occur due to the potential overlap between the feature distribution of new…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 main conference

  11. arXiv:2404.00699  [pdf, other]

    cs.CL

    How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library

    Authors: Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty

    Abstract: With the rise of Large Language Models (LLMs) in recent years, abundant new opportunities are emerging, but also new challenges, among which contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars, placing high pres…

    Submitted 20 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, 1 table

  12. arXiv:2404.00570  [pdf, other]

    cs.CL

    ParaICL: Towards Robust Parallel In-Context Learning

    Authors: Xingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing

    Abstract: Large language models (LLMs) have become the norm in natural language processing (NLP), excelling in few-shot in-context learning (ICL) with their remarkable abilities. Nonetheless, the success of ICL largely hinges on the choice of few-shot demonstration examples, making the selection process increasingly crucial. Existing methods have delved into optimizing the quantity and semantic similarity o…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Work in progress

  13. arXiv:2403.12027  [pdf, other]

    cs.CL cs.AI cs.CV

    From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

    Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

    Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increa…

    Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2403.09028  [pdf, other]

    cs.CL

    ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning

    Authors: Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty

    Abstract: Charts provide visual representations of data and are widely used for analyzing information, addressing queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question-answering and summarization. A common strategy to solve these tasks is to fine-tune various models originally trained on vision-language tasks. However, such task-specific mo…

    Submitted 13 March, 2024; originally announced March 2024.

  15. arXiv:2403.02990  [pdf, other]

    cs.CL cs.AI

    Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges

    Authors: Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty

    Abstract: In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of LLMs on DA, particularly addressing the unique challenges and opportunities they present in the context of natural…

    Submitted 2 July, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  16. arXiv:2402.00658  [pdf, other]

    cs.AI cs.CL

    Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

    Authors: Fangkai Jiao, Chengwei Qin, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation. However, recent studies have raised concerns regarding the hallucination and flaws in their reasoning process. Substantial efforts are being made to improve the reliability and faithfulness of the generated rationales. Some approaches model reasoning a…

    Submitted 15 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures

  17. arXiv:2401.13974  [pdf, other]

    cs.CV cs.AI cs.GR

    BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models

    Authors: Senthil Purushwalkam, Akash Gokul, Shafiq Joty, Nikhil Naik

    Abstract: Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the appearance of the generated concepts. In this work, we address this shortcoming by proposing an approach to enable personalization capabilities in existing text-…

    Submitted 25 January, 2024; originally announced January 2024.

  18. arXiv:2312.17055  [pdf, other]

    cs.CL

    Improving In-context Learning via Bidirectional Alignment

    Authors: Chengwei Qin, Wenhan Xia, Fangkai Jiao, Chen Chen, Yuchen Hu, Bosheng Ding, Shafiq Joty

    Abstract: Large language models (LLMs) have shown impressive few-shot generalization on many tasks via in-context learning (ICL). Despite their success in showing such emergent abilities, the scale and complexity of larger models also lead to unprecedentedly high computational demands and deployment challenges. In reaction, researchers explore transferring the powerful capabilities of larger models to more…

    Submitted 24 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  19. arXiv:2312.10610  [pdf, other]

    cs.CL

    Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization

    Authors: Xuan Long Do, Mohammad Hassanpour, Ahmed Masry, Parsa Kavehzadeh, Enamul Hoque, Shafiq Joty

    Abstract: A number of tasks have been proposed recently to facilitate easy access to charts such as chart QA and summarization. The dominant paradigm to solve these tasks has been to fine-tune a pretrained model on the task data. However, this approach is not only expensive but also not generalizable to unseen tasks. On the other hand, large language models (LLMs) have shown impressive generalization capabi…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 23 pages

  20. arXiv:2311.18799  [pdf, other]

    cs.CV cs.CL

    X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

    Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific custo…

    Submitted 30 November, 2023; originally announced November 2023.

  21. arXiv:2311.16989  [pdf, other]

    cs.CL

    ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

    Authors: Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty

    Abstract: Upon its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interests in…

    Submitted 15 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: version v4, included latest top-performing open-sourced LLMs

  22. arXiv:2311.12908  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Model Alignment Using Direct Preference Optimization

    Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

    Abstract: Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality im…

    Submitted 21 November, 2023; originally announced November 2023.

  23. arXiv:2311.09184  [pdf, other]

    cs.CL cs.LG

    Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

    Authors: Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan

    Abstract: While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for desired summary characteristi…

    Submitted 12 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Findings, GitHub Repo: https://github.com/yale-nlp/InstruSum, LLM-evaluators Leaderboard: https://huggingface.co/spaces/yale-nlp/InstruSumEval

  24. arXiv:2310.20170  [pdf, other]

    cs.CL

    DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

    Authors: Wenting Zhao, Ye Liu, Tong Niu, Yao Wan, Philip S. Yu, Shafiq Joty, Yingbo Zhou, Semih Yavuz

    Abstract: Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasi…

    Submitted 31 October, 2023; originally announced October 2023.

  25. arXiv:2310.18628  [pdf, other]

    cs.CL cs.LG

    Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

    Authors: Hailin Chen, Amrita Saha, Steven Hoi, Shafiq Joty

    Abstract: With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there is increasing interest in distilling the capabilities of closed-sourced LLMs to smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers, for the student model to learn. However, such a standard distillation approach neglects the merits and conditions of the stude…

    Submitted 26 January, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023; Codes at: https://github.com/SalesforceAIResearch/PersDistill

  26. arXiv:2310.10570  [pdf, other]

    cs.CL

    On Context Utilization in Summarization with Large Language Models

    Authors: Mathieu Ravaut, Aixin Sun, Nancy F. Chen, Shafiq Joty

    Abstract: Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens. However, in question answering, language models exhibit uneven utilization of their input context. They tend to favor the initial and final segments, resulting in a U-shaped perfo…

    Submitted 14 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ACL 2024. 9 pages, 7 figures, 3 tables

  27. arXiv:2310.09886  [pdf, other]

    cs.CL cs.AI

    Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

    Authors: Chengwei Qin, Chen Chen, Shafiq Joty

    Abstract: Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can bett…

    Submitted 22 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  28. arXiv:2310.08992  [pdf, other]

    cs.AI cs.CL cs.PL

    CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

    Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty

    Abstract: Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modul…

    Submitted 13 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  29. arXiv:2310.01917  [pdf, other]

    cs.CL cs.HC

    Hierarchical Evaluation Framework: Best Practices for Human Evaluation

    Authors: Iva Bojic, Jessica Chen, Si Yuan Chang, Qi Chwen Ong, Shafiq Joty, Josip Car

    Abstract: Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing li…

    Submitted 12 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  30. arXiv:2309.17446  [pdf, other]

    cs.CL cs.LG cs.PL cs.SE

    L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

    Authors: Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

    Abstract: Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising results, there is a notable lack of a comprehensive evaluation of these models' language-to-code generation capabilities. Existing studies often focus on specific task…

    Submitted 2 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Project Website: https://l2c-eval.github.io/

  31. arXiv:2309.09369  [pdf, other]

    cs.CL

    Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

    Authors: Kung-Hsiang Huang, Philippe Laban, Alexander R. Fabbri, Prafulla Kumar Choubey, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To…

    Submitted 22 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: NAACL 2024

  32. arXiv:2309.06057  [pdf, other]

    cs.SE cs.CL

    RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

    Authors: Weishi Wang, Yue Wang, Shafiq Joty, Steven C. H. Hoi

    Abstract: Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability. While conventional search-based techniques typically rely on heuristic rules or a redundancy assumption to mine fix patterns, recent years have witnessed the surge of deep learning (DL) based approaches to automate the program repair process in a data-driven manner. However…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: FSE 2023, Long paper

  33. arXiv:2309.03450  [pdf, other]

    cs.CL cs.AI cs.LG

    XGen-7B Technical Report

    Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong

    Abstract: Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many t…

    Submitted 6 September, 2023; originally announced September 2023.

  34. arXiv:2308.12574  [pdf, other]

    cs.IR cs.AI

    Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs

    Authors: Ye Liu, Semih Yavuz, Rui Meng, Meghana Moorthy, Shafiq Joty, Caiming Xiong, Yingbo Zhou

    Abstract: The integration of retrieved passages and large language models (LLMs), such as ChatGPT, has significantly contributed to improving open-domain question answering. However, there is still a lack of exploration regarding the optimal approach for incorporating retrieved passages into the answer generation process. This paper aims to fill this gap by investigating different methods of combining retr…

    Submitted 7 April, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  35. arXiv:2308.03117  [pdf, other]

    cs.CL

    PromptSum: Parameter-Efficient Controllable Abstractive Summarization

    Authors: Mathieu Ravaut, Hailin Chen, Ruochen Zhao, Chengwei Qin, Shafiq Joty, Nancy Chen

    Abstract: Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, su…

    Submitted 6 August, 2023; originally announced August 2023.

  36. arXiv:2306.11372  [pdf, other]

    cs.CL cs.AI

    Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

    Authors: Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty, Lidong Bing

    Abstract: Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, where unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performances among under-represented lan…

    Submitted 19 July, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: ACL 2024 Main Conference

  37. arXiv:2306.01150  [pdf, other]

    cs.CL cs.AI

    Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

    Authors: Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

    Abstract: Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks. However, it remains unclear whether models truly understand task definitions and whether the human-written definitions are optimal. In this paper, we systematically study the role of task definitions in instruction learning. We first conduct an ablation analysis informed…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ACL 2023, camera-ready; 10 pages

  38. arXiv:2305.19707  [pdf, other]

    cs.CL

    Building Extractive Question Answering System to Support Human-AI Health Coaching Model for Sleep Domain

    Authors: Iva Bojic, Qi Chwen Ong, Shafiq Joty, Josip Car

    Abstract: Non-communicable diseases (NCDs) are a leading cause of global deaths, necessitating a focus on primary prevention and lifestyle behavior change. Health coaching, coupled with Question Answering (QA) systems, has the potential to transform preventive healthcare. This paper presents a human-Artificial Intelligence (AI) health coaching model incorporating a domain-specific extractive QA system. A sl…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 2 pages, 1 figure

  39. arXiv:2305.19204  [pdf, other]

    cs.CL

    SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages

    Authors: Philippe Laban, Jesse Vig, Wojciech Kryscinski, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Text simplification research has mostly focused on sentence-level simplification, even though many desirable edits - such as adding relevant background information or reordering content - may require document-level context. Prior work has also predominantly framed simplification as a single-step, input-to-output task, only implicitly modeling the fine-grained, span-level edits that elucidate the s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023, Long Paper

  40. arXiv:2305.18486  [pdf, other]

    cs.CL cs.AI cs.LG

    A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

    Authors: Md Tahmid Rahman Laskar, M Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty, Jimmy Xiangji Huang

    Abstract: The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recently. However, their evaluation in the benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by this model against the ground truth. In this paper, we aim to present a thorough evaluation of ChatGPT's performance on diverse academic dat…

    Submitted 5 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 Findings. The first three authors contributed equally

  41. arXiv:2305.15014  [pdf, other]

    cs.CL

    Unlocking Temporal Question Answering for Large Language Models Using Code Execution

    Authors: Xingxuan Li, Liying Cheng, Qingyu Tan, Hwee Tou Ng, Shafiq Joty, Lidong Bing

    Abstract: Large language models (LLMs) have made significant progress in natural language processing (NLP), and are utilized extensively in various applications. Recent works, such as chain-of-thought (CoT), have shown that intermediate reasoning steps can improve the performance of LLMs for complex reasoning tasks, such as math problems and symbolic question-answering tasks. However, we notice the challeng…

    Submitted 24 May, 2023; originally announced May 2023.

  42. arXiv:2305.14761  [pdf, other]

    cs.CL

    UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning

    Authors: Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty

    Abstract: Charts are very popular for analyzing data, visualizing key insights and answering complex reasoning questions about data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently such as chart question answering and chart summarization. However, most of the methods that solve these tasks use pretraining on language or vision-language t…

    Submitted 10 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  43. arXiv:2305.14540  [pdf, other]

    cs.CL

    LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Authors: Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

    Abstract: With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency de…

    Submitted 23 May, 2023; originally announced May 2023.

  44. arXiv:2305.13718  [pdf, other]

    cs.CL

    Exploring Self-supervised Logic-enhanced Training for Large Language Models

    Authors: Fangkai Jiao, Zhiyang Teng, Bosheng Ding, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty

    Abstract: Existing efforts to improve the logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks. The development of Large Language Models (LLMs) has demonstrated the capacity of compressing abundant knowledge into a single proxy, enabling them to tackle multiple tasks effectively. Our preliminary experiments, nevert…

    Submitted 16 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 16 pages, NAACL 2024

  45. arXiv:2305.13269  [pdf, other]

    cs.CL

    Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

    Authors: Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, Lidong Bing

    Abstract: We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-inten…

    Submitted 21 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by ICLR 2024

  46. arXiv:2305.06522  [pdf, other]

    cs.CL cs.AI

    Randomized Smoothing with Masked Inference for Adversarially Robust Text Classifications

    Authors: Han Cheol Moon, Shafiq Joty, Ruochen Zhao, Megh Thakkar, Xu Chi

    Abstract: Large-scale pre-trained language models have shown outstanding performance in a variety of NLP tasks. However, they are also known to be significantly brittle against specifically crafted adversarial examples, leading to increasing interest in probing the adversarial robustness of NLP systems. We introduce RSMI, a novel two-stage framework that combines randomized smoothing (RS) with masked infere…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 19 pages, 4 figures, ACL 2023

  47. arXiv:2305.03268  [pdf, other]

    cs.CL

    Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

    Authors: Ruochen Zhao, Xingxuan Li, Shafiq Joty, Chengwei Qin, Lidong Bing

    Abstract: As large language models (LLMs) have become the norm in NLP, demonstrating good performance in generation and reasoning tasks, one of their most fatal disadvantages is the lack of factual correctness. Generating unfactual texts not only leads to lower performance but also degrades the trust and validity of their applications. Chain-of-Thought (CoT) prompting improves trust and model performance on…

    Submitted 4 May, 2023; originally announced May 2023.

  48. arXiv:2305.03088  [pdf, other]

    cs.CL cs.AI

    Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation

    Authors: Xuan Long Do, Bowei Zou, Shafiq Joty, Anh Tai Tran, Liangming Pan, Nancy F. Chen, Ai Ti Aw

    Abstract: Conversational Question Generation (CQG) is a critical task for machines to assist humans in fulfilling their information needs through conversations. The task is generally cast into two different settings: answer-aware and answer-unaware. While the former facilitates the models by exposing the expected answer, the latter is more realistic and has received growing attention recently. What-to-ask and…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 17 pages, ACL 2023

  49. arXiv:2305.02160  [pdf, other]

    cs.CL

    Explaining Language Models' Predictions with High-Impact Concepts

    Authors: Ruochen Zhao, Shafiq Joty, Yongjie Wang, Tan Wang

    Abstract: The emergence of large-scale pretrained language models has posed unprecedented challenges in deriving explanations of why the model has made some predictions. Stemming from the compositional nature of languages, spurious correlations have further undermined the trustworthiness of NLP systems, leading to unreliable model explanations that are merely correlated with the output predictions. To encour…

    Submitted 3 May, 2023; originally announced May 2023.

  50. arXiv:2304.01295  [pdf, other]

    cs.CL cs.AI

    Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning

    Authors: Lifu Tu, Jin Qu, Semih Yavuz, Shafiq Joty, Wenhao Liu, Caiming Xiong, Yingbo Zhou

    Abstract: Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD for cross-lingual alignment pretraining, a parallel and la…

    Submitted 26 January, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to Findings of the ACL: EACL 2024