
Showing 1–50 of 131 results for author: Poria, S

  1. arXiv:2408.10701

    cs.CL

    Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique

    Authors: Tej Deep Pala, Vernon Y. H. Toh, Rishabh Bhardwaj, Soujanya Poria

    Abstract: In today's era, where large language models (LLMs) are integrated into numerous real-world applications, ensuring their safety and robustness is crucial for responsible AI usage. Automated red-teaming methods play a key role in this process by generating adversarial attacks to identify and mitigate potential vulnerabilities in these models. However, existing methods often struggle with slow perfor…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.09481

    cs.CL cs.AI

    PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

    Authors: Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu

    Abstract: While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and advancement, there are still gaps in defining a more holistic research target seamlessly integrating multimodality, conversation context, fine-granularity, and also covering the changing sentiment dynamics as well as cognitive causal rationales. This paper bridges the gaps by introducing a multimodal conversati…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024 (Oral)

  3. arXiv:2408.03837

    cs.CL cs.AI

    WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

    Authors: Prannaya Gupta, Le Qi Yau, Hao Han Low, I-Shiang Lee, Hugo Maximus Lim, Yu Xin Teoh, Jia Hng Koh, Dar Win Liew, Rishabh Bhardwaj, Rajat Bhardwaj, Soujanya Poria

    Abstract: WalledEval is a comprehensive AI safety testing toolkit designed to evaluate large language models (LLMs). It accommodates a diverse range of models, including both open-weight and API-based ones, and features over 35 safety benchmarks covering areas such as multilingual safety, exaggerated safety, and prompt injections. The framework supports both LLM and judge benchmarking and incorporates custo…

    Submitted 19 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Under review

  4. arXiv:2406.17257

    cs.CL cs.SD eess.AS

    Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation

    Authors: Yingting Li, Ambuj Mehrish, Bryan Chew, Bo Cheng, Soujanya Poria

    Abstract: Different languages have distinct phonetic systems and vary in their prosodic features, making it challenging to develop a Text-to-Speech (TTS) model that can effectively synthesise speech in multilingual settings. Furthermore, a TTS architecture needs to be both expressive enough to capture nuances in multiple languages and efficient enough to be practical for deployment. The standard approach is to…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.15487

    cs.CL cs.LG cs.SD eess.AS

    Improving Text-To-Audio Models with Synthetic Captions

    Authors: Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

    Abstract: It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model…

    Submitted 8 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.15193

    cs.CL

    Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

    Authors: Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

    Abstract: The widespread applicability and increasing omnipresence of LLMs have instigated a need to align LLM responses to user and stakeholder preferences. Many preference optimization approaches have been proposed that fine-tune LLM parameters to achieve good alignment. However, such parameter tuning is known to interfere with model performance on many tasks. Moreover, keeping up with shifting user prefe…

    Submitted 8 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.11801

    cs.CL

    Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations

    Authors: Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria

    Abstract: Ensuring the safe alignment of large language models (LLMs) with human values is critical as they become integral to applications like translation and question answering. Current alignment methods struggle with dynamic user intentions and complex objectives, making models vulnerable to generating harmful content. We propose Safety Arithmetic, a training-free framework enhancing LLM safety across d…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under Review. Codes are available at: https://github.com/declare-lab/safety-arithmetic

  8. arXiv:2406.11654

    cs.CL

    Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

    Authors: Vernon Toh Yan Han, Rishabh Bhardwaj, Soujanya Poria

    Abstract: We propose Ruby Teaming, a method that improves on Rainbow Teaming by including a memory cache as its third dimension. The memory dimension provides cues to the mutator to yield better-quality prompts, both in terms of attack success rate (ASR) and quality diversity. The prompt archive generated by Ruby Teaming has an ASR of 74%, which is 20% higher than the baseline. In terms of quality diversity…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11617

    cs.CL

    DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

    Authors: Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria

    Abstract: With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows si…

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2405.07229

    cs.MM

    MM-InstructEval: Zero-Shot Evaluation of (Multimodal) Large Language Models on Multimodal Reasoning Tasks

    Authors: Xiaocui Yang, Wenfang Wu, Shi Feng, Ming Wang, Daling Wang, Yang Li, Qi Sun, Yifei Zhang, Xiaoming Fu, Soujanya Poria

    Abstract: The rising popularity of multimodal large language models (MLLMs) has sparked a significant increase in research dedicated to evaluating these models. However, current evaluation studies predominantly concentrate on the ability of models to comprehend and reason within a unimodal (vision-only) context, overlooking critical performance evaluations in complex multimodal reasoning tasks that integrat…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Under review, the new version of MM-BigBench: arXiv:2310.09036

  11. Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

    Authors: Yanfei Dong, Lambert Deng, Jiazheng Zhang, Xiaodong Yu, Ting Lin, Francesco Gelli, Soujanya Poria, Wee Sun Lee

    Abstract: Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinator…

    Submitted 8 May, 2024; originally announced May 2024.

  12. arXiv:2405.04655

    cs.CL

    Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense

    Authors: Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, Rada Mihalcea

    Abstract: Large language models (LLMs) have demonstrated substantial commonsense understanding through numerous benchmark evaluations. However, their understanding of cultural commonsense remains largely unexamined. In this paper, we conduct a comprehensive examination of the capabilities and limitations of several state-of-the-art LLMs in the context of cultural commonsense tasks. Using several general and…

    Submitted 7 May, 2024; originally announced May 2024.

  13. arXiv:2404.09956

    cs.SD cs.AI cs.CL eess.AS

    Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

    Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

    Abstract: Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models…

    Submitted 17 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at ACM MM 2024

  14. arXiv:2404.04645

    cs.CL cs.LG cs.SD eess.AS

    HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks

    Authors: Yingting Li, Rishabh Bhardwaj, Ambuj Mehrish, Bo Cheng, Soujanya Poria

    Abstract: Neural speech synthesis, or text-to-speech (TTS), aims to transform a signal from the text domain to the speech domain. While developing TTS architectures that train and test on the same set of speakers has seen significant improvements, out-of-domain speaker performance still faces enormous limitations. Domain adaptation on a new set of speakers can be achieved by fine-tuning the whole model for…

    Submitted 6 April, 2024; originally announced April 2024.

  15. arXiv:2404.00569

    cs.SD cs.CL eess.AS

    CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

    Authors: Xiang Li, Fan Bu, Ambuj Mehrish, Yingting Li, Jiale Han, Bo Cheng, Soujanya Poria

    Abstract: Neural Text-to-Speech (TTS) systems find broad applications in voice assistants, e-learning, and audiobook creation. The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis. Yet, the efficiency of multi-step sampling in Diffusion Models presents challenges. Efforts have been made to integrate GANs with DMs, speeding up infere…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by Findings of NAACL 2024. Code is available at https://github.com/XiangLi2022/CM-TTS

  16. arXiv:2403.13315

    cs.CV

    PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

    Authors: Yew Ken Chia, Vernon Toh Yan Han, Deepanway Ghosal, Lidong Bing, Soujanya Poria

    Abstract: Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of 2000 puzzle instances based on abstract…

    Submitted 17 August, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: ACL 2024 Camera Ready

  17. arXiv:2403.03864

    cs.CV cs.AI

    Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

    Authors: Deepanway Ghosal, Vernon Toh Yan Han, Chia Yew Ken, Soujanya Poria

    Abstract: This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering. We present a new dataset, AlgoPuzzleVQA, designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that necessitate visual understanding, language understanding, and complex algorithmic reasoning. We create the puzzles…

    Submitted 12 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  18. arXiv:2402.14492

    cs.CL cs.AI

    Towards Robust Instruction Tuning on Multimodal Large Language Models

    Authors: Wei Han, Hui Chen, Soujanya Poria

    Abstract: Fine-tuning large language models (LLMs) on multi-task instruction-following data has been proven to be a powerful learning paradigm for improving their zero-shot capabilities on new tasks. Recent work on high-quality instruction-following data generation and selection requires substantial human labor to conceive model-understandable instructions for the given tasks and carefully filter the LLM-…

    Submitted 14 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 24 pages, 7 figures

  19. arXiv:2402.11746

    cs.CL cs.AI

    Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

    Authors: Rishabh Bhardwaj, Do Duc Anh, Soujanya Poria

    Abstract: Aligned language models face a significant limitation as their fine-tuning often results in compromised safety. To tackle this, we propose a simple method RESTA that performs LLM safety realignment. RESTA stands for REstoring Safety through Task Arithmetic. At its core, it involves a simple arithmetic addition of a safety vector to the weights of the compromised model. We demonstrate the effective…

    Submitted 18 February, 2024; originally announced February 2024.

  20. arXiv:2401.13697

    cs.CV cs.AI cs.CL

    Toward Robust Multimodal Learning using Multimodal Foundational Models

    Authors: Xianbing Zhao, Soujanya Poria, Xuejiao Li, Yixin Chen, Buzhou Tang

    Abstract: Existing multimodal sentiment analysis tasks rely heavily on the assumption that the training and test sets are complete multimodal data, but this assumption can be difficult to hold: multimodal data are often incomplete in real-world scenarios. Therefore, a robust multimodal model in scenarios with randomly missing modalities is highly preferred. Recently, CLIP-based multimodal foundatio…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Under Review

  21. arXiv:2401.13598

    cs.CL

    Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction

    Authors: Qi Sun, Kun Huang, Xiaocui Yang, Rong Tong, Kun Zhang, Soujanya Poria

    Abstract: Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities with semantic relations from a document. Existing methods heavily rely on a substantial amount of fully labeled data. However, collecting and annotating data for newly emerging relations is time-consuming and labor-intensive. Recent advanced Large Language M…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted by WWW 2024

  22. arXiv:2401.10647

    cs.CL

    Sowing the Wind, Reaping the Whirlwind: The Impact of Editing Language Models

    Authors: Rima Hazra, Sayan Layek, Somnath Banerjee, Soujanya Poria

    Abstract: In the rapidly advancing field of artificial intelligence, the concept of Red-Teaming or Jailbreaking large language models (LLMs) has emerged as a crucial area of study. This approach is especially significant in terms of assessing and enhancing the safety and robustness of these models. This paper investigates the intricate consequences of such modifications through model editing, uncovering a c…

    Submitted 16 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted at ACL 2024

  23. arXiv:2401.09395

    cs.CL

    Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

    Authors: Pengfei Hong, Navonil Majumder, Deepanway Ghosal, Somak Aditya, Rada Mihalcea, Soujanya Poria

    Abstract: Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance. However, the true depth of their competencies and robustness in reasoning tasks remains an open question. To this end, in this paper, we focus on two popular reasoning tasks: arithmetic reasoning and code generation. Parti…

    Submitted 27 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  24. arXiv:2311.09277

    cs.CL

    Contrastive Chain-of-Thought Prompting

    Authors: Yew Ken Chia, Guizhen Chen, Luu Anh Tuan, Soujanya Poria, Lidong Bing

    Abstract: Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mista…

    Submitted 15 November, 2023; originally announced November 2023.

  25. arXiv:2311.08355

    eess.AS

    Mustango: Toward Controllable Text-to-Music Generation

    Authors: Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria

    Abstract: The quality of the text-to-music models has reached new heights due to recent advancements in diffusion models. The controllability of various musical aspects, however, has barely been explored. In this paper, we propose Mustango: a music-domain-knowledge-inspired text-to-music system based on diffusion. Mustango aims to control the generated music, not only with general text captions, but with mo…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  26. arXiv:2311.00968

    cs.SD cs.AI eess.AS

    Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

    Authors: Jaeyong Kang, Soujanya Poria, Dorien Herremans

    Abstract: Numerous studies in the field of music generation have demonstrated impressive performance, yet virtually no models are able to directly generate music to match accompanying videos. In this work, we develop a generative music AI framework, Video2Music, that can match a provided video. We first curated a unique collection of music videos. Then, we analysed the music videos to obtain semantic, scene…

    Submitted 4 March, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Journal ref: Expert Systems with Applications 249 (2024): 123640

  27. arXiv:2310.20159

    cs.CV cs.AI

    Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

    Authors: Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria

    Abstract: Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where an…

    Submitted 30 October, 2023; originally announced October 2023.

  28. arXiv:2310.19232

    cs.CL

    Adapter Pruning using Tropical Characterization

    Authors: Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria

    Abstract: Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has been a lack of studies analyzing the optimal number of adapter parameters needed for downstream applications. In this paper, we propose an adapter pruning approa…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023, Findings

  29. arXiv:2310.14303

    cs.CL

    Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases

    Authors: Rishabh Bhardwaj, Soujanya Poria

    Abstract: Red-teaming has been a widely adopted way to evaluate the harmfulness of Large Language Models (LLMs). It aims to jailbreak a model's safety behavior to make it act as a helpful agent disregarding the harmfulness of the query. Existing methods are primarily based on input text-based red-teaming such as adversarial prompts, low-resource prompts, or contextualized prompts to condition the model in a…

    Submitted 13 November, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Under Review

  30. arXiv:2310.09036

    cs.CL cs.MM

    MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks

    Authors: Xiaocui Yang, Wenfang Wu, Shi Feng, Ming Wang, Daling Wang, Yang Li, Qi Sun, Yifei Zhang, Xiaoming Fu, Soujanya Poria

    Abstract: The popularity of multimodal large language models (MLLMs) has triggered a recent surge in research efforts dedicated to evaluating these models. Nevertheless, existing evaluation studies of MLLMs primarily focus on the comprehension and reasoning of unimodal (vision) content, neglecting performance evaluations in the domain of multimodal (vision-language) content understanding. Beyond multimodal…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Under review

  31. arXiv:2309.02726

    cs.CL cs.AI

    Large Language Models for Automated Open-domain Scientific Hypotheses Discovery

    Authors: Zonglin Yang, Xinya Du, Junxian Li, Jie Zheng, Soujanya Poria, Erik Cambria

    Abstract: Hypothetical induction is recognized as the main reasoning type when scientists make observations about the world and try to propose hypotheses to explain those observations. Past research on hypothetical induction has been conducted under a constrained setting: (1) the observation annotations in the dataset are carefully hand-picked sentences (resulting in a closed-domain setting); and (2) the ground trut…

    Submitted 12 June, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Accepted by ACL 2024 (findings)

  32. arXiv:2308.09662

    cs.CL

    Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

    Authors: Rishabh Bhardwaj, Soujanya Poria

    Abstract: Large language models (LLMs) have taken the world by storm with their massive multi-tasking capabilities simply by optimizing over a next-word prediction objective. With the emergence of their properties and encoded knowledge, the risk of LLMs producing harmful outputs increases, making them unfit for scalable deployment for the public. In this work, we propose a new safety evaluation benchmark R…

    Submitted 30 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

  33. arXiv:2307.04192

    cs.CV cs.AI cs.CL cs.MM

    Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models

    Authors: Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria

    Abstract: Video question-answering is a fundamental task in the field of video understanding. Although current vision-language models (VLMs) equipped with Video Transformers have enabled temporal modeling and yielded superior results, they come at the cost of huge computational power and are thus too expensive to deploy in real-time application scenarios. An economical workaround only samples a small portion of…

    Submitted 31 March, 2024; v1 submitted 9 July, 2023; originally announced July 2023.

    Comments: 13 pages, 7 figures, accepted to Findings of NAACL 2024

  34. arXiv:2307.02053

    cs.CL

    Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

    Authors: Deepanway Ghosal, Yew Ken Chia, Navonil Majumder, Soujanya Poria

    Abstract: Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architecture. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skill…

    Submitted 5 July, 2023; originally announced July 2023.

  35. arXiv:2306.04757

    cs.CL cs.AI

    INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models

    Authors: Yew Ken Chia, Pengfei Hong, Lidong Bing, Soujanya Poria

    Abstract: Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding r…

    Submitted 15 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Github: https://github.com/declare-lab/instruct-eval Leaderboard: https://declare-lab.github.io/instruct-eval/

  36. arXiv:2305.18028

    cs.SD cs.AI cs.CL eess.AS

    ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation

    Authors: Ambuj Mehrish, Abhinav Ramesh Kashyap, Li Yingting, Navonil Majumder, Soujanya Poria

    Abstract: There are significant challenges for speaker adaptation in text-to-speech for languages that are not widely spoken or for speakers with accents or dialects that are not well-represented in the training data. To address this issue, we propose the use of the "mixture of adapters" method. This approach involves adding multiple adapters within a backbone-model layer to learn the unique characteristics…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  37. arXiv:2305.14434

    cs.CL

    Domain-Expanded ASTE: Rethinking Generalization in Aspect Sentiment Triplet Extraction

    Authors: Yew Ken Chia, Hui Chen, Wei Han, Guizhen Chen, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) is a subtask of Aspect-Based Sentiment Analysis (ABSA) that considers each opinion term, their expressed sentiment, and the corresponding aspect targets. However, existing methods are limited to the in-domain setting with two domains. Hence, we propose a domain-expanded benchmark to address the in-domain, out-of-domain and cross-domain settings. We suppor…

    Submitted 23 May, 2023; originally announced May 2023.

  38. arXiv:2305.13269

    cs.CL

    Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

    Authors: Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, Lidong Bing

    Abstract: We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-inten…

    Submitted 21 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by ICLR 2024

  39. arXiv:2305.12641

    cs.CL

    A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond

    Authors: Abhinav Ramesh Kashyap, Thanh-Tung Nguyen, Viktor Schlegel, Stefan Winkler, See-Kiong Ng, Soujanya Poria

    Abstract: Sentence representations are a critical component in NLP applications such as retrieval, question answering, and text classification. They capture the meaning of a sentence, enabling machines to understand and reason over human language. In recent years, significant progress has been made in developing methods for learning sentence representations, including unsupervised, supervised, and transfer…

    Submitted 2 February, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted to EACL'24

  40. arXiv:2305.12301

    cs.CL cs.AI cs.SD eess.AS

    Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding

    Authors: Yi Xuan Tan, Navonil Majumder, Soujanya Poria

    Abstract: The pre-trained speech encoder wav2vec 2.0 performs very well on various spoken language understanding (SLU) tasks. However, on many tasks, it trails behind text encoders with textual input. To improve the understanding capability of SLU encoders, various studies have used knowledge distillation to transfer knowledge from natural language understanding (NLU) encoders. We use a very simple method o…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  41. arXiv:2305.11029

    cs.CL cs.AI

    Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction

    Authors: Qi Sun, Kun Huang, Xiaocui Yang, Pengfei Hong, Kun Zhang, Soujanya Poria

    Abstract: Document-level relation extraction (DocRE) aims to infer complex semantic relations among entities in a document. Distant supervision (DS) is able to generate massive auto-labeled data, which can improve DocRE performance. Recent works leverage pseudo labels generated by the pre-denoising model to reduce noise in DS data. However, unreliable pseudo labels bring new noise, e.g., adding false pseudo…

    Submitted 26 May, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 9 pages, ACL 2023 Long Paper

  42. arXiv:2305.10169

    cs.MM

    Few-shot Joint Multimodal Aspect-Sentiment Analysis Based on Generative Multimodal Prompt

    Authors: Xiaocui Yang, Shi Feng, Daling Wang, Sun Qi, Wenfang Wu, Yifei Zhang, Pengfei Hong, Soujanya Poria

    Abstract: We have witnessed the rapid proliferation of multimodal data on numerous social media platforms. Conventional studies typically require massive labeled data to train models for Multimodal Aspect-Based Sentiment Analysis (MABSA). However, collecting and annotating fine-grained multimodal data for MABSA is tough. To alleviate the above issue, we perform three MABSA-related tasks with quite a small n…

    Submitted 18 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: 13 pages, 7 figures, 6 tables, ACL 2023 Long Paper (Findings)

  43. arXiv:2305.02858  [pdf, other]

    cs.CL cs.AI

    ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation

    Authors: Pengfei Hong, Rishabh Bhardwaj, Navonil Majumder, Somak Aditya, Soujanya Poria

    Abstract: Domain shift is a big challenge in NLP, thus, many approaches resort to learning domain-invariant features to mitigate the inference phase domain shift. Such methods, however, fail to leverage the domain-specific nuances relevant to the task at hand. To avoid such drawbacks, domain counterfactual generation aims to transform a text from the source domain to a given target domain. However, due to t…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 12 pages, 1 figure, 8 tables, ACL 2023 Long Paper (Findings)

  44. arXiv:2305.00359  [pdf, other]

    eess.AS

    A Review of Deep Learning Techniques for Speech Processing

    Authors: Ambuj Mehrish, Navonil Majumder, Rishabh Bhardwaj, Rada Mihalcea, Soujanya Poria

    Abstract: The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognitio…

    Submitted 30 May, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

  45. arXiv:2304.13731  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

    Authors: Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

    Abstract: The immense scale of the recent large language models (LLM) allows many interesting properties, such as, instruction- and chain-of-thought-based fine-tuning, that has significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks. Inspired by such successes, we adopt such an instruction-tuned LLM Flan-T5 as the text encoder for text-to-audio (TTA) generation…

    Submitted 29 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: https://github.com/declare-lab/tango

  46. arXiv:2304.01933  [pdf, other]

    cs.CL

    LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

    Authors: Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, Roy Ka-Wei Lee

    Abstract: The success of large language models (LLMs), like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most…

    Submitted 9 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: EMNLP 2023. The code of our framework can be found at https://github.com/AGI-Edgerunners/LLM-Adapters. We will keep all of the code open-source and continue to update the framework with new adapters, LLMs, and tasks

  47. arXiv:2303.03267  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding

    Authors: Yingting Li, Ambuj Mehrish, Shuai Zhao, Rishabh Bhardwaj, Amir Zadeh, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models. Parameter inefficiency can however arise when, during transfer learning, all the parameters of a large pre-trained model need to be updated for individual downstream tasks. As the number of parameters grows, fine-tuning is prone to overfitting and catastrophic forgetting. In addition, full fine-tunin…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  48. arXiv:2302.03194  [pdf, other]

    cs.CL

    UDApter -- Efficient Domain Adaptation Using Adapters

    Authors: Bhavitvya Malik, Abhinav Ramesh Kashyap, Min-Yen Kan, Soujanya Poria

    Abstract: We propose two methods to make unsupervised domain adaptation (UDA) more parameter efficient using adapters, small bottleneck layers interspersed with every layer of the large-scale pre-trained language model (PLM). The first method deconstructs UDA into a two-step process: first by adding a domain adapter to learn domain-invariant information and then by adding a task adapter that uses domain-inv…

    Submitted 16 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023

  49. arXiv:2211.10018  [pdf, other]

    cs.CL

    A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach

    Authors: Yew Ken Chia, Lidong Bing, Sharifah Mahani Aljunied, Luo Si, Soujanya Poria

    Abstract: Relation extraction has the potential for large-scale knowledge graph construction, but current methods do not consider the qualifier attributes for each relation triplet, such as time, quantity or location. The qualifiers form hyper-relational facts which better capture the rich and complex knowledge graph structure. For example, the relation triplet (Leonard Parker, Educated At, Harvard Universi…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: 19 pages, 6 figures, accepted by EMNLP 2022

  50. Few-shot Multimodal Sentiment Analysis based on Multimodal Probabilistic Fusion Prompts

    Authors: Xiaocui Yang, Shi Feng, Daling Wang, Pengfei Hong, Soujanya Poria

    Abstract: Multimodal sentiment analysis has gained significant attention due to the proliferation of multimodal content on social media. However, existing studies in this area rely heavily on large-scale supervised data, which is time-consuming and labor-intensive to collect. Thus, there is a need to address the challenge of few-shot multimodal sentiment analysis. To tackle this problem, we propose a novel…

    Submitted 1 August, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: 9 pages, 2 figures, 7 tables. It has been accepted by ACM MM 2023