Showing 1–31 of 31 results for author: Ghosal, D

  1. arXiv:2406.15487  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Improving Text-To-Audio Models with Synthetic Captions

    Authors: Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

    Abstract: It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model…

    Submitted 8 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2404.09956  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

    Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

    Abstract: Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models…

    Submitted 17 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at ACM MM 2024

  3. arXiv:2403.13315  [pdf, other]

    cs.CV

    PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns

    Authors: Yew Ken Chia, Vernon Toh Yan Han, Deepanway Ghosal, Lidong Bing, Soujanya Poria

    Abstract: Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. Wit…

    Submitted 30 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  4. arXiv:2403.03864  [pdf, other]

    cs.CV cs.AI

    Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning

    Authors: Deepanway Ghosal, Vernon Toh Yan Han, Chia Yew Ken, Soujanya Poria

    Abstract: This paper introduces the novel task of multimodal puzzle solving, framed within the context of visual question-answering. We present a new dataset, AlgoPuzzleVQA designed to challenge and evaluate the capabilities of multimodal language models in solving algorithmic puzzles that necessitate both visual understanding, language understanding, and complex algorithmic reasoning. We create the puzzles…

    Submitted 12 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  5. arXiv:2401.09395  [pdf, other]

    cs.CL

    Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

    Authors: Pengfei Hong, Navonil Majumder, Deepanway Ghosal, Somak Aditya, Rada Mihalcea, Soujanya Poria

    Abstract: Recent advancements in Large Language Models (LLMs) have showcased striking results on existing logical reasoning benchmarks, with some models even surpassing human performance. However, the true depth of their competencies and robustness in reasoning tasks remains an open question. To this end, in this paper, we focus on two popular reasoning tasks: arithmetic reasoning and code generation. Parti…

    Submitted 27 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  6. arXiv:2310.20159  [pdf, other]

    cs.CV cs.AI

    Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

    Authors: Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria

    Abstract: Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where an…

    Submitted 30 October, 2023; originally announced October 2023.

  7. arXiv:2307.02053  [pdf, other]

    cs.CL

    Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning

    Authors: Deepanway Ghosal, Yew Ken Chia, Navonil Majumder, Soujanya Poria

    Abstract: Recently, the release of INSTRUCTEVAL has provided valuable insights into the performance of large language models (LLMs) that utilize encoder-decoder or decoder-only architecture. Interestingly, despite being introduced four years ago, T5-based LLMs, such as FLAN-T5, continue to outperform the latest decoder-based LLMs, such as LLAMA and VICUNA, on tasks that require general problem-solving skill…

    Submitted 5 July, 2023; originally announced July 2023.

  8. arXiv:2305.11826  [pdf, other]

    cs.CL cs.AI

    ReTAG: Reasoning Aware Table to Analytic Text Generation

    Authors: Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer

    Abstract: The task of table summarization involves generating text that both succinctly and accurately represents the table or a specific set of highlighted cells within a table. While significant progress has been made in table to text generation techniques, models still mostly generate descriptive summaries, which reiterates the information contained within the table in sentences. Through analysis of popu…

    Submitted 29 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  9. arXiv:2304.13731  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

    Authors: Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

    Abstract: The immense scale of the recent large language models (LLM) allows many interesting properties, such as, instruction- and chain-of-thought-based fine-tuning, that has significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks. Inspired by such successes, we adopt such an instruction-tuned LLM Flan-T5 as the text encoder for text-to-audio (TTA) generation…

    Submitted 29 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: https://github.com/declare-lab/tango

  10. arXiv:2210.16495  [pdf, other]

    cs.CL cs.AI cs.LG

    Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering

    Authors: Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: We propose a simple refactoring of multi-choice question answering (MCQA) tasks as a series of binary classifications. The MCQA task is generally performed by scoring each (question, answer) pair normalized over all the pairs, and then selecting the answer from the pair that yield the highest score. For n answer choices, this is equivalent to an n-class classification setup where only one class (t…

    Submitted 29 October, 2022; originally announced October 2022.

  11. arXiv:2210.02890  [pdf, other]

    cs.CL

    Multiview Contextual Commonsense Inference: A New Dataset and Task

    Authors: Siqi Shen, Deepanway Ghosal, Navonil Majumder, Henry Lim, Rada Mihalcea, Soujanya Poria

    Abstract: Contextual commonsense inference is the task of generating various types of explanations around the events in a dyadic dialogue, including cause, motivation, emotional reaction, and others. Producing a coherent and non-trivial explanation requires awareness of the dialogue's structure and of how an event is grounded in the context. In this work, we create CICEROv2, a dataset consisting of 8,351 in…

    Submitted 2 November, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

  12. arXiv:2208.14641  [pdf, other]

    cs.CL cs.AI

    Generating Intermediate Steps for NLI with Next-Step Supervision

    Authors: Deepanway Ghosal, Somak Aditya, Monojit Choudhury

    Abstract: The Natural Language Inference (NLI) task often requires reasoning over multiple steps to reach the conclusion. While the necessity of generating such intermediate steps (instead of a summary explanation) has gained popular support, it is unclear how to generate such steps without complete end-to-end supervision and how such generated steps can be further utilized. In this work, we train a sequenc…

    Submitted 31 August, 2022; originally announced August 2022.

  13. arXiv:2203.13926  [pdf, other]

    cs.CL cs.AI

    CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues

    Authors: Deepanway Ghosal, Siqi Shen, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: This paper addresses the problem of dialogue reasoning with contextualized commonsense inference. We curate CICERO, a dataset of dyadic conversations with five types of utterance-level reasoning-based inferences: cause, subsequent event, prerequisite, motivation, and emotional reaction. The dataset contains 53,105 of such inferences from 5,672 dialogues. We use this dataset to solve relevant gener…

    Submitted 6 April, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  14. arXiv:2109.02247  [pdf, other]

    cs.CL cs.AI

    STaCK: Sentence Ordering with Temporal Commonsense Knowledge

    Authors: Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: Sentence order prediction is the task of finding the correct order of sentences in a randomly ordered document. Correctly ordering the sentences requires an understanding of coherence with respect to the chronological sequence of events described in the text. Document-level contextual understanding and commonsense knowledge centered around these events are often essential in uncovering this cohere…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted as a full paper at EMNLP 2021

  15. arXiv:2106.11791  [pdf, other]

    cs.CL cs.AI

    Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication

    Authors: Navonil Majumder, Deepanway Ghosal, Devamanyu Hazarika, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria

    Abstract: The majority of existing methods for empathetic response generation rely on the emotion of the context to generate empathetic responses. However, empathy is much more than generating responses with an appropriate emotion. It also often entails subtle expressions of understanding and personal resonance with the situation of the other interlocutor. Unfortunately, such qualities are difficult to quan…

    Submitted 4 August, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

  16. arXiv:2106.00510  [pdf, other]

    cs.CL cs.AI cs.LG

    CIDER: Commonsense Inference for Dialogue Explanation and Reasoning

    Authors: Deepanway Ghosal, Pengfei Hong, Siqi Shen, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: Commonsense inference to understand and explain human language is a fundamental research problem in natural language processing. Explaining human conversations poses a great challenge as it requires contextual understanding, planning, inference, and several aspects of reasoning including causal, temporal, and commonsense reasoning. In this work, we introduce CIDER -- a manually curated dataset tha…

    Submitted 29 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: SIGDIAL 2021

  17. arXiv:2012.14996  [pdf, other]

    cs.NI

    TCP D*: A Low Latency First Congestion Control Algorithm

    Authors: Taran Lynn, Dipak Ghosal

    Abstract: The choice of feedback mechanism between delay and packet loss has long been a point of contention in TCP congestion control. This has partly been resolved, as it has become increasingly evident that delay based methods are needed to facilitate modern interactive web applications. However, what has not been resolved is what control should be used, with the two candidates being the congestion windo…

    Submitted 29 December, 2020; originally announced December 2020.

  18. arXiv:2012.11820  [pdf, other]

    cs.CL

    Recognizing Emotion Cause in Conversations

    Authors: Soujanya Poria, Navonil Majumder, Devamanyu Hazarika, Deepanway Ghosal, Rishabh Bhardwaj, Samson Yu Bai Jian, Pengfei Hong, Romila Ghosh, Abhinaba Roy, Niyati Chhaya, Alexander Gelbukh, Rada Mihalcea

    Abstract: We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines. The dataset is available at https://github.com/declare-lab/RECCON. Introduction: Recognizing the cause behind emotions in text is a fundamental yet under-explored area of research in NL…

    Submitted 28 July, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: https://github.com/declare-lab/RECCON, Accepted at Cognitive Computation

  19. arXiv:2012.06236  [pdf, other]

    cs.CL

    Improving Zero Shot Learning Baselines with Commonsense Knowledge

    Authors: Abhinaba Roy, Deepanway Ghosal, Erik Cambria, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: Zero shot learning -- the problem of training and testing on a completely disjoint set of classes -- relies greatly on its ability to transfer knowledge from train classes to test classes. Traditionally semantic embeddings consisting of human defined attributes (HA) or distributed word embeddings (DWE) are used to facilitate this transfer by improving the association between visual and semantic em…

    Submitted 11 December, 2020; originally announced December 2020.

  20. arXiv:2011.09954  [pdf, other]

    cs.CL cs.LG

    Persuasive Dialogue Understanding: the Baselines and Negative Results

    Authors: Hui Chen, Deepanway Ghosal, Navonil Majumder, Amir Hussain, Soujanya Poria

    Abstract: Persuasion aims at forming one's opinion and action via a series of persuasive messages containing persuader's strategies. Due to its potential application in persuasive dialogue systems, the task of persuasive strategy recognition has gained much attention lately. Previous methods on user intent recognition in dialogue systems adopt recurrent neural network (RNN) or convolutional neural network (…

    Submitted 22 November, 2020; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: 12 pages, 5 figures

  21. arXiv:2010.02795  [pdf, other]

    cs.CL

    COSMIC: COmmonSense knowledge for eMotion Identification in Conversations

    Authors: Deepanway Ghosal, Navonil Majumder, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria

    Abstract: In this paper, we address the task of utterance level emotion recognition in conversations using commonsense knowledge. We propose COSMIC, a new framework that incorporates different elements of commonsense such as mental states, events, and causal relations, and build upon them to learn interactions between interlocutors participating in a conversation. Current state-of-the-art methods often enco…

    Submitted 6 October, 2020; originally announced October 2020.

  22. arXiv:2010.01454  [pdf, other]

    cs.CL

    MIME: MIMicking Emotions for Empathetic Response Generation

    Authors: Navonil Majumder, Pengfei Hong, Shanshan Peng, Jiankun Lu, Deepanway Ghosal, Alexander Gelbukh, Rada Mihalcea, Soujanya Poria

    Abstract: Current approaches to empathetic response generation view the set of emotions expressed in the input text as a flat structure, where all the emotions are treated uniformly. We argue that empathetic responses often mimic the emotion of the user to a varying degree, depending on its positivity or negativity and content. We show that the consideration of this polarity-based emotion clusters and emoti…

    Submitted 3 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  23. arXiv:2009.13902  [pdf, other]

    cs.CL

    Utterance-level Dialogue Understanding: An Empirical Study

    Authors: Deepanway Ghosal, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: The recent abundance of conversational data on the Web and elsewhere calls for effective NLP systems for dialog understanding. Complete utterance-level understanding often requires context understanding, defined by nearby utterances. In recent years, a number of approaches have been proposed for various utterance-level dialogue understanding tasks. Most of these approaches account for the context…

    Submitted 22 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

  24. arXiv:2005.12770  [pdf, other]

    cs.CV cs.LG eess.IV

    Visual Interest Prediction with Attentive Multi-Task Transfer Learning

    Authors: Deepanway Ghosal, Maheshkumar H. Kolekar

    Abstract: Visual interest & affect prediction is a very interesting area of research in the area of computer vision. In this paper, we propose a transfer learning and attention mechanism based neural network model to predict visual interest & affective dimensions in digital photos. Learning the multi-dimensional affects is addressed through a multi-task learning framework. With various experiments we show t…

    Submitted 27 May, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

  25. arXiv:2005.00791  [pdf, other]

    cs.CL

    KinGDOM: Knowledge-Guided DOMain adaptation for sentiment analysis

    Authors: Deepanway Ghosal, Devamanyu Hazarika, Abhinaba Roy, Navonil Majumder, Rada Mihalcea, Soujanya Poria

    Abstract: Cross-domain sentiment analysis has received significant attention in recent years, prompted by the need to combat the domain gap between different applications that make use of sentiment analysis. In this paper, we take a novel perspective on this task by exploring the role of external commonsense knowledge. We introduce a new framework, KinGDOM, which utilizes the ConceptNet knowledge graph to e…

    Submitted 11 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

  26. arXiv:2002.09825  [pdf, other]

    cs.NI

    Model Predictive Congestion Control for TCP Endpoints

    Authors: Taran Lynn, Dipak Ghosal, Nathan Hanford

    Abstract: A common problem in science networks and private wide area networks (WANs) is that of achieving predictable data transfers of multiple concurrent flows by maintaining specific pacing rates for each. We address this problem by developing a control algorithm based on concepts from model predictive control (MPC) to produce flows with smooth pacing rates and round trip times (RTTs). In the proposed ap…

    Submitted 22 February, 2020; originally announced February 2020.

    Comments: 13 pages, 13 figures

  27. arXiv:1908.11540  [pdf, other]

    cs.CL cs.LG

    DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation

    Authors: Deepanway Ghosal, Navonil Majumder, Soujanya Poria, Niyati Chhaya, Alexander Gelbukh

    Abstract: Emotion recognition in conversation (ERC) has received much attention, lately, from researchers due to its potential widespread applications in diverse areas, such as health-care, education, and human resources. In this paper, we present Dialogue Graph Convolutional Network (DialogueGCN), a graph neural network based approach to ERC. We leverage self and inter-speaker dependency of the interlocuto…

    Submitted 30 August, 2019; originally announced August 2019.

    Comments: Accepted at EMNLP 2019

  28. arXiv:1905.05812  [pdf, other]

    cs.CL

    Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis

    Authors: Md Shad Akhtar, Dushyant Singh Chauhan, Deepanway Ghosal, Soujanya Poria, Asif Ekbal, Pushpak Bhattacharyya

    Abstract: Related tasks often have inter-dependence on each other and perform better when solved in a joint framework. In this paper, we present a deep multi-task learning framework that jointly performs sentiment and emotion analysis both. The multi-modal inputs (i.e., text, acoustic and visual frames) of a video convey diverse and distinctive information, and usually do not have equal contribution in the…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: Accepted for publication in NAACL:HLT-2019

  29. arXiv:1808.01216  [pdf, other]

    cs.CL

    A Multi-task Ensemble Framework for Emotion, Sentiment and Intensity Prediction

    Authors: Md Shad Akhtar, Deepanway Ghosal, Asif Ekbal, Pushpak Bhattacharyya, Sadao Kurohashi

    Abstract: In this paper, through multi-task ensemble framework we address three problems of emotion and sentiment analysis i.e. "emotion classification & intensity", "valence, arousal & dominance for emotion" and "valence & arousal for sentiment". The underlying problems cover two granularities (i.e. coarse-grained and fine-grained) and a diverse range of domains (i.e. tweets, Facebook posts, news headline…

    Submitted 15 October, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

  30. arXiv:1803.05080  [pdf, ps, other]

    cs.NI

    A Survey of Multimedia Streaming in LTE Cellular Networks

    Authors: Ahmed Ahmedin, Amitabha Ghosh, Dipak Ghosal

    Abstract: With the growing of Long Term Evolution (LTE) cellular networks and the increase in the demand of the video services, it is vital to consider the challenges in the streaming services from a different perspective. A perspective that focuses on the streaming services in light of cellular networks challenges, both per layer basis and across multiple layers as well. In this tutorial, we highlight the…

    Submitted 13 March, 2018; originally announced March 2018.

  31. arXiv:1003.5897  [pdf]

    cs.CV

    Development of a Multi-User Recognition Engine for Handwritten Bangla Basic Characters and Digits

    Authors: Sandip Rakshit, Debkumar Ghosal, Tanmoy Das, Subhrajit Dutta, Subhadip Basu

    Abstract: The objective of the paper is to recognize handwritten samples of basic Bangla characters using Tesseract open source Optical Character Recognition (OCR) engine under Apache License 2.0. Handwritten data samples containing isolated Bangla basic characters and digits were collected from different users. Tesseract is trained with user-specific data samples of document pages to generate separate user…

    Submitted 30 March, 2010; originally announced March 2010.

    Comments: Proc. (CD) Int. Conf. on Information Technology and Business Intelligence (2009)