-
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning
Authors:
Abisek Rajakumar Kalarani,
Pushpak Bhattacharyya,
Niyati Chhaya,
Sumit Shekhar
Abstract:
Well-formed, context-aware image captions and tags in enterprise content such as marketing material are critical to ensure brand presence and content recall. Creating and updating them manually is non-trivial given the scale and tedium of the task. We propose a new unified Vision-Language (VL) model based on the One For All (OFA) model, with a focus on context-assisted image captioning where the caption is generated based on both the image and its context. Our approach aims to overcome the context-independent (image and text are treated independently) nature of the existing approaches. We exploit context by pretraining our model with datasets of three tasks: news image captioning where the news article is the context, contextual visual entailment, and keyword extraction from the context. The second pretraining task is a new VL task, and we construct and release two datasets for the task with 1.1M and 2.2K data instances. Our system achieves state-of-the-art results with an improvement of up to 8.34 CIDEr score on the benchmark news image captioning datasets. To the best of our knowledge, ours is the first effort at incorporating contextual information in pretraining the models for the VL tasks.
Submitted 1 June, 2023;
originally announced June 2023.
-
EmpathBERT: A BERT-based Framework for Demographic-aware Empathy Prediction
Authors:
Bhanu Prakash Reddy Guda,
Aparna Garimella,
Niyati Chhaya
Abstract:
Affect preferences vary with user demographics, and tapping into demographic information provides important cues about the users' language preferences. In this paper, we utilize the user demographics, and propose EmpathBERT, a demographic-aware framework for empathy prediction based on BERT. Through several comparative experiments, we show that EmpathBERT surpasses traditional machine learning and deep learning models, and illustrate the importance of user demographics to predict empathy and distress in user responses to stimulative news articles. We also highlight the importance of affect information in the responses by developing affect-aware models to predict user demographic attributes.
Submitted 30 January, 2021;
originally announced February 2021.
-
Recognizing Emotion Cause in Conversations
Authors:
Soujanya Poria,
Navonil Majumder,
Devamanyu Hazarika,
Deepanway Ghosal,
Rishabh Bhardwaj,
Samson Yu Bai Jian,
Pengfei Hong,
Romila Ghosh,
Abhinaba Roy,
Niyati Chhaya,
Alexander Gelbukh,
Rada Mihalcea
Abstract:
We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines. The dataset is available at https://github.com/declare-lab/RECCON.
Introduction: Recognizing the cause behind emotions in text is a fundamental yet under-explored area of research in NLP. Advances in this area hold the potential to improve interpretability and performance in affect-based models. Identifying emotion causes at the utterance level in conversations is particularly challenging due to the intermingling dynamics among the interlocutors.
Method: We introduce the task of Recognizing Emotion Cause in CONversations with an accompanying dataset named RECCON, containing over 1,000 dialogues and 10,000 utterance cause-effect pairs. Furthermore, we define different cause types based on the source of the causes, and establish strong Transformer-based baselines to address two different sub-tasks on this dataset: causal span extraction and causal emotion entailment.
Result: Our Transformer-based baselines, which leverage contextual pre-trained embeddings such as RoBERTa, outperform the state-of-the-art emotion cause extraction approaches.
Conclusion: We introduce a new task highly relevant for (explainable) emotion-aware artificial intelligence: recognizing emotion cause in conversations, provide a new highly challenging publicly available dialogue-level dataset for this task, and give strong baseline results on this dataset.
Submitted 28 July, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis and Recommendation
Authors:
Soumyadeep Roy,
Shamik Sural,
Niyati Chhaya,
Anandhavelu Natarajan,
Niloy Ganguly
Abstract:
A consumer-dependent (business-to-consumer) organization tends to present itself as possessing a set of human qualities, which is termed the brand personality of the company. The perception is impressed upon the consumer through the content, be it in the form of advertisements, blogs or magazines, produced by the organization. A consistent brand will generate trust and retain customers over time as they develop an affinity towards regularity and common patterns. However, maintaining a consistent messaging tone for a brand has become more challenging with the virtual explosion in the amount of content that needs to be authored and pushed to the Internet to maintain an edge in the era of digital marketing. To understand the depth of the problem, we collect the content of around 300K web pages from around 650 companies. We develop trait-specific classification models by considering the linguistic features of the content. The classifier automatically identifies the web articles that are not consistent with the mission and vision of a company and further helps us to discover the conditions under which the consistency cannot be maintained. To address the brand inconsistency issue, we then develop a sentence ranking system that outputs the top three sentences that need to be changed to make a web article more consistent with the company's brand personality.
Submitted 14 August, 2021; v1 submitted 19 November, 2020;
originally announced November 2020.
-
CaM-Gen: Causally-aware Metric-guided Text Generation
Authors:
Navita Goyal,
Roodram Paneri,
Ayush Agarwal,
Udit Kalani,
Abhilasha Sancheti,
Niyati Chhaya
Abstract:
Content is created for a well-defined purpose, often described by a metric or signal represented in the form of structured information. The relationship between the goal (metrics) of target content and the content itself is non-trivial. While large-scale language models show promising text generation capabilities, guiding the generated text with external metrics is challenging. These metrics and content tend to have inherent relationships and not all of them may be of consequence. We introduce CaM-Gen: Causally aware Generative Networks guided by user-defined target metrics incorporating the causal relationships between the metric and content features. We leverage causal inference techniques to identify causally significant aspects of a text that lead to the target metric and then explicitly guide generative models towards these by a feedback mechanism. We propose this mechanism for variational autoencoder and Transformer-based generative models. The proposed models beat baselines in terms of the target metric control while maintaining fluency and language quality of the generated text. To the best of our knowledge, this is one of the early attempts at controlled generation incorporating a metric guide using causal inference.
Submitted 25 March, 2022; v1 submitted 24 October, 2020;
originally announced October 2020.
-
"To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers
Authors:
Gaurav Verma,
Niyati Chhaya,
Vishwa Vinay
Abstract:
With rising concern around abusive and hateful behavior on social media platforms, we present an ensemble learning method to identify and analyze the linguistic properties of such content. Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language. The proposed approach provides comparable results to the existing state-of-the-art on the Twitter Abusive Behavior dataset (Founta et al. 2018) without using any user or network-related information, relying solely on textual properties. We believe that the presented insights and discussion of shortcomings of current approaches will highlight potential directions for future research.
Submitted 5 June, 2020;
originally announced June 2020.
-
Multi-label Categorization of Accounts of Sexism using a Neural Framework
Authors:
Pulkit Parikh,
Harika Abburi,
Pinkesh Badjatiya,
Radhika Krishnan,
Niyati Chhaya,
Manish Gupta,
Vasudeva Varma
Abstract:
Sexism, an injustice that subjects women and girls to enormous suffering, manifests in blatant as well as subtle ways. In the wake of growing documentation of experiences of sexism on the web, the automatic categorization of accounts of sexism has the potential to assist social scientists and policy makers in studying and countering sexism better. The existing work on sexism classification, which is different from sexism detection, has certain limitations in terms of the categories of sexism used and/or whether they can co-occur. To the best of our knowledge, this is the first work on the multi-label classification of sexism of any kind(s), and we contribute the largest dataset for sexism categorization. We develop a neural solution for this multi-label classification that can combine sentence representations obtained using models such as BERT with distributional and linguistic word embeddings using a flexible, hierarchical architecture involving recurrent components and optional convolutional ones. Further, we leverage unlabeled accounts of sexism to infuse domain-specific elements into our framework. The best proposed method outperforms several deep learning as well as traditional machine learning baselines by an appreciable margin.
Submitted 18 November, 2019; v1 submitted 10 October, 2019;
originally announced October 2019.
-
Affect Enriched Word Embeddings for News Information Retrieval
Authors:
Tommaso Teofili,
Niyati Chhaya
Abstract:
Distributed representations of words have been shown to be useful for improving the effectiveness of IR systems in many sub-tasks like query expansion, retrieval and ranking. Algorithms like word2vec, GloVe and others are also key factors in many improvements in different NLP tasks. One common issue with such embedding models is that words like happy and sad appear in similar contexts and hence are wrongly clustered close together in the embedding space. In this paper we leverage Aff2Vec, a set of word embedding models that include affect information, in order to better capture the affect aspect in news text and achieve better results in information retrieval tasks; such embeddings are also less affected by the synonym/antonym issue. We evaluate their effectiveness on two IR-related tasks (query expansion and ranking) over the New York Times dataset (TREC-core '17), comparing them against other word-embedding-based models and classic ranking models.
Submitted 4 September, 2019;
originally announced September 2019.
-
DialogueGCN: A Graph Convolutional Neural Network for Emotion Recognition in Conversation
Authors:
Deepanway Ghosal,
Navonil Majumder,
Soujanya Poria,
Niyati Chhaya,
Alexander Gelbukh
Abstract:
Emotion recognition in conversation (ERC) has received much attention, lately, from researchers due to its potential widespread applications in diverse areas, such as health-care, education, and human resources. In this paper, we present Dialogue Graph Convolutional Network (DialogueGCN), a graph neural network based approach to ERC. We leverage self and inter-speaker dependency of the interlocutors to model conversational context for emotion recognition. Through the graph network, DialogueGCN addresses context propagation issues present in the current RNN-based methods. We empirically show that this method alleviates such issues, while outperforming the current state of the art on a number of benchmark emotion classification datasets.
Submitted 30 August, 2019;
originally announced August 2019.
-
Variational Fusion for Multimodal Sentiment Analysis
Authors:
Navonil Majumder,
Soujanya Poria,
Gangeshwar Krishnamurthy,
Niyati Chhaya,
Rada Mihalcea,
Alexander Gelbukh
Abstract:
Multimodal fusion is considered a key step in multimodal tasks such as sentiment analysis, emotion detection, question answering, and others. Most of the recent work on multimodal fusion does not guarantee the fidelity of the multimodal representation with respect to the unimodal representations. In this paper, we propose a variational autoencoder-based approach for modality fusion that minimizes information loss between unimodal and multimodal representations. We empirically show that this method outperforms the state-of-the-art methods by a significant margin on several popular datasets.
Submitted 13 August, 2019;
originally announced August 2019.
-
Sentiment and Sarcasm Classification with Multitask Learning
Authors:
Navonil Majumder,
Soujanya Poria,
Haiyun Peng,
Niyati Chhaya,
Erik Cambria,
Alexander Gelbukh
Abstract:
Sentiment classification and sarcasm detection are both important natural language processing (NLP) tasks. Sentiment is always coupled with sarcasm where intensive emotion is expressed. Nevertheless, most literature considers them as two separate tasks. We argue that knowledge in sarcasm detection can also be beneficial to sentiment classification and vice versa. We show that these two tasks are correlated, and present a multi-task learning-based framework using a deep neural network that models this correlation to improve the performance of both tasks in a multi-task learning setting. Our method outperforms the state of the art by 3-4% in the benchmark dataset.
Submitted 8 March, 2019; v1 submitted 23 January, 2019;
originally announced January 2019.
-
Aff2Vec: Affect-Enriched Distributional Word Representations
Authors:
Sopan Khosla,
Niyati Chhaya,
Kushal Chawla
Abstract:
Human communication includes information, opinions, and reactions. Reactions are often captured by the affective messages in written as well as verbal communications. While there has been work in affect modeling and, to some extent, affective content generation, the area of affective word distributions is not well studied. Synsets and lexica capture semantic relationships across words. These models, however, do not encode affective or emotional word interpretations. Our proposed model, Aff2Vec, provides a method for enriched word embeddings that are representative of affective interpretations of words. Aff2Vec outperforms the state-of-the-art in intrinsic word-similarity tasks. Further, the use of Aff2Vec representations outperforms baseline embeddings in downstream natural language understanding tasks including sentiment analysis, personality detection, and frustration prediction.
Submitted 21 May, 2018;
originally announced May 2018.