Search | arXiv e-print repository

A Study on Bias Detection and Classification in Natural Language Processing

Authors: Ana Sofia Evans, Helena Moniz, Luísa Coheur

Abstract: Human biases have been shown to influence the performance of models and algorithms in various fields, including Natural Language Processing. While the study of this phenomenon is garnering focus in recent years, the available resources are still relatively scarce, often focusing on different forms or manifestations of biases. The aim of our work is twofold: 1) gather publicly-available datasets and determine how to better combine them to effectively train models in the task of hate speech detection and classification; 2) analyse the main issues with these datasets, such as scarcity, skewed resources, and reliance on non-persistent data. We discuss these issues in tandem with the development of our experiments, in which we show that the combinations of different datasets greatly impact the models' performance.

Submitted 14 August, 2024; originally announced August 2024.

Comments: 31 pages, 15 Tables, 4 Figures

MSC Class: 68T50; ACM Class: I.2.7

arXiv:2407.14538 [pdf, other]

Alea-BFT: Practical Asynchronous Byzantine Fault Tolerance

Authors: Diogo S. Antunes, Afonso N. Oliveira, André Breda, Matheus Guilherme Franco, Henrique Moniz, Rodrigo Rodrigues

Abstract: Traditional Byzantine Fault Tolerance (BFT) state machine replication protocols assume a partial synchrony model, leading to a design where a leader replica drives the protocol and is replaced after a timeout. Recently, we witnessed a surge of asynchronous BFT protocols, which use randomization to remove the need for bounds on message delivery times, making them more resilient to adverse network conditions. However, existing research proposals still fall short of gaining practical adoption, plausibly because they are not able to combine good performance with a simple design that can be readily understood and adopted. In this paper, we present Alea-BFT, a simple and highly efficient asynchronous BFT protocol, which is gaining practical adoption, namely in Ethereum distributed validators. Alea-BFT brings the key design insight from classical protocols of concentrating part of the work on a single designated replica and incorporates this principle in a simple two-stage pipelined design, with an efficient broadcast led by the designated replica, followed by an inexpensive binary agreement. The evaluation of our research prototype implementation and two real-world integrations in cryptocurrency ecosystems shows excellent performance, improving on the fastest protocol (Dumbo-NG) in terms of latency and displaying good performance under faults.

Submitted 14 July, 2024; originally announced July 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.02071

ACM Class: C.2.4; D.4.5

Journal ref: In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24) 2024 (pp. 313-328)

arXiv:2407.03818 [pdf, other]

ConText at WASSA 2024 Empathy and Personality Shared Task: History-Dependent Embedding Utterance Representations for Empathy and Emotion Prediction in Conversations

Authors: Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho

Abstract: Empathy and emotion prediction are key components in the development of effective and empathetic agents, amongst several other applications. The WASSA shared task on empathy and emotion prediction in interactions presents an opportunity to benchmark approaches to these tasks. Appropriately selecting and representing the historical context is crucial in the modelling of empathy and emotion in conversations. In our submissions, we model empathy, emotion polarity and emotion intensity of each utterance in a conversation by feeding the utterance to be classified together with its conversational context, i.e., a certain number of previous conversational turns, as input to an encoder Pre-trained Language Model, to which we append a regression head for prediction. We also model perceived counterparty empathy of each interlocutor by feeding all utterances from the conversation and a token identifying the interlocutor for which we are predicting the empathy. Our system officially ranked $1^{st}$ at the CONV-turn track and $2^{nd}$ at the CONV-dialog track.

Submitted 4 July, 2024; originally announced July 2024.

Comments: WASSA'24

arXiv:2311.13910 [pdf, other]

Dialogue Quality and Emotion Annotations for Customer Support Conversations

Authors: John Mendonça, Patrícia Pereira, Miguel Menezes, Vera Cabarrão, Ana C. Farinha, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso

Abstract: Task-oriented conversational datasets often lack topic variability and linguistic diversity. However, with the advent of Large Language Models (LLMs) pretrained on extensive, multilingual and diverse text data, these limitations seem overcome. Nevertheless, their generalisability to different languages and domains in dialogue applications remains uncertain without benchmarking datasets. This paper presents a holistic annotation approach for emotion and conversational quality in the context of bilingual customer support conversations. By performing annotations that take into consideration the complete instances that compose a conversation, one can form a broader perspective of the dialogue as a whole. Furthermore, it provides a unique and valuable resource for the development of text classification models. To this end, we present benchmarks for Emotion Recognition and Dialogue Quality Estimation and show that further research is needed to leverage these models in a production setting.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Accepted at GEM (EMNLP Workshop)

arXiv:2309.04292 [pdf, other]

Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations

Authors: Patrícia Pereira, Rui Ribeiro, Helena Moniz, Luisa Coheur, Joao Paulo Carvalho

Abstract: Fuzzy Fingerprints have been successfully used as an interpretable text classification technique, but, like most other techniques, have been largely surpassed in performance by Large Pre-trained Language Models, such as BERT or RoBERTa. These models deliver state-of-the-art results in several Natural Language Processing tasks, namely Emotion Recognition in Conversations (ERC), but suffer from the lack of interpretability and explainability. In this paper, we propose to combine the two approaches to perform ERC, as a means to obtain simpler and more interpretable Large Language Models-based classifiers. We propose to feed the utterances and their previous conversational turns to a pre-trained RoBERTa, obtaining contextual embedding utterance representations, that are then supplied to an adapted Fuzzy Fingerprint classification module. We validate our approach on the widely used DailyDialog ERC benchmark dataset, in which we obtain state-of-the-art level results using a much lighter model.

Submitted 8 September, 2023; originally announced September 2023.

Comments: FUZZ-IEEE 2023

arXiv:2308.16797 [pdf, other]

Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation

Authors: John Mendonça, Patrícia Pereira, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso

Abstract: Despite significant research effort in the development of automatic dialogue evaluation metrics, little thought is given to evaluating dialogues other than in English. At the same time, ensuring metrics are invariant to semantically similar responses is also an overlooked topic. In order to achieve the desired properties of robustness and multilinguality for dialogue evaluation metrics, we propose a novel framework that takes advantage of the strengths of current evaluation models with the newly-established paradigm of prompting Large Language Models (LLMs). Empirical results show our framework achieves state of the art results in terms of mean Spearman correlation scores across several benchmarks and ranks first place on both the Robust and Multilingual tasks of the DSTC11 Track 4 "Automatic Evaluation Metrics for Open-Domain Dialogue Systems", proving the evaluation capabilities of prompted LLMs.

Submitted 8 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: DSTC11 best paper for Track 4

arXiv:2304.08216 [pdf, other]

Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations

Authors: Patrícia Pereira, Helena Moniz, Isabel Dias, Joao Paulo Carvalho

Abstract: Emotion Recognition in Conversations (ERC) has been gaining increasing importance as conversational agents become more and more common. Recognizing emotions is key for effective communication, being a crucial component in the development of effective and empathetic conversational agents. Knowledge and understanding of the conversational context are extremely valuable for identifying the emotions of the interlocutor. We thus approach Emotion Recognition in Conversations leveraging the conversational context, i.e., taking into attention previous conversational turns. The usual approach to model the conversational context has been to produce context-independent representations of each utterance and subsequently perform contextual modeling of these. Here we propose context-dependent embedding representations of each utterance by leveraging the contextual representational power of pre-trained transformer language models. In our approach, we feed the conversational context appended to the utterance to be classified as input to the RoBERTa encoder, to which we append a simple classification module, thus discarding the need to deal with context after obtaining the embeddings since these constitute already an efficient representation of such context. We also investigate how the number of introduced conversational turns influences our model performance. The effectiveness of our approach is validated on the open-domain DailyDialog dataset and on the task-oriented EmoWOZ dataset.

Submitted 3 June, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: WASSA'23
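Illustrative note on the entry above ("Context-Dependent Embedding Utterance Representations for Emotion Recognition in Conversations"): the abstract describes feeding an utterance together with a number of previous conversational turns to an encoder. The following is a minimal sketch of only the input-construction step, not the authors' code. The function name `build_context_input`, the separator string, and the chronological ordering (context first, target utterance last) are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List


def build_context_input(
    turns: List[str],
    index: int,
    num_context_turns: int,
    sep: str = " </s> ",  # assumed separator; the abstract does not name one
) -> str:
    """Join the utterance at `index` with up to `num_context_turns` previous turns.

    The result is a single string that could be passed to a tokenizer and
    encoder. Turns are kept in chronological order, target utterance last
    (an assumption of this sketch).
    """
    if not 0 <= index < len(turns):
        raise IndexError("index must point at an existing turn")
    if num_context_turns < 0:
        raise ValueError("num_context_turns must be non-negative")

    # Clamp the window start so early utterances simply get less context.
    start = max(0, index - num_context_turns)
    window = turns[start : index + 1]
    return sep.join(window)
```

Varying `num_context_turns` in such a helper is one way to set up the kind of experiment the abstract mentions, where the number of introduced conversational turns is changed and model performance is compared.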

arXiv:2212.10201 [pdf, other]

A Simple Feature Method for Prosody Rhythm Comparison

Authors: Mariana Julião, Alberto Abad, Helena Moniz

Abstract: Of all components of Prosody, Rhythm has been regarded as the hardest to address, as it is utterly linked to Pitch and Intensity. Nevertheless, Rhythm is a very good indicator of a speaker's fluency in a foreign language or even of some diseases. Canonical ways to measure Rhythm, such as $ΔC$ or $\%V$, involve a cumbersome process of segment alignment, often leading to modest and questionable results. Perceptively, however, rhythm does not sound as difficult, as humans can grasp it even when the text is not fully intelligible. In this work, we develop an empirical and unsupervised method of rhythm assessment, which does not rely on the content. We have created a fixed-length representation of each utterance, Peak Embedding (PE), which codifies the proportional distance between peaks of the chosen Low-Level Descriptors. Clustering pairs of small sentence-like units, we have attained averages of 0.444 for Silhouette Coefficient using PE with Loudness, and 0.979 for Global Separability Index with a combination of PE with Pitch and Loudness. Clustering same-structure words, we have attained averages of 0.196 for Silhouette Coefficient and 0.864 for Global Separability Index for PE with Loudness.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.09172 [pdf, other]

Deep Emotion Recognition in Textual Conversations: A Survey

Authors: Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho

Abstract: While Emotion Recognition in Conversations (ERC) has seen a tremendous advancement in the last few years, new applications and implementation scenarios present novel challenges and opportunities. These range from leveraging the conversational context, speaker and emotion dynamics modelling, to interpreting common sense expressions, informal language and sarcasm, addressing challenges of real time ERC, recognizing emotion causes, different taxonomies across datasets, multilingual ERC to interpretability. This survey starts by introducing ERC, elaborating on the challenges and opportunities pertaining to this task. It proceeds with a description of the emotion taxonomies and a variety of ERC benchmark datasets employing such taxonomies. This is followed by descriptions of the most prominent works in ERC with explanations of the Deep Learning architectures employed. Then, it provides advisable ERC practices towards better frameworks, elaborating on methods to deal with subjectivity in annotations and modelling and methods to deal with the typically unbalanced ERC datasets. Finally, it presents systematic review tables comparing several works regarding the methods used and their performance. The survey highlights the advantage of leveraging techniques to address unbalanced data, the exploration of mixed emotions and the benefits of incorporating annotation subjectivity in the learning phase.

Submitted 22 May, 2024; v1 submitted 16 November, 2022; originally announced November 2022.

arXiv:2202.02071 [pdf, other]

Alea-BFT: Practical Asynchronous Byzantine Fault Tolerance

Authors: Afonso Oliveira, Henrique Moniz, Rodrigo Rodrigues

Abstract: Traditional Byzantine Fault Tolerance (BFT) state machine replication protocols assume a partial synchrony model, leading to a design where a leader replica drives the protocol and is replaced after a timeout. Recently, we witnessed a surge of asynchronous BFT protocols that use randomization to remove the assumptions of bounds on message delivery times, making them more resilient to adverse network conditions. However, these protocols still fall short of being practical across a broad range of scenarios due to their cubic communication costs, use of expensive primitives, and overall protocol complexity. In this paper, we present Alea-BFT, the first asynchronous BFT protocol to achieve quadratic communication complexity, allowing it to scale to large networks. Alea-BFT brings the key design insight from classical protocols of concentrating part of the work on a single designated replica, and incorporates this principle in a two stage pipelined design, with an efficient broadcast led by the designated replica followed by an inexpensive binary agreement. We evaluated our prototype implementation across 10 sites in 4 continents, and our results show significant scalability gains from the proposed design.

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2102.13030 [pdf, other]

Retrieval Augmentation for Deep Neural Networks

Authors: Rita Parada Ramos, Patrícia Pereira, Helena Moniz, Joao Paulo Carvalho, Bruno Martins

Abstract: Deep neural networks have achieved state-of-the-art results in various vision and/or language tasks. Despite the use of large training datasets, most models are trained by iterating over single input-output pairs, discarding the remaining examples for the current prediction. In this work, we actively exploit the training data, using the information from nearest training examples to aid the prediction both during training and testing. Specifically, our approach uses the target of the most similar training example to initialize the memory state of an LSTM model, or to guide attention mechanisms. We apply this approach to image captioning and sentiment analysis, respectively through image and text retrieval. Results confirm the effectiveness of the proposed approach for the two tasks, on the widely used Flickr8 and IMDB datasets. Our code is publicly available at http://github.com/RitaRamo/retrieval-augmentation-nn.

Submitted 26 April, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: Accepted at IJCNN 2021
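Illustrative note on the entry above ("Retrieval Augmentation for Deep Neural Networks"): the abstract says the target of the most similar training example is used to aid prediction. The sketch below covers only that retrieval step, under stated assumptions, and is not taken from the authors' repository. Cosine similarity over plain feature vectors, and the names `cosine_similarity` and `retrieve_nearest_target`, are choices made here for illustration; how the retrieved target then initializes an LSTM memory state or guides attention is not shown.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_nearest_target(
    query: Sequence[float],
    train_vectors: List[Sequence[float]],
    train_targets: List[T],
) -> T:
    """Return the target of the training example most similar to `query`.

    Assumption of this sketch: similarity is cosine similarity between
    precomputed feature vectors; ties go to the earliest example.
    """
    if not train_vectors or len(train_vectors) != len(train_targets):
        raise ValueError("need a non-empty, aligned set of vectors and targets")

    best_index = max(
        range(len(train_vectors)),
        key=lambda i: cosine_similarity(query, train_vectors[i]),
    )
    return train_targets[best_index]
```

A linear scan is used to keep the sketch self-contained; for large training sets an approximate nearest-neighbour index would be the usual substitute.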

arXiv:2002.03613 [pdf, ps, other]

The Istanbul BFT Consensus Algorithm

Authors: Henrique Moniz

Abstract: This paper presents IBFT, a simple and elegant Byzantine fault-tolerant consensus algorithm that is used to implement state machine replication in the \emph{Quorum} blockchain. IBFT assumes a partially synchronous communication model, where safety does not depend on any timing assumptions and only liveness depends on periods of synchrony. The algorithm is deterministic, leader-based, and optimally resilient - tolerating $f$ faulty processes out of $n$, where $n \geq 3f+1$. During periods of good communication, IBFT achieves termination in three message delays and has $O(n^2)$ total communication complexity.

Submitted 19 May, 2020; v1 submitted 10 February, 2020; originally announced February 2020.
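Illustrative note on the entry above ("The Istanbul BFT Consensus Algorithm"): the abstract states the resilience bound $n \geq 3f+1$. The arithmetic behind that bound can be sketched as follows; the function names `max_faulty` and `min_processes` are hypothetical helpers introduced here, not part of IBFT or any Quorum API.

```python
def max_faulty(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1 for a system of n processes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # n >= 3f + 1  <=>  f <= (n - 1) / 3, so take the integer floor.
    return (n - 1) // 3


def min_processes(f: int) -> int:
    """Smallest n that satisfies n >= 3f + 1 for f faulty processes."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1
```

For example, under this bound a 4-process system tolerates 1 faulty process, and adding a fifth or sixth process does not raise that number; 7 processes are needed to tolerate 2.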

arXiv:1503.09144 [pdf, ps, other]

Towards Using Machine Translation Techniques to Induce Multilingual Lexica of Discourse Markers

Authors: António Lopes, David Martins de Matos, Vera Cabarrão, Ricardo Ribeiro, Helena Moniz, Isabel Trancoso, Ana Isabel Mata

Abstract: Discourse markers are universal linguistic events subject to language variation. Although an extensive literature has already reported language specific traits of these events, little has been said on their cross-language behavior and on building an inventory of multilingual lexica of discourse markers. This work describes new methods and approaches for the description, classification, and annotation of discourse markers in the specific domain of the Europarl corpus. The study of discourse markers in the context of translation is crucial due to the idiomatic nature of these structures. Multilingual lexica together with the functional analysis of such structures are useful tools for the hard task of translating discourse markers into possible equivalents from one language to another. Using Daniel Marcu's validated discourse markers for English, extracted from the Brown Corpus, our purpose is to build multilingual lexica of discourse markers for other languages, based on machine translation techniques. The major assumption in this study is that the usage of a discourse marker is independent of the language, i.e., the rhetorical function of a discourse marker in a sentence in one language is equivalent to the rhetorical function of the same discourse marker in another language.

Submitted 31 March, 2015; originally announced March 2015.

Comments: 6 pages

ACM Class: I.2.7

Showing 1–13 of 13 results for author: Moniz, H