Search | arXiv e-print repository

XNLIeu: a dataset for cross-lingual NLI in Basque

Authors: Maite Heredia, Julen Etxaniz, Muitze Zulaika, Xabier Saralegi, Jeremy Barnes, Aitor Soroa

Abstract: XNLI is a popular Natural Language Inference (NLI) benchmark widely used to evaluate cross-lingual Natural Language Understanding (NLU) capabilities across languages. In this paper, we expand XNLI to include Basque, a low-resource language that can greatly benefit from transfer-learning approaches. The new dataset, dubbed XNLIeu, has been developed by first machine-translating the English XNLI corpus into Basque, followed by a manual post-edition step. We have conducted a series of experiments using mono- and multilingual LLMs to assess a) the effect of professional post-edition on the MT system; b) the best cross-lingual strategy for NLI in Basque; and c) whether the choice of the best cross-lingual strategy is influenced by the fact that the dataset is built by translation. The results show that post-edition is necessary and that the translate-train cross-lingual strategy obtains better results overall, although the gain is lower when tested in a dataset that has been built natively from scratch. Our code and datasets are publicly available under open licenses.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024

arXiv:2402.03223 [pdf, other]

doi 10.1145/3589335.3651902

English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts

Authors: Patrick Bareiß, Roman Klinger, Jeremy Barnes

Abstract: Emotion classification in text is a challenging task due to the processes involved when interpreting a textual description of a potential emotion stimulus. In addition, the set of emotion categories is highly domain-specific. For instance, literature analysis might require the use of aesthetic emotions (e.g., finding something beautiful), and social media analysis could benefit from more fine-grained sets (e.g., separating anger from annoyance) than only those that represent the basic categories proposed by Paul Ekman (anger, disgust, fear, joy, surprise, sadness). This renders the task an interesting field for zero-shot classification, in which the label set is not known at model development time. Unfortunately, most resources for emotion analysis are in English, and therefore, most studies on emotion analysis have been performed in English, including those that involve prompting language models for text labels. This leaves us with a research gap that we address in this paper: In which language should we prompt for emotion labels on non-English texts? This is particularly of interest when we have access to a multilingual large language model, because we could request labels with English prompts even for non-English data. Our experiments with natural language inference-based language models show that it is consistently better to use English prompts even if the data is in a different language.

Submitted 7 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: published at the PromptEng workshop at TheWebConf

arXiv:2308.01982 [pdf, other]

Predicting Ki67, ER, PR, and HER2 Statuses from H&E-stained Breast Cancer Images

Authors: Amir Akbarnejad, Nilanjan Ray, Penny J. Barnes, Gilbert Bigras

Abstract: Despite the advances in machine learning and digital pathology, it is not yet clear if machine learning methods can accurately predict molecular information merely from histomorphology. In a quest to answer this question, we built a large-scale dataset (185538 images) with reliable measurements for Ki67, ER, PR, and HER2 statuses. The dataset is composed of mirrored images of H&E and corresponding images of immunohistochemistry (IHC) assays (Ki67, ER, PR, and HER2). These images are mirrored through registration. To increase reliability, individual pairs were inspected and discarded if artifacts were present (tissue folding, bubbles, etc.). Measurements for Ki67, ER and PR were determined by calculating H-Score from image analysis. HER2 measurement is based on binary classification: 0 and 1+ (IHC scores representing a negative subset) vs 3+ (IHC score positive subset). Cases with IHC equivocal score (2+) were excluded. We show that a standard ViT-based pipeline can achieve prediction performances around 90% in terms of Area Under the Curve (AUC) when trained with a proper labeling protocol. Finally, we shed light on the ability of the trained classifiers to localize relevant regions, which encourages future work to improve the localizations. Our proposed dataset is publicly available: https://ihc4bc.github.io/

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2301.12872 [pdf, other]

doi 10.1051/0004-6361/202245092

A Machine Learning approach for correcting radial velocities using physical observables

Authors: M. Perger, G. Anglada-Escudé, D. Baroch, M. Lafarga, I. Ribas, J. C. Morales, E. Herrero, P. J. Amado, J. R. Barnes, J. A. Caballero, S. V. Jeffers, A. Quirrenbach, A. Reiners

Abstract: Precision radial velocity (RV) measurements continue to be a key tool to detect and characterise extrasolar planets. While instrumental precision keeps improving, stellar activity remains a barrier to obtaining reliable measurements below 1-2 m/s accuracy. Using simulations and real data, we investigate the capabilities of a Deep Neural Network approach to produce activity-free Doppler measurements of stars. As case studies we use observations of two known stars (Eps Eridani and AU Microscopii), both with clear signals of activity-induced RV variability. Synthetic data using the starsim code are generated for the observables (inputs) and the resulting RV signal (labels), and used to train a Deep Neural Network algorithm. We identify an architecture consisting of convolutional and fully connected layers that is adequate for the task. The indices investigated are mean line-profile parameters (width, bisector, contrast) and multi-band photometry. We demonstrate that the RV-independent approach can drastically reduce spurious Doppler variability from known physical effects such as spots, rotation and convective blueshift. We identify the combinations of activity indices with most predictive power. When applied to real observations, we observe a good match of the correction with the observed variability, but we also find that the noise reduction is not as good as in the simulations, probably due to the lack of detail in the simulated physics.
We demonstrate that a model-driven machine learning approach is sufficient to clean Doppler signals from activity-induced variability for well-known physical effects. There are dozens of known activity-related observables whose inversion power remains unexplored, indicating that the use of additional indicators, more complete models, and more observations with optimised sampling strategies can lead to significant improvements in our detrending capabilities.

Submitted 30 January, 2023; originally announced January 2023.

Journal ref: A&A 672, A118 (2023)

arXiv:2210.06150 [pdf, other]

Annotating Norwegian Language Varieties on Twitter for Part-of-Speech

Authors: Petter Mæhlum, Andre Kåsen, Samia Touileb, Jeremy Barnes

Abstract: Norwegian Twitter data poses an interesting challenge for Natural Language Processing (NLP) tasks. These texts are difficult for models trained on standardized text in one of the two Norwegian written forms (Bokmål and Nynorsk), as they contain both the typical variation of social media text, as well as a large amount of dialectal variety. In this paper we present a novel Norwegian Twitter dataset annotated with POS-tags. We show that models trained on Universal Dependency (UD) data perform worse when evaluated against this dataset, and that models trained on Bokmål generally perform better than those trained on Nynorsk. We also see that performance on dialectal tweets is comparable to the written standards for some models. Finally we perform a detailed analysis of the errors that models commonly make on this data.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted at the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (Vardial2022). Collocated with COLING2022

arXiv:2203.13209 [pdf, other]

Direct parsing to sentiment graphs

Authors: David Samuel, Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal

Abstract: This paper demonstrates how a graph-based semantic parser can be applied to the task of structured sentiment analysis, directly predicting sentiment graphs from text. We advance the state of the art on 4 out of 5 standard benchmark sets. We release the source code, models and predictions.

Submitted 26 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Accepted to ACL 2022

arXiv:2105.14504 [pdf, other]

Structured Sentiment Analysis as Dependency Graph Parsing

Authors: Jeremy Barnes, Robin Kurtz, Stephan Oepen, Lilja Øvrelid, Erik Velldal

Abstract: Structured sentiment analysis attempts to extract full opinion tuples from a text, but over time this task has been subdivided into smaller and smaller sub-tasks, e.g., target extraction or targeted polarity classification. We argue that this division has become counterproductive and propose a new unified framework to remedy the situation. We cast the structured sentiment problem as dependency graph parsing, where the nodes are spans of sentiment holders, targets and expressions, and the arcs are the relations between them. We perform experiments on five datasets in four languages (English, Norwegian, Basque, and Catalan) and show that this approach leads to strong improvements over state-of-the-art baselines. Our analysis shows that refining the sentiment graphs with syntactic dependency information further improves results.

Submitted 30 May, 2021; originally announced May 2021.

Comments: Accepted at ACL-IJCNLP 2021

arXiv:2105.07400 [pdf, other]

The interplay between language similarity and script on a novel multi-layer Algerian dialect corpus

Authors: Samia Touileb, Jeremy Barnes

Abstract: Recent years have seen a rise in interest for cross-lingual transfer between languages with similar typology, and between languages of various scripts. However, the interplay between language similarity and difference in script on cross-lingual transfer is a less studied problem. We explore this interplay on cross-lingual transfer for two supervised tasks, namely part-of-speech tagging and sentiment analysis. We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts, as well as annotations for sentiment and topic categories. We perform baseline experiments by fine-tuning multi-lingual language models. We further explore the effect of script vs. language similarity in cross-lingual transfer by fine-tuning multi-lingual models on languages which are a) typologically distinct, but use the same script, b) typologically similar, but use a distinct script, or c) are typologically similar and use the same script. We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.

Submitted 31 May, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

Comments: Accepted at Findings of ACL: ACL2021

arXiv:2104.09683 [pdf, other]

doi 10.18653/v1/2021.acl-demo.40

skweak: Weak Supervision Made Easy for NLP

Authors: Pierre Lison, Jeremy Barnes, Aliaksandr Hubin

Abstract: We present skweak, a versatile, Python-based software toolkit enabling NLP developers to apply weak supervision to a wide range of NLP tasks. Weak supervision is an emerging machine learning paradigm based on a simple idea: instead of labelling data points by hand, we use labelling functions derived from domain knowledge to automatically obtain annotations for a given dataset. The resulting labels are then aggregated with a generative model that estimates the accuracy (and possible confusions) of each labelling function. The skweak toolkit makes it easy to implement a large spectrum of labelling functions (such as heuristics, gazetteers, neural models or linguistic constraints) on text data, apply them on a corpus, and aggregate their results in a fully unsupervised fashion. skweak is especially designed to facilitate the use of weak supervision for NLP tasks such as text classification and sequence labelling. We illustrate the use of skweak for NER and sentiment analysis. skweak is released under an open-source license and is available at: https://github.com/NorskRegnesentral/skweak

Submitted 19 April, 2021; originally announced April 2021.

arXiv:2104.08281 [pdf, other]

doi 10.1029/2021MS002573

Controlled abstention neural networks for identifying skillful predictions for classification problems

Authors: Elizabeth A. Barnes, Randal J. Barnes

Abstract: The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed "forecasts of opportunity." When these opportunities are not present, scientists need prediction systems that are capable of saying "I don't know." We introduce a novel loss function, termed the "NotWrong loss", that allows neural networks to identify forecasts of opportunity for classification problems. The NotWrong loss introduces an abstention class that allows the network to identify the more confident samples and abstain (say "I don't know") on the less confident samples. The abstention loss is designed to abstain on a user-defined fraction of the samples via a PID controller. Unlike many machine learning methods used to reject samples post-training, the NotWrong loss is applied during training to preferentially learn from the more confident samples. We show that the NotWrong loss outperforms other existing loss functions for multiple climate use cases. The implementation of the proposed loss function is straightforward in most network architectures designed for classification as it only requires the addition of an abstention class to the output layer and modification of the loss function.

Submitted 16 April, 2021; originally announced April 2021.

Comments: submitted to the Journal of Advances in Modeling Earth Systems. arXiv admin note: substantial text overlap with arXiv:2104.08236

arXiv:2104.08236 [pdf, other]

doi 10.1029/2021MS002575

Controlled abstention neural networks for identifying skillful predictions for regression problems

Authors: Elizabeth A. Barnes, Randal J. Barnes

Abstract: The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed "forecasts of opportunity". When these opportunities are not present, scientists need prediction systems that are capable of saying "I don't know." We introduce a novel loss function, termed "abstention loss", that allows neural networks to identify forecasts of opportunity for regression problems. The abstention loss works by incorporating uncertainty in the network's prediction to identify the more confident samples and abstain (say "I don't know") on the less confident samples. The abstention loss is designed to determine the optimal abstention fraction, or abstain on a user-defined fraction via a PID controller. Unlike many methods for attaching uncertainty to neural network predictions post-training, the abstention loss is applied during training to preferentially learn from the more confident samples. The abstention loss is built upon a standard computer science method. While the standard approach is itself a simple yet powerful tool for incorporating uncertainty in regression problems, we demonstrate that the abstention loss outperforms this more standard method for the synthetic climate use cases explored here.
The implementation of the proposed loss function is straightforward in most network architectures designed for regression, as it only requires modification of the output layer and loss function.

Submitted 16 April, 2021; originally announced April 2021.

Comments: submitted to the Journal of Advances in Modeling Earth Systems

arXiv:2104.06546 [pdf, other]

Large-Scale Contextualised Language Modelling for Norwegian

Authors: Andrey Kutuzov, Jeremy Barnes, Erik Velldal, Lilja Øvrelid, Stephan Oepen

Abstract: We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see http://norlm.nlpl.eu

Submitted 13 April, 2021; originally announced April 2021.

Comments: Accepted to NoDaLiDa'2021

arXiv:2104.04989 [pdf, other]

NorDial: A Preliminary Corpus of Written Norwegian Dialect Use

Authors: Jeremy Barnes, Petter Mæhlum, Samia Touileb

Abstract: Norway has a large amount of dialectal variation, as well as a general tolerance to its use in the public sphere. There are, however, few available resources to study this variation and its change over time and in more informal areas, e.g., on social media. In this paper, we propose a first step to creating a corpus of dialectal variation of written Norwegian. We collect a small corpus of tweets and manually annotate them as Bokmål, Nynorsk, any dialect, or a mix. We further perform preliminary experiments with state-of-the-art models, as well as an analysis of the data to expand this corpus in the future. Finally, we make the annotations and models available for future work.

Submitted 11 April, 2021; originally announced April 2021.

Comments: Accepted to NoDaLiDa 2021

arXiv:2102.00299 [pdf, other]

If you've got it, flaunt it: Making the most of fine-grained sentiment annotations

Authors: Jeremy Barnes, Lilja Øvrelid, Erik Velldal

Abstract: Fine-grained sentiment analysis attempts to extract sentiment holders, targets and polar expressions and resolve the relationship between them, but progress has been hampered by the difficulty of annotation. Targeted sentiment analysis, on the other hand, is a narrower task, focusing on extracting sentiment targets and classifying their polarity. In this paper, we explore whether incorporating holder and expression information can improve target extraction and classification and perform experiments on eight English datasets. We conclude that jointly predicting target and polarity BIO labels improves target extraction, and that augmenting the input text with gold expressions generally improves targeted polarity classification. This highlights the potential importance of annotating expressions for fine-grained sentiment datasets. At the same time, our results show that performance of current models for predicting polar expressions is poor, hampering the benefit of this information in practice.

Submitted 30 January, 2021; originally announced February 2021.

Comments: To appear in EACL 2021

arXiv:2010.08318 [pdf, other]

Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification

Authors: Andrew Moore, Jeremy Barnes

Abstract: The majority of work in targeted sentiment analysis has concentrated on finding better methods to improve the overall results. Within this paper we show that these models are not robust to linguistic phenomena, specifically negation and speculation. In this paper, we propose a multi-task learning method to incorporate information from syntactic and semantic auxiliary tasks, including negation and speculation scope detection, to create English-language models that are more robust to these phenomena. Further we create two challenge datasets to evaluate model performance on negated and speculative samples. We find that multi-task models and transfer learning via language modelling can improve performance on these challenge datasets, but the overall performances indicate that there is still much room for improvement. We release both the datasets and the source code at https://github.com/jerbarnes/multitask_negation_for_targeted_sentiment.

Submitted 31 March, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: To appear at NAACL 2021 (long)

arXiv:2004.14723 [pdf, other]

Named Entity Recognition without Labelled Data: A Weak Supervision Approach

Authors: Pierre Lison, Aliaksandr Hubin, Jeremy Barnes, Samia Touileb

Abstract: Named Entity Recognition (NER) performance often degrades rapidly when applied to target domains that differ from the texts observed during training. When in-domain labelled data is available, transfer learning techniques can be used to adapt existing NER models to the target domain. But what should one do when there is no hand-labelled data for the target domain? This paper presents a simple but powerful approach to learn NER models in the absence of labelled data through weak supervision. The approach relies on a broad spectrum of labelling functions to automatically annotate texts from the target domain. These annotations are then merged together using a hidden Markov model which captures the varying accuracies and confusions of the labelling functions. A sequence labelling model can finally be trained on the basis of this unified annotation. We evaluate the approach on two English datasets (CoNLL 2003 and news articles from Reuters and Bloomberg) and demonstrate an improvement of about 7 percentage points in entity-level $F_1$ scores compared to an out-of-domain neural NER model.

Submitted 30 April, 2020; originally announced April 2020.

Comments: Accepted to ACL 2020 (long paper)

arXiv:2004.04103 [pdf, ps, other]

Cross-lingual Emotion Intensity Prediction

Authors: Irean Navas Alejo, Toni Badia, Jeremy Barnes

Abstract: Emotion intensity prediction determines the degree or intensity of an emotion that the author expresses in a text, extending previous categorical approaches to emotion detection. While most previous work on this topic has concentrated on English texts, other languages would also benefit from fine-grained emotion classification, preferably without having to recreate the amount of annotated data available in English in each new language. Consequently, we explore cross-lingual transfer approaches for fine-grained emotion detection in Spanish and Catalan tweets. To this end we annotate a test set of Spanish and Catalan tweets using Best-Worst scaling. We compare six cross-lingual approaches, e.g., machine translation and cross-lingual embeddings, which have varying requirements for parallel data -- from millions of parallel sentences to completely unsupervised. The results show that on this data, methods with low parallel-data requirements perform surprisingly better than methods that use more parallel data, which we explain through an in-depth error analysis. We make the dataset and the code available at https://github.com/jerbarnes/fine-grained_cross-lingual_emotion

Submitted 24 November, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

Comments: Accepted in PEOPLES 2020 Workshop

arXiv:2002.08131 [pdf, other]

A Systematic Comparison of Architectures for Document-Level Sentiment Classification

Authors: Jeremy Barnes, Vinit Ravishankar, Lilja Øvrelid, Erik Velldal

Abstract: Documents are composed of smaller pieces - paragraphs, sentences, and tokens - that have complex relationships between one another. Sentiment classification models that take into account the structure inherent in these documents have a theoretical advantage over those that do not. At the same time, transfer learning models based on language model pretraining have shown promise for document classification. However, these two paradigms have not been systematically compared and it is not clear under which circumstances one approach is better than the other. In this work we empirically compare hierarchical models and transfer learning for document-level sentiment classification. We show that non-trivial hierarchical models outperform previous baselines and transfer learning on document-level sentiment classification in five languages.

Submitted 2 February, 2022; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: 5 pages, 2 figures

arXiv:1911.12722 [pdf, other]

A Fine-Grained Sentiment Dataset for Norwegian

Authors: Lilja Øvrelid, Petter Mæhlum, Jeremy Barnes, Erik Velldal

Abstract: We introduce NoReC_fine, a dataset for fine-grained sentiment analysis in Norwegian, annotated with respect to polar expressions, targets and holders of opinion. The underlying texts are taken from a corpus of professionally authored reviews from multiple news-sources and across a wide variety of domains, including literature, games, music, products, movies and more. We here present a detailed description of this annotation effort. We provide an overview of the developed annotation guidelines, illustrated with examples, and present an analysis of inter-annotator agreement. We also report the first experimental results on the dataset, intended as a preliminary benchmark for further experiments.

Submitted 6 April, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: Accepted for LREC 2020

arXiv:1909.10576 [pdf, other]

The Potential Short- and Long-Term Disruptions and Transformative Impacts of 5G and Beyond Wireless Networks: Lessons Learnt from the Development of a 5G Testbed Environment

Authors: Mohmammad N. Patwary, Syed Junaid Nawaz, Md. Abdur Rahman, Shree Krishna Sharma, Md Mamunur Rashid, Stuart J. Barnes

Abstract: The anticipated deployment cost of 5G communication networks in the UK is predicted to be between £30bn and £50bn, whereas the current annual capital expenditure of the mobile network operators (MNOs) is £2.5bn. This prospect has become one of the major delaying factors for building the 5G physical infrastructure, whereas other areas of 5G development are progressing at their own speed. In this paper, an extensive study is conducted to explore the possibilities of reducing the 5G deployment cost and developing business models. This study suggests that the use of existing public infrastructure has a great potential to contribute to a reduction of about 40% to 60% in the anticipated cost. Also, the recent Ofcom initiative of location-based licensing of radio spectrum is reviewed. Our study suggests that simplification of infrastructure and spectrum will encourage the exponential growth of scenario-specific cellular networks and will potentially disrupt the current business models of telecommunication stakeholders -- specifically MNOs and TowerCos. Moreover, the dense network connectivity will encourage extensive data harvesting as a business opportunity and function within small and medium-sized enterprises as well as within large social networks. Consequently, the rise of new infrastructures and spectrum stakeholders is anticipated. This will fuel the development of a 5G data exchange ecosystem where data transactions are deemed to be high-value business commodities. Also, a data economy-oriented business model is proposed. The study found that with the potential commodification of data and data transactions along with the low-cost physical infrastructure and spectrum, the 5G network will introduce significant disruption in the Telco business ecosystem.

Submitted 31 May, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: 22 pages, 9 figures, 11 tables

arXiv:1906.10519 [pdf, other]

Embedding Projection for Targeted Cross-Lingual Sentiment: Model Comparisons and a Real-World Study

Authors: Jeremy Barnes, Roman Klinger

Abstract: Sentiment analysis benefits from large, hand-annotated resources in order to train and test machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks, such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes into account target-level information. This model incorporates sentiment information into bilingual distributional representations, by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at sentence-level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, the choice of annotated source language for projection to a target leads to better results for source-target language pairs which are similar. Therefore, our results suggest that more efforts should be spent on the creation of resources for less similar languages to those which are resource-rich already. Finally, a domain mismatch leads to a decreased performance. This suggests resources in any language should ideally cover varieties of domains.

Submitted 24 June, 2019; originally announced June 2019.

Comments: Submitted to Journal of Artificial Intelligence Research (41 pages, 51 with references). arXiv admin note: text overlap with arXiv:1805.09016

arXiv:1906.07610 [pdf, other]

doi 10.1017/S1351324920000510

Improving Sentiment Analysis with Multi-task Learning of Negation

Authors: Jeremy Barnes, Erik Velldal, Lilja Øvrelid

Abstract: Sentiment analysis is directly affected by compositional phenomena in language that act on the prior polarity of the words and phrases found in the text. Negation is the most prevalent of these phenomena and in order to correctly predict sentiment, a classifier must be able to identify negation and disentangle the effect that its scope has on the final polarity of a text. This paper proposes a multi-task approach to explicitly incorporate information about negation in sentiment analysis, which we show outperforms learning negation implicitly in a data-driven manner. We describe our approach, a cascading neural architecture with selective sharing of LSTM layers, and show that explicitly training the model with negation as an auxiliary task helps improve the main task of sentiment analysis. The effect is demonstrated across several different standard English-language data sets for both tasks and we analyze several aspects of our system related to its performance, varying types and amounts of input data and different multi-task setups.

Submitted 1 October, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Under submission for Journal of Natural Language Engineering special issue on Negation. 30 pages with references

Journal ref: Nat. Lang. Eng. 27 (2021) 249-269

arXiv:1906.07599 [pdf, other]

LTG-Oslo Hierarchical Multi-task Network: The importance of negation for document-level sentiment in Spanish

Authors: Jeremy Barnes

Abstract: This paper details LTG-Oslo team's participation in the sentiment track of the NEGES 2019 evaluation campaign. We participated in the task with a hierarchical multi-task network, which used shared lower-layers in a deep BiLSTM to predict negation, while the higher layers were dedicated to predicting document-level sentiment. The multi-task component shows promise as a way to incorporate information on negation into deep neural sentiment classifiers, despite the fact that the absolute results on the test set were relatively low for a binary classification task.

Submitted 18 June, 2019; originally announced June 2019.

Comments: Accepted in NEGES (Negation in Spanish) workshop at SEPLN 2019

arXiv:1906.05889 [pdf, ps, other]

On the Effect of Word Order on Cross-lingual Sentiment Analysis

Authors: Àlex R. Atrio, Toni Badia, Jeremy Barnes

Abstract: Current state-of-the-art models for sentiment analysis make use of word order either explicitly by pre-training on a language modeling objective or implicitly by using recurrent neural networks (RNNs) or convolutional networks (CNNs). This is a problem for cross-lingual models that use bilingual embeddings as features, as the difference in word order between source and target languages is not resolved. In this work, we explore reordering as a pre-processing step for sentence-level cross-lingual sentiment classification with two language combinations (English-Spanish, English-Catalan). We find that while reordering helps both models, CNNs are more sensitive to local reorderings, while global reordering benefits RNNs.

Submitted 13 June, 2019; originally announced June 2019.

Comments: Accepted to SEPLN 2019

arXiv:1906.05887 [pdf, other]

Sentiment analysis is not solved! Assessing and probing sentiment classification

Authors: Jeremy Barnes, Lilja Øvrelid, Erik Velldal

Abstract: Neural methods for SA have led to quantitative improvements over previous approaches, but these advances are not always accompanied with a thorough analysis of the qualitative differences. Therefore, it is not clear what outstanding conceptual challenges for sentiment analysis remain. In this work, we attempt to discover what challenges still prove a problem for sentiment classifiers for English and to provide a challenging dataset. We collect the subset of sentences that an (oracle) ensemble of state-of-the-art sentiment classifiers misclassify and then annotate them for 18 linguistic and paralinguistic phenomena, such as negation, sarcasm, modality, etc. The dataset is available at https://github.com/ltgoslo/assessing_and_probing_sentiment. Finally, we provide a case study that demonstrates the usefulness of the dataset to probe the performance of a given sentiment classifier with respect to linguistic phenomena.

Submitted 13 June, 2019; originally announced June 2019.

Comments: Accepted to BlackBoxNLP Workshop at ACL 2019

arXiv:1806.04381 [pdf, other]

Projecting Embeddings for Domain Adaptation: Joint Modeling of Sentiment Analysis in Diverse Domains

Authors: Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde

Abstract: Domain adaptation for sentiment analysis is challenging due to the fact that supervised classifiers are very sensitive to changes in domain. The two most prominent approaches to this problem are structural correspondence learning and autoencoders. However, they either require long training times or suffer greatly on highly divergent domains. Inspired by recent advances in cross-lingual sentiment analysis, we provide a novel perspective and cast the domain adaptation problem as an embedding projection task. Our model takes as input two mono-domain embedding spaces and learns to project them to a bi-domain space, which is jointly optimized to (1) project across domains and to (2) predict sentiment. We perform domain adaptation experiments on 20 source-target domain pairs for sentiment classification and report novel state-of-the-art results on 11 domain pairs, including the Amazon domain adaptation datasets and SemEval 2013 and 2016 datasets. Our analysis shows that our model performs comparably to state-of-the-art approaches on domains that are similar, while performing significantly better on highly divergent domains. Our code is available at https://github.com/jbarnesspain/domain_blse

Submitted 13 June, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

Comments: Accepted to COLING 2018

arXiv:1805.09016 [pdf, other]

Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages

Authors: Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde

Abstract: Sentiment analysis in low-resource languages suffers from a lack of annotated corpora to estimate high-performing models. Machine translation and bilingual word embeddings provide some relief through cross-lingual sentiment approaches. However, they either require large amounts of parallel data or do not sufficiently capture sentiment information. We introduce Bilingual Sentiment Embeddings (BLSE), which jointly represent sentiment information in a source and target language. This model only requires a small bilingual lexicon, a source-language corpus annotated for sentiment, and monolingual word embeddings for each language. We perform experiments on three language combinations (Spanish, Catalan, Basque) for sentence-level cross-lingual sentiment classification and find that our model significantly outperforms state-of-the-art methods on four out of six experimental setups, as well as capturing complementary information to machine translation. Our analysis of the resulting embedding space provides evidence that it represents sentiment information in the resource-poor target language without any annotated data in that language.

Submitted 23 May, 2018; originally announced May 2018.

Comments: Accepted to ACL 2018 (Long Papers)

arXiv:1803.08614 [pdf, ps, other]

MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification

Authors: Jeremy Barnes, Patrik Lambert, Toni Badia

Abstract: While sentiment analysis has become an established field in the NLP community, research into languages other than English has been hindered by the lack of resources. Although much research in multi-lingual and cross-lingual sentiment analysis has focused on unsupervised or semi-supervised approaches, these still require a large number of resources and do not reach the performance of supervised approaches. With this in mind, we introduce two datasets for supervised aspect-level sentiment analysis in Basque and Catalan, both of which are under-resourced languages. We provide high-quality annotations and benchmarks with the hope that they will be useful to the growing community of researchers working on these languages.

Submitted 22 March, 2018; originally announced March 2018.

Comments: Accepted at LREC 2018

arXiv:1709.04219 [pdf, other]

Assessing State-of-the-Art Sentiment Models on State-of-the-Art Sentiment Datasets

Authors: Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde

Abstract: There has been a good amount of progress in sentiment analysis over the past 10 years, including the proposal of new methods and the creation of benchmark datasets. In some papers, however, there is a tendency to compare models only on one or two datasets, either because of time restraints or because the model is tailored to a specific task. Accordingly, it is hard to understand how well a certain model generalizes across different tasks and datasets. In this paper, we contribute to this situation by comparing several models on six different benchmarks, which belong to different domains and additionally have different levels of granularity (binary, 3-class, 4-class and 5-class). We show that Bi-LSTMs perform well across datasets and that both LSTMs and Bi-LSTMs are particularly good at fine-grained sentiment tasks (i.e., with more than two classes). Incorporating sentiment information into word embeddings during training gives good results for datasets that are lexically similar to the training data. With our experiments, we contribute to a better understanding of the performance of different model architectures on different data sets. Consequently, we detect novel state-of-the-art results on the SenTube datasets.

Submitted 13 September, 2017; originally announced September 2017.

Comments: Presented at WASSA 2017

Journal ref: In Proceedings of WASSA (2017). 2 - 12
