Showing 1–50 of 81 results for author: Zubiaga, A

  1. arXiv:2407.01293  [pdf, other]

    cs.SI

    Applying the Ego Network Model to Cross-Target Stance Detection

    Authors: Jack Tacchi, Parisa Jamadi Khiabani, Arkaitz Zubiaga, Chiara Boldrini, Andrea Passarella

    Abstract: Understanding human interactions and social structures is an incredibly important task, especially in such an interconnected world. One task that facilitates this is Stance Detection, which predicts the opinion or attitude of a text towards a target entity. Traditionally, this has mainly been done via text-based approaches; however, recent work has produced a model (CT-TN) that le…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ASONAM 2024

  2. arXiv:2406.18297  [pdf, other]

    cs.CL

    FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning

    Authors: Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga

    Abstract: The rapid dissemination of information through social media and the Internet has posed a significant challenge for fact-checking, among others in identifying check-worthy claims that fact-checkers should pay attention to, i.e. filtering claims needing fact-checking from a large pool of sentences. This challenge has stressed the need to focus on determining the priority of claims, specifically whic…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.08201  [pdf, other]

    cs.SI

    HTIM: Hybrid Text-Interaction Modeling for Broadening Political Leaning Inference in Social Media

    Authors: Joseba Fernandez de Landa, Arkaitz Zubiaga, Rodrigo Agerri

    Abstract: Political leaning can be defined as the inclination of an individual towards certain political orientations that align with their personal beliefs. Political leaning inference has traditionally been framed as a binary classification problem, namely, to distinguish between left vs. right or conservative vs. liberal. Furthermore, although some recent work considers political leaning inference in a mu…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2404.14339  [pdf, other]

    cs.CL

    Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation

    Authors: Bharathi A, Arkaitz Zubiaga

    Abstract: Stance detection has been widely studied as the task of determining if a social media post is positive, negative or neutral towards a specific issue, such as support towards vaccines. Research in stance detection has however often been limited to a single language and, where more than one language has been studied, research has focused on few-shot settings, overlooking the challenges of developing…

    Submitted 22 April, 2024; originally announced April 2024.

  5. arXiv:2403.05216  [pdf, other]

    cs.CL cs.SI

    SocialPET: Socially Informed Pattern Exploiting Training for Few-Shot Stance Detection in Social Media

    Authors: Parisa Jamadi Khiabani, Arkaitz Zubiaga

    Abstract: Stance detection, as the task of determining the viewpoint of a social media post towards a target as 'favor' or 'against', has been understudied in the challenging yet realistic scenario where there is limited labeled data for a certain target. Our work advances research in few-shot stance detection by introducing SocialPET, a socially informed approach to leveraging language models for the task.…

    Submitted 8 March, 2024; originally announced March 2024.

  6. arXiv:2402.16458  [pdf, other]

    cs.CL

    ID-XCB: Data-independent Debiasing for Fair and Accurate Transformer-based Cyberbullying Detection

    Authors: Peiling Yi, Arkaitz Zubiaga

    Abstract: Swear words are a common proxy to collect datasets with cyberbullying incidents. Our focus is on measuring and mitigating biases derived from spurious associations between swear words and incidents occurring as a result of such data collection strategies. After demonstrating and quantifying these biases, we introduce ID-XCB, the first data-independent debiasing technique that combines adversarial…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  7. arXiv:2401.16282  [pdf, other]

    cs.CL cs.AI cs.LG

    MAPLE: Micro Analysis of Pairwise Language Evolution for Few-Shot Claim Verification

    Authors: Xia Zeng, Arkaitz Zubiaga

    Abstract: Claim verification is an essential step in the automated fact-checking pipeline which assesses the veracity of a claim against a piece of evidence. In this work, we explore the potential of few-shot claim verification, where only very limited data is available for supervision. We propose MAPLE (Micro Analysis of Pairwise Language Evolution), a pioneering approach that explores the alignment betwee…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: accepted by EACL Findings 2024

  8. arXiv:2401.11972  [pdf, other]

    cs.CL

    Synergizing Machine Learning & Symbolic Methods: A Survey on Hybrid Approaches to Natural Language Processing

    Authors: Rrubaa Panchendrarajan, Arkaitz Zubiaga

    Abstract: The advancement of machine learning and symbolic approaches has underscored their strengths and weaknesses in Natural Language Processing (NLP). While machine learning approaches are powerful in identifying patterns in data, they often fall short in learning commonsense and the factual knowledge required for NLP tasks. Meanwhile, symbolic methods excel in representing knowledge-rich data.…

    Submitted 18 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Revised according to review comments

  9. Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research

    Authors: Rrubaa Panchendrarajan, Arkaitz Zubiaga

    Abstract: Automated fact-checking has drawn considerable attention over the past few decades due to the increase in the diffusion of misinformation on online platforms. This is often carried out as a sequence of tasks comprising (i) the detection of sentences circulating in online platforms which constitute claims needing verification, followed by (ii) the verification process of those claims. This survey f…

    Submitted 18 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted revision

  10. arXiv:2401.09244  [pdf, other]

    cs.CL

    Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges

    Authors: Aiqi Jiang, Arkaitz Zubiaga

    Abstract: The growing prevalence and rapid evolution of offensive language in social media amplify the complexities of detection, particularly highlighting the challenges in identifying such content across diverse languages. This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning (CLTL) techniques in offensive language detection in social media. Our study stands as…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 35 pages, 7 figures

  11. arXiv:2312.01738  [pdf, other]

    cs.SI cs.CY

    Generalizing Political Leaning Inference to Multi-Party Systems: Insights from the UK Political Landscape

    Authors: Joseba Fernandez de Landa, Arkaitz Zubiaga, Rodrigo Agerri

    Abstract: An ability to infer the political leaning of social media users can help in gathering opinion polls thereby leading to a better understanding of public opinion. While there has been a body of research attempting to infer the political leaning of social media users, this has been typically simplified as a binary classification problem (e.g. left vs right) and has been limited to a single location,…

    Submitted 4 December, 2023; originally announced December 2023.

  12. arXiv:2310.04910  [pdf, other]

    cs.CL cs.AI

    Faithful Knowledge Graph Explanations for Commonsense Reasoning

    Authors: Weihe Zhai, Arkaitz Zubiaga

    Abstract: The fusion of language models (LMs) and knowledge graphs (KGs) is widely used in commonsense question answering, but generating faithful explanations remains challenging. Current methods often overlook path decoding faithfulness, leading to divergence between graph encoder outputs and model predictions. We identify confounding effects and LM-KG misalignment as key factors causing spurious explanat…

    Submitted 22 June, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  13. arXiv:2305.02224  [pdf, ps, other]

    cs.HC

    Some Observations on Fact-Checking Work with Implications for Computational Support

    Authors: Rob Procter, Miguel Arana-Catania, Yulan He, Maria Liakata, Arkaitz Zubiaga, Elena Kochkina, Runcong Zhao

    Abstract: Social media and user-generated content (UGC) have become increasingly important features of journalistic work in a number of different ways. However, the growth of misinformation means that news organisations have had to devote more and more resources to determining its veracity and to publishing corrections if it is found to be misleading. In this work, we present the results of interviews with eig…

    Submitted 6 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: 11 pages. International AAAI Conference on Web and Social Media, Mediate 2023: News Media and Computational Journalism Workshop

    ACM Class: H.1.2; H.5.2

  14. arXiv:2303.01241  [pdf, other]

    cs.CL cs.LG

    PANACEA: An Automated Misinformation Detection System on COVID-19

    Authors: Runcong Zhao, Miguel Arana-Catania, Lixing Zhu, Elena Kochkina, Lin Gui, Arkaitz Zubiaga, Rob Procter, Maria Liakata, Yulan He

    Abstract: In this demo, we introduce a web-based misinformation detection system PANACEA on COVID-19 related claims, which has two modules, fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporti…

    Submitted 28 February, 2023; originally announced March 2023.

  15. Cluster-based Deep Ensemble Learning for Emotion Classification in Internet Memes

    Authors: Xiaoyu Guo, Jing Ma, Arkaitz Zubiaga

    Abstract: Memes have gained popularity as a means to share visual ideas through the Internet and social media by mixing text, images and videos, often for humorous purposes. Research enabling automated analysis of memes has gained attention in recent years, including among others the task of classifying the emotion expressed in memes. In this paper, we propose a novel model, cluster-based deep ensemble lear…

    Submitted 16 February, 2023; originally announced February 2023.

  16. arXiv:2302.08326  [pdf, other]

    cs.CL

    NUAA-QMUL-AIIT at Memotion 3: Multi-modal Fusion with Squeeze-and-Excitation for Internet Meme Emotion Analysis

    Authors: Xiaoyu Guo, Jing Ma, Arkaitz Zubiaga

    Abstract: This paper describes the participation of our NUAA-QMUL-AIIT team in the Memotion 3 shared task on meme emotion analysis. We propose a novel multi-modal fusion method, Squeeze-and-Excitation Fusion (SEFusion), and embed it into our system for emotion classification in memes. SEFusion is a simple fusion method that employs fully connected layers, reshaping, and matrix multiplication. SEFusion learn…

    Submitted 16 February, 2023; originally announced February 2023.

  17. arXiv:2301.04535  [pdf, other]

    cs.CL cs.SI

    Few-shot Learning for Cross-Target Stance Detection by Aggregating Multimodal Embeddings

    Authors: Parisa Jamadi Khiabani, Arkaitz Zubiaga

    Abstract: Despite the increasing popularity of the stance detection task, existing approaches are predominantly limited to using the textual content of social media posts for the classification, overlooking the social nature of the task. The stance detection task becomes particularly challenging in cross-target classification scenarios, where even in few-shot training settings the model needs to predict the…

    Submitted 31 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: To appear in IEEE Transactions on Computational Social Systems

  18. arXiv:2212.10405  [pdf, other]

    cs.CL cs.SI

    AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection

    Authors: Wenjie Yin, Vibhor Agarwal, Aiqi Jiang, Arkaitz Zubiaga, Nishanth Sastry

    Abstract: Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture integra…

    Submitted 10 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: accepted at ICWSM 2023

    Journal ref: 17th International AAAI Conference on Web and Social Media (ICWSM 2023). Please cite accordingly

  19. arXiv:2212.08514  [pdf, other]

    cs.CL

    Check-worthy Claim Detection across Topics for Automated Fact-checking

    Authors: Amani S. Abumansour, Arkaitz Zubiaga

    Abstract: An important component of an automated fact-checking system is the claim check-worthiness detection system, which ranks sentences by prioritising them based on their need to be checked. Despite a body of research tackling the task, previous research has overlooked the challenging nature of identifying check-worthy claims across different topics. In this paper, we assess and quantify the challenge…

    Submitted 16 December, 2022; originally announced December 2022.

  20. arXiv:2211.08447  [pdf, other]

    cs.CL cs.SI

    SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

    Authors: Aiqi Jiang, Arkaitz Zubiaga

    Abstract: The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language -- Chinese. Rather than collecting…

    Submitted 30 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: accepted at ICWSM 2023

  21. arXiv:2208.08749  [pdf, other]

    cs.CL cs.AI

    Active PETs: Active Data Annotation Prioritisation for Few-Shot Claim Verification with Pattern Exploiting Training

    Authors: Xia Zeng, Arkaitz Zubiaga

    Abstract: To mitigate the impact of the scarcity of labelled data on fact-checking systems, we focus on few-shot claim verification. Despite recent work on few-shot classification by proposing advanced language models, there is a dearth of research in data annotation prioritisation that improves the selection of the few shots to be labelled for optimal model performance. We propose Active PETs, a novel weig…

    Submitted 11 October, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

  22. arXiv:2207.10639  [pdf]

    cs.CL

    Session-based Cyberbullying Detection in Social Media: A Survey

    Authors: Peiling Yi, Arkaitz Zubiaga

    Abstract: Cyberbullying is a pervasive problem in online social media, where a bully abuses a victim through a social media session. By investigating cyberbullying perpetrated through social media sessions, recent research has looked into mining patterns and features for modeling and understanding the two defining characteristics of cyberbullying: repetitive behavior and power imbalance. In this survey pape…

    Submitted 14 July, 2022; originally announced July 2022.

  23. arXiv:2205.05646  [pdf, other]

    cs.CL cs.AI cs.LG

    Aggregating Pairwise Semantic Differences for Few-Shot Claim Veracity Classification

    Authors: Xia Zeng, Arkaitz Zubiaga

    Abstract: As part of an automated fact-checking pipeline, the claim veracity classification task consists in determining if a claim is supported by an associated piece of evidence. The complexity of gathering labelled claim-evidence pairs leads to a scarcity of datasets, particularly when dealing with new domains. In this paper, we introduce SEED, a novel vector-based method to few-shot claim veracity class…

    Submitted 11 May, 2022; originally announced May 2022.

  24. arXiv:2205.05435  [pdf]

    cs.CL cs.AI

    Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

    Authors: Rabab Alkhalifa, Elena Kochkina, Arkaitz Zubiaga

    Abstract: Performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. Therefore an ability to predict a model's ability to persist over time can help design models that can be effectively used over a longer period of time. In this paper, we provide a thorough discussion into the problem, establish an evaluation setup for the task…

    Submitted 19 November, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

  25. arXiv:2205.02596  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims

    Authors: M. Arana-Catania, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata, Rob Procter, Yulan He

    Abstract: We present a comprehensive work on automated veracity assessment from dataset creation to developing novel methods based on Natural Language Inference (NLI), focusing on misinformation related to the COVID-19 pandemic. We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19 and their respective information sources. The dataset construction inc…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 16 pages, 1 figure, 8 tables, presented in NAACL 2022

  26. arXiv:2205.01374  [pdf, other]

    cs.CL cs.SI

    Hidden behind the obvious: misleading keywords and implicitly abusive language on social media

    Authors: Wenjie Yin, Arkaitz Zubiaga

    Abstract: While social media offers freedom of self-expression, abusive language carries significant negative social impact. Driven by the importance of the issue, research in the automated detection of abusive language has witnessed growth and improvement. However, these detection models display a reliance on strongly indicative keywords, such as slurs and profanity. This means that they can falsely (1a) mis…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted for publication in Online Social Networks and Media

  27. arXiv:2204.00334  [pdf, other]

    cs.CL

    Cyberbullying detection across social media platforms via platform-aware adversarial encoding

    Authors: Peiling Yi, Arkaitz Zubiaga

    Abstract: Despite the increasing interest in cyberbullying detection, existing efforts have largely been limited to experiments on a single platform and their generalisability across different social media platforms has received less attention. We propose XP-CB, a novel cross-platform framework based on Transformers and adversarial learning. XP-CB can enhance a Transformer leveraging unlabelled data from t…

    Submitted 1 April, 2022; originally announced April 2022.

  28. arXiv:2111.03612  [pdf]

    cs.CL cs.LG

    Sexism Identification in Tweets and Gabs using Deep Neural Networks

    Authors: Amikul Kalra, Arkaitz Zubiaga

    Abstract: Through anonymisation and accessibility, social media platforms have facilitated the proliferation of hate speech, prompting increased research in developing automatic methods to identify these texts. This paper explores the classification of sexism in text using a variety of deep neural network model architectures such as Long-Short-Term Memory (LSTMs) and Convolutional Neural Networks (CNNs). Th…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 8 pages

  29. arXiv:2111.00981  [pdf]

    cs.CL

    Cross-lingual Hate Speech Detection using Transformer Models

    Authors: Teodor Tiţa, Arkaitz Zubiaga

    Abstract: Hate speech detection within a cross-lingual setting represents a paramount area of interest for all medium and large-scale online platforms. Failing to properly address this issue on a global scale has already led over time to morally questionable real-life events, human deaths, and the perpetuation of hate itself. This paper illustrates the capabilities of fine-tuned altered multi-lingual Transf…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: 7 pages

  30. arXiv:2109.11427  [pdf, ps, other]

    cs.CL

    Automated Fact-Checking: A Survey

    Authors: Xia Zeng, Amani S. Abumansour, Arkaitz Zubiaga

    Abstract: As online false information continues to grow, automated fact-checking has gained an increasing amount of attention in recent years. Researchers in the field of Natural Language Processing (NLP) have contributed to the task by building fact-checking datasets, devising automated fact-checking pipelines and proposing NLP methods to further research in the development of different components. This pa…

    Submitted 23 September, 2021; originally announced September 2021.

  31. arXiv:2109.01537  [pdf, other]

    cs.CL cs.AI cs.DB cs.MM

    A Longitudinal Multi-modal Dataset for Dementia Monitoring and Diagnosis

    Authors: Dimitris Gkoumas, Bo Wang, Adam Tsakalidis, Maria Wolters, Arkaitz Zubiaga, Matthew Purver, Maria Liakata

    Abstract: Dementia affects cognitive functions of adults, including memory, language, and behaviour. Standard diagnostic biomarkers such as MRI are costly, whilst neuropsychological tests suffer from sensitivity issues in detecting dementia onset. The analysis of speech and language has emerged as a promising and non-intrusive technology to diagnose and monitor dementia. Currently, most work in this directi…

    Submitted 23 December, 2023; v1 submitted 3 September, 2021; originally announced September 2021.

  32. arXiv:2109.00475  [pdf, other]

    cs.CL

    Capturing Stance Dynamics in Social Media: Open Challenges and Research Directions

    Authors: Rabab Alkhalifa, Arkaitz Zubiaga

    Abstract: Social media platforms provide a goldmine for mining public opinion on issues of wide societal interest and impact. Opinion mining is a problem that can be operationalised by capturing and aggregating the stance of individual social media posts as supporting, opposing or being neutral towards the issue at hand. While most prior work in stance detection has investigated datasets that cover short pe…

    Submitted 26 November, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

  33. arXiv:2108.13898  [pdf]

    cs.SI cs.CL cs.DL

    The emojification of sentiment on social media: Collection and analysis of a longitudinal Twitter sentiment dataset

    Authors: Wenjie Yin, Rabab Alkhalifa, Arkaitz Zubiaga

    Abstract: Social media, as a means for computer-mediated communication, has been extensively used to study the sentiment expressed by users around events or topics. There is however a gap in the longitudinal study of how sentiment evolved in social media over the years. To fill this gap, we develop TM-Senti, a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets and c…

    Submitted 13 February, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: corrected typo in appendix

  34. Opinions are Made to be Changed: Temporally Adaptive Stance Classification

    Authors: Rabab Alkhalifa, Elena Kochkina, Arkaitz Zubiaga

    Abstract: Given the rapidly evolving nature of social media and people's views, word usage changes over time. Consequently, the performance of a classifier trained on old textual data can drop dramatically when tested on newer data. While research in stance classification has advanced in recent years, no effort has been invested in making these classifiers have persistent performance over time. To study thi…

    Submitted 27 August, 2021; originally announced August 2021.

  35. Weakly Supervised Cross-platform Teenager Detection with Adversarial BERT

    Authors: Peiling Yi, Arkaitz Zubiaga

    Abstract: Teenager detection is an important case of the age detection task in social media, which aims to detect teenage users to protect them from negative influences. The teenager detection task suffers from the scarcity of labelled data, which hampers the ability to perform well across social media platforms. To further research in teenager detection in settings where no labelled data is available f…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: 6 pages, 1 figure, 3 tables

  36. Cross-lingual Capsule Network for Hate Speech Detection in Social Media

    Authors: Aiqi Jiang, Arkaitz Zubiaga

    Abstract: Most hate speech detection research focuses on a single language, generally English, which limits their generalisability to other languages. In this paper we investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another. We propose a cross-lingual capsule network learning model coupled with extra domain-specific le…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: 7 pages, 1 figure, 4 tables

  37. arXiv:2108.03070  [pdf, other]

    cs.CL

    SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection

    Authors: Aiqi Jiang, Xiaohan Yang, Yang Liu, Arkaitz Zubiaga

    Abstract: Online sexism has become an increasing concern in social media platforms as it has affected the healthy development of the Internet and can have negative effects in society. While research in the sexism detection domain is growing, most of this research focuses on English as the language and on Twitter as the platform. Our objective here is to broaden the scope of this research by considering the…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: 44 pages, 5 figures, 9 tables

  38. arXiv:2104.11572  [pdf, other]

    cs.CL

    QMUL-SDS at SCIVER: Step-by-Step Binary Classification for Scientific Claim Verification

    Authors: Xia Zeng, Arkaitz Zubiaga

    Abstract: Scientific claim verification is a unique challenge that is attracting increasing interest. The SCIVER shared task offers a benchmark scenario to test and compare claim verification approaches by participating teams and consists in three steps: relevant abstract selection, rationale selection and label prediction. In this paper, we present team QMUL-SDS's participation in the shared task. We propo…

    Submitted 23 April, 2021; originally announced April 2021.

  39. arXiv:2103.00508  [pdf]

    cs.CL cs.CY cs.LG

    Citizen Participation and Machine Learning for a Better Democracy

    Authors: M. Arana-Catania, F. A. Van Lier, Rob Procter, Nataliya Tkachenko, Yulan He, Arkaitz Zubiaga, Maria Liakata

    Abstract: The development of democratic systems is a crucial task as confirmed by its selection as one of the Millennium Sustainable Development Goals by the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in democratic decision-making processes. The main objective…

    Submitted 28 February, 2021; originally announced March 2021.

    Comments: 19 pages, 5 figures, 4 tables, to appear in Digital Government: Research and Practice (DGOV)

  40. arXiv:2102.08886  [pdf, ps, other]

    cs.CL

    Towards generalisable hate speech detection: a review on obstacles and solutions

    Authors: Wenjie Yin, Arkaitz Zubiaga

    Abstract: Hate speech is one type of harmful online content which directly attacks or promotes hate towards a group or an individual member based on their actual or perceived aspects of identity, such as ethnicity, religion, and sexual orientation. With online hate speech on the rise, its automatic detection as a natural language processing task is gaining increasing interest. However, it is only recently t…

    Submitted 17 February, 2021; originally announced February 2021.

  41. arXiv:2012.06606  [pdf, other]

    cs.CL

    TF-CR: Weighting Embeddings for Text Classification

    Authors: Arkaitz Zubiaga

    Abstract: Text classification, as the task consisting in assigning categories to textual instances, is a very common task in information science. Methods learning distributed representations of words, such as word embeddings, have become popular in recent years as the features to use for text classification tasks. Despite the increasing use of word embeddings for text classification, these are generally use…

    Submitted 11 December, 2020; originally announced December 2020.

  42. An Online Multilingual Hate speech Recognition System

    Authors: Neeraj Vashistha, Arkaitz Zubiaga, Shanky Sharma

    Abstract: The exponential increase in the use of the Internet and social media over the last two decades has changed human interaction. This has led to many positive outcomes, but at the same time it has brought risks and harms. While the volume of harmful content online, such as hate speech, is not manageable by humans, interest in the academic community to investigate automated means for hate speech detec…

    Submitted 22 December, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: 11 pages, 5 figures, appear in Special Issue "Natural Language Processing for Social Media" on MDPI Information 2021, 12(1), 5

    Journal ref: Information 12, no. 1: 5 (2021)

  43. arXiv:2011.02935  [pdf, other]

    cs.CL

    QMUL-SDS @ DIACR-Ita: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian

    Authors: Rabab Alkhalifa, Adam Tsakalidis, Arkaitz Zubiaga, Maria Liakata

    Abstract: In this paper, we present the results and main findings of our system for the DIACR-ITA 2020 Task. Our system focuses on using variations of training sets and different semantic detection methods. The task involves training, aligning and predicting a word's vector change from two diachronic Italian corpora. We demonstrate that using Temporal Word Embeddings with a Compass C-BOW model is more effec…

    Submitted 6 November, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

  44. arXiv:2011.02788  [pdf, ps, other]

    cs.CL

    NUAA-QMUL at SemEval-2020 Task 8: Utilizing BERT and DenseNet for Internet Meme Emotion Analysis

    Authors: Xiaoyu Guo, Jing Ma, Arkaitz Zubiaga

    Abstract: This paper describes our contribution to SemEval 2020 Task 8: Memotion Analysis. Our system learns multi-modal embeddings from text and images in order to classify Internet memes by sentiment. Our model learns text embeddings using BERT and extracts features from images with DenseNet, subsequently combining both features through concatenation. We also compare our results with those produced by Den…

    Submitted 9 November, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

  45. arXiv:2011.01181  [pdf, other]

    cs.CL

    QMUL-SDS @ SardiStance: Leveraging Network Interactions to Boost Performance on Stance Detection using Knowledge Graphs

    Authors: Rabab Alkhalifa, Arkaitz Zubiaga

    Abstract: This paper presents our submission to the SardiStance 2020 shared task, describing the architecture used for Task A and Task B. While our submission for Task A did not exceed the baseline, retraining our model using all the training tweets showed promising results, leading to an f-avg of 0.601 using a bidirectional LSTM with BERT multilingual embeddings for Task A. For our submission for Task B, we ranke…

    Submitted 6 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

  46. arXiv:2008.13160  [pdf, other]

    cs.CL cs.LG cs.SI

    QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions

    Authors: Rabab Alkhalifa, Theodore Yoong, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata

    Abstract: This paper describes the participation of the QMUL-SDS team for Task 1 of the CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the check-worthiness of tweets about COVID-19 to identify and prioritise tweets that need fact-checking. The overarching aim is to further support ongoing efforts to protect the public from fake news and help people find reliable information. We d…

    Submitted 30 August, 2020; originally announced August 2020.

  47. arXiv:2006.02104  [pdf, other]

    cs.CL cs.IR

    Exploiting Class Labels to Boost Performance on Embedding-based Text Classification

    Authors: Arkaitz Zubiaga

    Abstract: Text classification is one of the most frequent tasks for processing textual data, facilitating among others research from large-scale datasets. Embeddings of different kinds have recently become the de facto standard as features used for text classification. These embeddings have the capacity to capture meanings of words inferred from occurrences in large external collections. While they are buil…

    Submitted 1 September, 2020; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: CIKM 2020

  48. A Longitudinal Analysis of the Public Perception of the Opportunities and Challenges of the Internet of Things

    Authors: Arkaitz Zubiaga, Rob Procter, Carsten Maple

    Abstract: The Internet of Things (or IoT), which enables the networked interconnection of everyday objects, is becoming increasingly popular in many aspects of our lives ranging from entertainment to health care. While the IoT brings a set of invaluable advantages and opportunities with it, there is also evidence of numerous challenges that are yet to be resolved. This is certainly the case with regard to e…

    Submitted 3 December, 2018; originally announced December 2018.

  49. arXiv:1811.05689  [pdf, other]

    cs.CL

    Leveraging Aspect Phrase Embeddings for Cross-Domain Review Rating Prediction

    Authors: Aiqi Jiang, Arkaitz Zubiaga

    Abstract: Online review platforms are a popular way for users to post reviews expressing their opinions towards a product or service, and they are valuable for other users and companies to find out the overall opinions of customers. These reviews tend to be accompanied by a rating, where the star rating has become the most common approach for users to give their feedback in a quantitative way, gen…

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: 16 pages, 1 figure

  50. arXiv:1809.08193  [pdf, other]

    cs.CL

    Towards Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection

    Authors: Lev Konstantinovskiy, Oliver Price, Mevan Babakar, Arkaitz Zubiaga

    Abstract: In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This paper is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Le…

    Submitted 17 August, 2020; v1 submitted 21 September, 2018; originally announced September 2018.

    Comments: Accepted for ACM Digital Threats: Research and Practice (DTRAP)