Showing 1–50 of 56 results for author: Liakata, M

Searching in archive cs.
  1. arXiv:2408.15689  [pdf, other]

    cs.CL

    TempoFormer: A Transformer for Temporally-aware Representations in Change Detection

    Authors: Talia Tseriotou, Adam Tsakalidis, Maria Liakata

    Abstract: Dynamic representation learning plays a pivotal role in understanding the evolution of linguistic content over time. On this front both context and time dynamics as well as their interplay are of prime importance. Current approaches model context via pre-trained representations, which are typically temporally agnostic. Previous work on modeling context and temporal dynamics has used recurrent meth…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2405.13984  [pdf, other]

    cs.CL cs.MM

    Feedback-aligned Mixed LLMs for Machine Language-Molecule Translation

    Authors: Dimitris Gkoumas, Maria Liakata

    Abstract: The intersection of chemistry and Artificial Intelligence (AI) is an active area of research focused on accelerating scientific discovery. While using large language models (LLMs) with scientific modalities has shown potential, there are significant challenges to address, such as improving training efficiency and dealing with the out-of-distribution problem. Focussing on the task of automated lang…

    Submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2402.10735  [pdf, other]

    cs.CL

    Assessing the Reasoning Abilities of ChatGPT in the Context of Claim Verification

    Authors: John Dougrez-Lewis, Mahmud Elahi Akhter, Yulan He, Maria Liakata

    Abstract: The reasoning capabilities of LLMs are currently hotly debated. We examine the issue from the perspective of claim/rumour verification. We propose the first logical reasoning framework designed to break down any claim or rumour paired with evidence into the atomic reasoning steps necessary for verification. Based on our framework, we curate two annotated collections of such claim/evidence pairs: a…

    Submitted 20 March, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 19 pages, 1 figure

  4. arXiv:2401.16240  [pdf, other]

    cs.CL cs.AI

    Combining Hierachical VAEs with LLMs for clinically meaningful timeline summarisation in social media

    Authors: Jiayu Song, Jenny Chim, Adam Tsakalidis, Julia Ive, Dana Atzil-Slonim, Maria Liakata

    Abstract: We introduce a hybrid abstractive summarisation approach combining hierarchical VAE with LLMs (LlaMA-2) to produce clinically meaningful summaries from social media user timelines, appropriate for mental health monitoring. The summaries combine two different narrative points of view: clinical insights in third person useful for a clinician are generated by feeding into an LLM specialised clinical…

    Submitted 16 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  5. arXiv:2401.12713  [pdf, other]

    cs.CL

    Generating Zero-shot Abstractive Explanations for Rumour Verification

    Authors: Iman Munire Bilal, Preslav Nakov, Rob Procter, Maria Liakata

    Abstract: The task of rumour verification in social media concerns assessing the veracity of a claim on the basis of conversation threads that result from it. While previous work has focused on predicting a veracity label, here we reformulate the task to generate model-centric free-text explanations of a rumour's veracity. The approach is model agnostic in that it generalises to any model. Here we propose a…

    Submitted 23 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Revised version of the original

  6. arXiv:2312.03523  [pdf, other]

    cs.CL

    Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling

    Authors: Talia Tseriotou, Ryan Sze-Yin Chan, Adam Tsakalidis, Iman Munire Bilal, Elena Kochkina, Terry Lyons, Maria Liakata

    Abstract: We present an open-source, pip installable toolkit, Sig-Networks, the first of its kind for longitudinal language modelling. A central focus is the incorporation of Signature-based Neural Network models, which have recently shown success in temporal tasks. We apply and extend published research providing a full suite of signature-based models. Their components can be used as PyTorch building block…

    Submitted 6 February, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: To appear in EACL 2024: System Demonstrations

  7. arXiv:2310.09897  [pdf, other]

    cs.CL

    Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia

    Authors: Dimitris Gkoumas, Matthew Purver, Maria Liakata

    Abstract: Dementia is associated with language disorders which impede communication. Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. Our experiments show that NLP tasks that encapsulate contextual information and enhance…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: It has been accepted to appear at EMNLP23

  8. arXiv:2310.09623  [pdf, other]

    cs.CL

    A Digital Language Coherence Marker for Monitoring Dementia

    Authors: Dimitris Gkoumas, Adam Tsakalidis, Maria Liakata

    Abstract: The use of spontaneous language to derive appropriate digital markers has become an emergent, promising and non-intrusive method to diagnose and monitor dementia. Here we propose methods to capture language coherence as a cost-effective, human-interpretable digital marker for monitoring cognitive changes in people with dementia. We introduce a novel task to learn the temporal logical consistency o…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: It has been accepted to appear at EMNLP23

  9. arXiv:2310.06552  [pdf, other]

    cs.AI cs.CL

    Automated clinical coding using off-the-shelf large language models

    Authors: Joseph S. Boyle, Antanas Kascenas, Pat Lok, Maria Liakata, Alison Q. O'Neil

    Abstract: The task of assigning diagnostic ICD codes to patient hospital admissions is typically performed by expert human coders. Efforts towards automated ICD coding are dominated by supervised deep learning models. However, difficulties in learning to predict the large number of rare codes remain a barrier to adoption in clinical practice. In this work, we leverage off-the-shelf pre-trained generative la…

    Submitted 13 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted to the NeurIPS 2023 workshop Deep Generative Models For Health (DGM4H). 9 pages, 3 figures

    ACM Class: I.2.7; I.2.8

  10. arXiv:2305.02224  [pdf, ps, other]

    cs.HC

    Some Observations on Fact-Checking Work with Implications for Computational Support

    Authors: Rob Procter, Miguel Arana-Catania, Yulan He, Maria Liakata, Arkaitz Zubiaga, Elena Kochkina, Runcong Zhao

    Abstract: Social media and user-generated content (UGC) have become increasingly important features of journalistic work in a number of different ways. However, the growth of misinformation means that news organisations have had to devote more and more resources to determining its veracity and to publishing corrections if it is found to be misleading. In this work, we present the results of interviews with eig…

    Submitted 6 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: 11 pages. International AAAI Conference on Web and Social Media, Mediate 2023: News Media and Computational Journalism Workshop

    ACM Class: H.1.2; H.5.2

  11. arXiv:2303.05891  [pdf, other]

    cs.CL

    Creation and evaluation of timelines for longitudinal user posts

    Authors: Anthony Hills, Adam Tsakalidis, Federico Nanni, Ioannis Zachos, Maria Liakata

    Abstract: There is increasing interest to work with user generated content in social media, especially textual posts over time. Currently there is no consistent way of segmenting user posts into timelines in a meaningful way that improves the quality and cost of manual annotation. Here we propose a set of methods for segmenting longitudinal user posts into timelines likely to contain interesting moments of…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted at EACL 2023 (main, long); camera-ready version

  12. arXiv:2303.01241  [pdf, other]

    cs.CL cs.LG

    PANACEA: An Automated Misinformation Detection System on COVID-19

    Authors: Runcong Zhao, Miguel Arana-Catania, Lixing Zhu, Elena Kochkina, Lin Gui, Arkaitz Zubiaga, Rob Procter, Maria Liakata, Yulan He

    Abstract: In this demo, we introduce a web-based misinformation detection system PANACEA on COVID-19 related claims, which has two modules, fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporti…

    Submitted 28 February, 2023; originally announced March 2023.

  13. arXiv:2211.16971  [pdf, other]

    cs.CL cs.LG

    A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering

    Authors: Matthew Maufe, James Ravenscroft, Rob Procter, Maria Liakata

    Abstract: Question Answering (QA) is a growing area of research, often used to facilitate the extraction of information from within documents. State-of-the-art QA models are usually pre-trained on domain-general corpora like Wikipedia and thus tend to struggle on out-of-domain documents without fine-tuning. We demonstrate that synthetic domain-specific datasets can be generated easily using domain-general m…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: To be published in the companion proceedings of EMNLP 2022. 17 pages (11 of which are in the appendix), 7 figures (3 of which are in the appendix)

    ACM Class: I.2.7

  14. arXiv:2211.14923  [pdf, other]

    cs.CL cs.AI

    Unsupervised Opinion Summarisation in the Wasserstein Space

    Authors: Jiayu Song, Iman Munire Bilal, Adam Tsakalidis, Rob Procter, Maria Liakata

    Abstract: Opinion summarisation synthesises opinions expressed in a group of documents discussing the same topic to produce a single summary. Recent work has looked at opinion summarisation of clusters of social media posts. Such posts are noisy and have unpredictable structure, posing additional challenges for the construction of the summary distribution and the preservation of meaning compared to online r…

    Submitted 27 November, 2022; originally announced November 2022.

  15. arXiv:2208.04083  [pdf, other]

    cs.CL cs.LG

    Template-based Abstractive Microblog Opinion Summarisation

    Authors: Iman Munire Bilal, Bo Wang, Adam Tsakalidis, Dong Nguyen, Rob Procter, Maria Liakata

    Abstract: We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset. Summaries are abstractive in nature and have been created by journalists skilled in summarising…

    Submitted 3 October, 2022; v1 submitted 8 August, 2022; originally announced August 2022.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2022. Pre-MIT Press publication version

  16. arXiv:2208.00033  [pdf, other]

    cs.LG cs.AI

    Personalised recommendations of sleep behaviour with neural networks using sleep diaries captured in Sleepio

    Authors: Alejo Nevado-Holgado, Colin Espie, Maria Liakata, Alasdair Henry, Jenny Gu, Niall Taylor, Kate Saunders, Tom Walker, Chris Miller

    Abstract: Sleepio™ is a digital mobile phone and web platform that uses techniques from cognitive behavioural therapy (CBT) to improve sleep in people with sleep difficulty. As part of this process, Sleepio captures data about the sleep behaviour of the users that have consented to such data being processed. For neural networks, the scale of the data is an opportunity to train meaningful models translatabl…

    Submitted 29 July, 2022; originally announced August 2022.

  17. arXiv:2207.13970  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    PHEMEPlus: Enriching Social Media Rumour Verification with External Evidence

    Authors: John Dougrez-Lewis, Elena Kochkina, M. Arana-Catania, Maria Liakata, Yulan He

    Abstract: Work on social media rumour verification utilises signals from posts, their propagation and users involved. Other lines of work target identifying and fact-checking claims based on information from Wikipedia, or trustworthy news articles without considering social media context. However works combining the information from social media with external evidence from the wider web are lacking. To faci…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: 10 pages, 1 figure, 5 tables, presented in the Fifth Fact Extraction and VERification Workshop (FEVER). 2022

  18. arXiv:2205.05593  [pdf, other]

    cs.CL

    Identifying Moments of Change from Longitudinal User Text

    Authors: Adam Tsakalidis, Federico Nanni, Anthony Hills, Jenny Chim, Jiayu Song, Maria Liakata

    Abstract: Identifying changes in individuals' behaviour and mood, as observed via content shared on online platforms, is increasingly gaining importance. Most research to-date on this topic focuses on either: (a) identifying individuals at risk or with a certain mental health condition given a batch of posts or (b) providing equivalent labels at the post level. A disadvantage of such work is the lack of a s…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: Preprint accepted for publication in ACL 2022

  19. arXiv:2205.02596  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims

    Authors: M. Arana-Catania, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata, Rob Procter, Yulan He

    Abstract: We present a comprehensive work on automated veracity assessment from dataset creation to developing novel methods based on Natural Language Inference (NLI), focusing on misinformation related to the COVID-19 pandemic. We first describe the construction of the novel PANACEA dataset consisting of heterogeneous claims on COVID-19 and their respective information sources. The dataset construction inc…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 16 pages, 1 figure, 8 tables, presented in NAACL 2022

  20. arXiv:2110.05847  [pdf, other]

    cs.CL cs.CY cs.LG

    Evaluation of Abstractive Summarisation Models with Machine Translation in Deliberative Processes

    Authors: M. Arana-Catania, Rob Procter, Yulan He, Maria Liakata

    Abstract: We present work on summarising deliberative processes for non-English languages. Unlike commonly studied datasets, such as news articles, this deliberation dataset reflects difficulties of combining multiple narratives, mostly of poor grammatical quality, in a single text. We report an extensive evaluation of a wide range of abstractive summarisation models in combination with an off-the-shelf mac…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 8 pages, presented in EMNLP 2021 - New Frontiers in Summarization Workshop

  21. arXiv:2109.01537  [pdf, other]

    cs.CL cs.AI cs.DB cs.MM

    A Longitudinal Multi-modal Dataset for Dementia Monitoring and Diagnosis

    Authors: Dimitris Gkoumas, Bo Wang, Adam Tsakalidis, Maria Wolters, Arkaitz Zubiaga, Matthew Purver, Maria Liakata

    Abstract: Dementia affects cognitive functions of adults, including memory, language, and behaviour. Standard diagnostic biomarkers such as MRI are costly, whilst neuropsychological tests suffer from sensitivity issues in detecting dementia onset. The analysis of speech and language has emerged as a promising and non-intrusive technology to diagnose and monitor dementia. Currently, most work in this directi…

    Submitted 23 December, 2023; v1 submitted 3 September, 2021; originally announced September 2021.

  22. arXiv:2106.15971  [pdf, ps, other]

    cs.CL

    Evaluation of Thematic Coherence in Microblogs

    Authors: Iman Munire Bilal, Bo Wang, Maria Liakata, Rob Procter, Adam Tsakalidis

    Abstract: Collecting together microblogs representing opinions about the same topics within the same timeframe is useful to a number of different tasks and practitioners. A major question is how to evaluate the quality of such thematic clusters. Here we create a corpus of microblog clusters from three different domains and time windows and define the task of evaluating thematic coherence. We provide annotat…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: ACL 2021 - Long Paper - Association for Computational Linguistics

  23. arXiv:2103.00508  [pdf]

    cs.CL cs.CY cs.LG

    Citizen Participation and Machine Learning for a Better Democracy

    Authors: M. Arana-Catania, F. A. Van Lier, Rob Procter, Nataliya Tkachenko, Yulan He, Arkaitz Zubiaga, Maria Liakata

    Abstract: The development of democratic systems is a crucial task as confirmed by its selection as one of the Millennium Sustainable Development Goals by the United Nations. In this article, we report on the progress of a project that aims to address barriers, one of which is information overload, to achieving effective direct citizen participation in democratic decision-making processes. The main objective…

    Submitted 28 February, 2021; originally announced March 2021.

    Comments: 19 pages, 5 figures, 4 tables, to appear in Digital Government: Research and Practice (DGOV)

  24. arXiv:2102.09607  [pdf, ps, other]

    cs.LG cs.CL cs.SD eess.AS

    Modelling Paralinguistic Properties in Conversational Speech to Detect Bipolar Disorder and Borderline Personality Disorder

    Authors: Bo Wang, Yue Wu, Nemanja Vaci, Maria Liakata, Terry Lyons, Kate E A Saunders

    Abstract: Bipolar disorder (BD) and borderline personality disorder (BPD) are two chronic mental health conditions that clinicians find challenging to distinguish based on clinical interviews, due to their overlapping symptoms. In this work, we investigate the automatic detection of these two conditions by modelling both verbal and non-verbal cues in a set of interviews. We propose a new approach of modelli…

    Submitted 18 February, 2021; originally announced February 2021.

    MSC Class: 60L10

  25. arXiv:2102.08366  [pdf, other]

    cs.CL cs.IR cs.LG

    Boosting Low-Resource Biomedical QA via Entity-Aware Masking Strategies

    Authors: Gabriele Pergola, Elena Kochkina, Lin Gui, Maria Liakata, Yulan He

    Abstract: Biomedical question-answering (QA) has gained increased attention for its capability to provide users with high-quality information from a vast scientific literature. Although an increasing number of biomedical QA datasets has been recently made available, those resources are still rather limited and expensive to produce. Transfer learning via pre-trained language models (LMs) has been shown as a…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: EACL 2021 - Short Paper - European Chapter of the Association for Computational Linguistics

  26. arXiv:2101.12637  [pdf, other]

    cs.CL

    CD2CR: Co-reference Resolution Across Documents and Domains

    Authors: James Ravenscroft, Arie Cattan, Amanda Clare, Ido Dagan, Maria Liakata

    Abstract: Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents. Current state-of-the-art models for this task assume that all documents are of the same type (e.g. news articles) or fall under the same theme. However, it is also desirable to perform CDCR across different domains (type or theme). A particular use case…

    Submitted 29 January, 2021; originally announced January 2021.

    Comments: 9 pages, 5 figures, accepted at EACL 2021

    ACM Class: I.2.7

  27. arXiv:2011.02935  [pdf, other]

    cs.CL

    QMUL-SDS @ DIACR-Ita: Evaluating Unsupervised Diachronic Lexical Semantics Classification in Italian

    Authors: Rabab Alkhalifa, Adam Tsakalidis, Arkaitz Zubiaga, Maria Liakata

    Abstract: In this paper, we present the results and main findings of our system for the DIACR-ITA 2020 Task. Our system focuses on using variations of training sets and different semantic detection methods. The task involves training, aligning and predicting a word's vector change from two diachronic Italian corpora. We demonstrate that using Temporal Word Embeddings with a Compass C-BOW model is more effec…

    Submitted 6 November, 2020; v1 submitted 5 November, 2020; originally announced November 2020.

  28. arXiv:2010.12532  [pdf, other]

    cs.CL

    GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method

    Authors: Nicole Peinelt, Marek Rei, Maria Liakata

    Abstract: Large pre-trained language models such as BERT have been the driving force behind recent improvements across many NLP tasks. However, BERT is only trained to predict missing words - either behind masks or in the next sentence - and has no knowledge of lexical, syntactic or semantic information beyond what it picks up through unsupervised pre-training. We propose a novel method to explicitly inject…

    Submitted 23 October, 2020; originally announced October 2020.

  29. arXiv:2008.13160  [pdf, other]

    cs.CL cs.LG cs.SI

    QMUL-SDS at CheckThat! 2020: Determining COVID-19 Tweet Check-Worthiness Using an Enhanced CT-BERT with Numeric Expressions

    Authors: Rabab Alkhalifa, Theodore Yoong, Elena Kochkina, Arkaitz Zubiaga, Maria Liakata

    Abstract: This paper describes the participation of the QMUL-SDS team for Task 1 of the CLEF 2020 CheckThat! shared task. The purpose of this task is to determine the check-worthiness of tweets about COVID-19 to identify and prioritise tweets that need fact-checking. The overarching aim is to further support ongoing efforts to protect the public from fake news and help people find reliable information. We d…

    Submitted 30 August, 2020; originally announced August 2020.

  30. arXiv:2008.03408  [pdf, other]

    cs.LG cs.CL eess.AS stat.ML

    Learning to Detect Bipolar Disorder and Borderline Personality Disorder with Language and Speech in Non-Clinical Interviews

    Authors: Bo Wang, Yue Wu, Niall Taylor, Terry Lyons, Maria Liakata, Alejo J Nevado-Holgado, Kate E A Saunders

    Abstract: Bipolar disorder (BD) and borderline personality disorder (BPD) are both chronic psychiatric disorders. However, their overlapping symptoms and common comorbidity make it challenging for the clinicians to distinguish the two conditions on the basis of a clinical interview. In this work, we first present a new multi-modal dataset containing interviews involving individuals with BD or BPD being inte…

    Submitted 31 May, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

    MSC Class: 60L10

  31. arXiv:2007.14454  [pdf, other]

    cs.CL

    Measuring prominence of scientific work in online news as a proxy for impact

    Authors: James Ravenscroft, Amanda Clare, Maria Liakata

    Abstract: The impact made by a scientific paper on the work of other academics has many established metrics, including metrics based on citation counts and social media commenting. However, determination of the impact of a scientific paper on the wider society is less well established. For example, is it important for scientific work to be newsworthy? Here we present a new corpus of newspaper articles linke…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: 13 pages, 5 figures

    ACM Class: I.2.7

  32. arXiv:2007.11314  [pdf, other]

    cs.CL

    Better Early than Late: Fusing Topics with Word Embeddings for Neural Question Paraphrase Identification

    Authors: Nicole Peinelt, Dong Nguyen, Maria Liakata

    Abstract: Question paraphrase identification is a key task in Community Question Answering (CQA) to determine if an incoming question has been previously asked. Many current models use word embeddings to identify duplicate questions, but the use of topic models in feature-engineered systems suggests that they can be helpful for this task, too. We therefore propose two ways of merging topics with word embedd…

    Submitted 22 July, 2020; originally announced July 2020.

  33. arXiv:2005.07174  [pdf, other]

    cs.CL cs.LG

    Estimating predictive uncertainty for rumour verification models

    Authors: Elena Kochkina, Maria Liakata

    Abstract: The inability to correctly resolve rumours circulating online can have harmful real-world consequences. We present a method for incorporating model and data uncertainty estimates into natural language processing models for automatic rumour verification. We show that these estimates can be used to filter out model predictions likely to be erroneous, so that these difficult instances can be prioriti…

    Submitted 14 May, 2020; originally announced May 2020.

    Comments: Accepted to the Annual Conference of the Association for Computational Linguistics (ACL) 2020

  34. arXiv:2004.13703  [pdf, other]

    cs.CL

    Autoencoding Word Representations through Time for Semantic Change Detection

    Authors: Adam Tsakalidis, Maria Liakata

    Abstract: Semantic change detection concerns the task of identifying words whose meaning has changed over time. The current state-of-the-art detects the level of semantic change in a word by comparing its vector representation in two distinct time periods, without considering its evolution through time. In this work, we propose three variants of sequential models for detecting semantically shifted words, ef…

    Submitted 28 April, 2020; originally announced April 2020.

  35. How we do things with words: Analyzing text as social and cultural data

    Authors: Dong Nguyen, Maria Liakata, Simon DeDeo, Jacob Eisenstein, David Mimno, Rebekah Tromble, Jane Winters

    Abstract: In this article we describe our experiences with computational text analysis. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of best practices for working with thick social and cultural concepts. Our guidance is based on our own experiences an…

    Submitted 2 July, 2019; originally announced July 2019.

    Journal ref: Front. Artif. Intell. 3:62 (2020)

  36. arXiv:1809.06683  [pdf, other]

    cs.CL

    RumourEval 2019: Determining Rumour Veracity and Support for Rumours

    Authors: Genevieve Gorrell, Kalina Bontcheva, Leon Derczynski, Elena Kochkina, Maria Liakata, Arkaitz Zubiaga

    Abstract: This is the proposal for RumourEval-2019, which will run in early 2019 as part of that year's SemEval event. Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the dangers of "fake news" have become a mainstream concern. Yet automated support for rumour checking remains in its infancy. For this reason, it is important that a shared task…

    Submitted 18 September, 2018; originally announced September 2018.

  37. arXiv:1808.08538  [pdf, other]

    cs.CY cs.CL cs.SI

    Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum

    Authors: Adam Tsakalidis, Nikolaos Aletras, Alexandra I. Cristea, Maria Liakata

    Abstract: Modelling user voting intention in social media is an important research area, with applications in analysing electorate behaviour, online political campaigning and advertising. Previous approaches mainly focus on predicting national general elections, which are regularly scheduled and where data of past results and opinion polls are available. However, there is no evidence of how such models woul…

    Submitted 26 August, 2018; originally announced August 2018.

    Comments: Preprint accepted for publication in the ACM International Conference on Information and Knowledge Management (CIKM 2018)

  38. arXiv:1807.07351  [pdf, other]

    cs.CY cs.CL

    Can We Assess Mental Health through Social Media and Smart Devices? Addressing Bias in Methodology and Evaluation

    Authors: Adam Tsakalidis, Maria Liakata, Theo Damoulas, Alexandra I. Cristea

    Abstract: Predicting mental health from smartphone and social media data on a longitudinal basis has recently attracted great interest, with very promising results being reported across many studies. Such approaches have the potential to revolutionise mental health assessment, if their development and evaluation follows a real world deployment setting. In this work we take a closer look at state-of-the-art…

    Submitted 19 July, 2018; originally announced July 2018.

    Comments: Preprint accepted for publication in the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2018 Applied Data Science Track)

  39. arXiv:1806.03713  [pdf, other]

    cs.CL

    All-in-one: Multi-task Learning for Rumour Verification

    Authors: Elena Kochkina, Maria Liakata, Arkaitz Zubiaga

    Abstract: Automatic resolution of rumours is a challenging task that can be broken down into smaller components that make up a pipeline, including rumour detection, rumour tracking and stance classification, leading to the final outcome of determining the veracity of a rumour. In previous work, these steps in the process of rumour verification have been developed as separate components where the output of o…

    Submitted 10 June, 2018; originally announced June 2018.

  40. Discourse-Aware Rumour Stance Classification in Social Media Using Sequential Classifiers

    Authors: Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein

    Abstract: Rumour stance classification, defined as classifying the stance of specific social media posts into one of supporting, denying, querying or commenting on an earlier post, is becoming of increasing interest to researchers. While most previous work has focused on using individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse featu…

    Submitted 6 December, 2017; originally announced December 2017.

    Journal ref: Information Processing & Management, Volume 54, Issue 2, March 2018, Pages 273-290

  41. arXiv:1706.05535  [pdf, other]

    cs.SI physics.soc-ph

    Urban Analytics: Multiplexed and Dynamic Community Networks

    Authors: Weisi Guo, Guillem Mosquera Donate, Stephen Law, Samuel Johnson, Maria Liakata, Alan Wilson

    Abstract: In the past decade, cities have experienced rapid growth, expansion, and changes in their community structure. Many aspects of critical urban infrastructure are closely coupled with the human communities that they serve. Urban communities are composed of a multiplex of overlapping factors which can be distinguished into cultural, religious, social-economic, political, and geographical layers. In t…

    Submitted 11 December, 2018; v1 submitted 17 June, 2017; originally announced June 2017.

  42. arXiv:1704.07221  [pdf, other]

    cs.CL cs.AI

    Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM

    Authors: Elena Kochkina, Maria Liakata, Isabelle Augenstein

    Abstract: This paper describes team Turing's submission to SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A). Subtask A addresses the challenge of rumour stance classification, which involves identifying the attitude of Twitter users towards the truthfulness of the rumour they are discussing. Stance classification is considered to be an important s…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: SemEval 2017 RumourEval: Determining rumour veracity and support for rumours (SemEval 2017 Task 8, Subtask A)

  43. arXiv:1704.05972  [pdf, ps, other]

    cs.CL cs.AI

    SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for rumours

    Authors: Leon Derczynski, Kalina Bontcheva, Maria Liakata, Rob Procter, Geraldine Wong Sak Hoi, Arkaitz Zubiaga

    Abstract: Media is full of false claims. Even Oxford Dictionaries named "post-truth" as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the kind of discourse there is around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset c…

    Submitted 19 April, 2017; originally announced April 2017.

  44. arXiv:1704.00656  [pdf, other]

    cs.CL cs.HC cs.IR cs.SI

    Detection and Resolution of Rumours in Social Media: A Survey

    Authors: Arkaitz Zubiaga, Ahmet Aker, Kalina Bontcheva, Maria Liakata, Rob Procter

    Abstract: Despite the increasing use of social media platforms for information and news gathering, its unmoderated nature often leads to the emergence and spread of rumours, i.e. pieces of information that are unverified at the time of posting. At the same time, the openness of social media platforms provides opportunities to study how users share and discuss rumours, and to explore how natural language pro…

    Submitted 3 April, 2018; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: ACM Computing Surveys

    Journal ref: ACM Computing Surveys 51, 2, Article 32 (February 2018), 36 pages

  45. arXiv:1702.08388  [pdf, other]

    cs.CL cs.SI

    Political Homophily in Independence Movements: Analysing and Classifying Social Media Users by National Identity

    Authors: Arkaitz Zubiaga, Bo Wang, Maria Liakata, Rob Procter

    Abstract: Social media and data mining are increasingly being used to analyse political and societal issues. Here we undertake the classification of social media users as supporting or opposing ongoing independence movements in their territories. Independence movements occur in territories whose citizens have conflicting national identities; users with opposing national identities will then support or oppos…

    Submitted 21 March, 2018; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: Accepted for publication in IEEE Intelligent Systems

  46. arXiv:1702.06491  [pdf]

    cs.HC cs.SI

    Supporting the use of user generated content in journalistic practice

    Authors: Peter Tolmie, Rob Procter, David William Randall, Mark Rouncefield, Christian Burger, Geraldine Wong Sak Hoi, Arkaitz Zubiaga, Maria Liakata

    Abstract: Social media and user-generated content (UGC) are increasingly important features of journalistic work in a number of different ways. However, their use presents major challenges, not least because information posted on social media is not always reliable and therefore its veracity needs to be checked before it can be considered as fit for use in the reporting of news. We report on the results of…

    Submitted 21 February, 2017; originally announced February 2017.

    Comments: CHI 2017, best paper award

  47. arXiv:1610.07363  [pdf, other]

    cs.CL cs.IR cs.SI

    Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media

    Authors: Arkaitz Zubiaga, Maria Liakata, Rob Procter

    Abstract: Breaking news leads to situations of fast-paced reporting in social media, producing all kinds of updates related to news stories, albeit with the caveat that some of those early updates tend to be rumours, i.e., information with an unverified status at the time of posting. Flagging information that is unverified can be helpful to avoid the spread of information that may turn out to be false. Dete…

    Submitted 24 October, 2016; originally announced October 2016.

  48. arXiv:1610.03771  [pdf, other]

    cs.CL

    SentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban Neighbourhoods

    Authors: Marzieh Saeidi, Guillaume Bouchard, Maria Liakata, Sebastian Riedel

    Abstract: In this paper, we introduce the task of targeted aspect-based sentiment analysis. The goal is to extract fine-grained information with respect to entities mentioned in user comments. This work extends both aspect-based sentiment analysis that assumes a single entity per document and targeted sentiment analysis that assumes a single sentiment towards a target entity. In particular, we identify the…

    Submitted 12 October, 2016; originally announced October 2016.

    Comments: Accepted at COLING 2016

  49. arXiv:1609.09028  [pdf, other]

    cs.CL cs.SI

    Stance Classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations

    Authors: Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik

    Abstract: Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by…

    Submitted 11 October, 2016; v1 submitted 28 September, 2016; originally announced September 2016.

    Comments: COLING 2016

  50. arXiv:1609.01962  [pdf, other]

    cs.CL cs.IR cs.SI

    Using Gaussian Processes for Rumour Stance Classification in Social Media

    Authors: Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Arkaitz Zubiaga, Maria Liakata, Rob Procter

    Abstract: Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classi…

    Submitted 7 September, 2016; originally announced September 2016.