Showing 1–26 of 26 results for author: Purver, M

Searching in archive cs.
  1. arXiv:2408.01866  [pdf, other]

    cs.CL cs.LG

    Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

    Authors: Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass. However, this paper uncovers a surprising limitation: LLMs fall short when handling long input sequences. We investigate this issue using three datasets and two ta…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures, 6 tables

    ACM Class: I.2.7

  2. arXiv:2407.16804  [pdf]

    cs.LG cs.AI cs.CY cs.ET

    Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges

    Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver

    Abstract: The application of machine learning (ML) in detecting, diagnosing, and treating mental health disorders is garnering increasing attention. Traditionally, research has focused on single modalities, such as text from clinical notes, audio from speech samples, or video of interaction patterns. Recently, multimodal ML, which combines information from multiple modalities, has demonstrated significant p…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2406.09070  [pdf, other]

    cs.LG cs.AI cs.CV

    EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts

    Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver

    Abstract: In the domain of text-to-image generative models, the inadvertent propagation of biases inherent in training datasets poses significant ethical challenges, particularly in the generation of socially sensitive content. This paper introduces EquiPrompt, a novel method employing Chain of Thought (CoT) reasoning to reduce biases in text-to-image generative models. EquiPrompt uses iterative bootstrappi…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2404.07036  [pdf, other]

    cs.CL

    A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media

    Authors: Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver, Senja Pollak

    Abstract: Dehumanisation involves the perception and/or treatment of a social group's members as less than human. This phenomenon is rarely addressed with computational linguistic techniques. We adapt a recently proposed approach for English, making it easier to transfer to other languages and to evaluate, introducing a new sentiment resource, the use of zero-shot cross-lingual valence and arousal detection…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: The first authors contributed equally. Accepted at LREC-COLING

  5. arXiv:2311.13442  [pdf, other]

    cs.SI

    Temporal Network Analysis of Email Communication Patterns in a Long Standing Hierarchy

    Authors: Matthew Russell Barnes, Mladen Karan, Stephen McQuistin, Colin Perkins, Gareth Tyson, Matthew Purver, Ignacio Castro, Richard G. Clegg

    Abstract: An important concept in organisational behaviour is how hierarchy affects the voice of individuals, whereby members of a given organisation exhibit differing power relations based on their hierarchical position. Although there have been prior studies of the relationship between hierarchy and voice, they tend to focus on more qualitative small-scale methods and do not account for structural aspects…

    Submitted 22 November, 2023; originally announced November 2023.

  6. arXiv:2310.09897  [pdf, other]

    cs.CL

    Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia

    Authors: Dimitris Gkoumas, Matthew Purver, Maria Liakata

    Abstract: Dementia is associated with language disorders which impede communication. Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. Our experiments show that NLP tasks that encapsulate contextual information and enhance…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted to appear at EMNLP 2023

  7. Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction

    Authors: Peyman Hosseini, Mehran Hosseini, Sana Sabah Al-Azzawi, Marcus Liwicki, Ignacio Castro, Matthew Purver

    Abstract: We study the influence of different activation functions in the output layer of deep neural network models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output…

    Submitted 3 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: Accepted at the ACL 2023 SemEval Workshop as a selected task paper

    ACM Class: I.2.7

  8. arXiv:2211.06053  [pdf, other]

    cs.CL

    CoRAL: a Context-aware Croatian Abusive Language Dataset

    Authors: Ravi Shekhar, Mladen Karan, Matthew Purver

    Abstract: In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjec…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Findings of the ACL: AACL-IJCNLP, 2022

  9. arXiv:2206.09680  [pdf, other]

    cs.CL

    Misspelling Semantics In Thai

    Authors: Pakawat Nakwijit, Matthew Purver

    Abstract: User-generated content is full of misspellings. Rather than being just random noise, we hypothesise that many misspellings contain hidden semantics that can be leveraged for language understanding tasks. This paper presents a fine-grained annotated corpus of misspelling in Thai, together with an analysis of misspelling intention and its possible semantics to get a better understanding of the missp…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: To be published in LREC 2022

  10. arXiv:2205.02054  [pdf, other]

    cs.CL

    Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment

    Authors: Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver

    Abstract: In text-to-SQL tasks -- as in much of NLP -- compositional generalization is a major challenge: neural networks struggle with compositional generalization where training and test distributions differ. However, most recent attempts to improve this are based on word-level synthetic data or specific dataset splits to generate compositional biases. In this work, we propose a clause-level compositional…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: To appear in Findings of NAACL 2022

  11. arXiv:2109.10033  [pdf, other]

    cs.CL

    Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model

    Authors: Elaine Zosa, Ravi Shekhar, Mladen Karan, Matthew Purver

    Abstract: Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that violate the moderation rules mostly share common linguistic and thematic features, their content varies across the different sections of the newspaper. W…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Accepted to RANLP 2021

  12. arXiv:2109.05157  [pdf, other]

    cs.CL

    Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization

    Authors: Yujian Gan, Xinyun Chen, Matthew Purver

    Abstract: Recently, there has been significant progress in studying neural networks for translating text descriptions into SQL queries under the zero-shot cross-domain setting. Despite achieving good performance on some public benchmarks, we observe that existing text-to-SQL models do not generalize when facing domain knowledge that does not frequently appear in the training data, which may render the worse…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: To appear in EMNLP 2021

  13. arXiv:2109.05153  [pdf, other]

    cs.CL

    Natural SQL: Making SQL Easier to Infer from Natural Language Specifications

    Authors: Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R. Woodward, John Drake, Qiaofu Zhang

    Abstract: Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL, while it simplifies the queries as follows: (1) dispensing with operators and keywords such…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: To appear in EMNLP Findings 2021

  14. arXiv:2109.01537  [pdf, other]

    cs.CL cs.AI cs.DB cs.MM

    A Longitudinal Multi-modal Dataset for Dementia Monitoring and Diagnosis

    Authors: Dimitris Gkoumas, Bo Wang, Adam Tsakalidis, Maria Wolters, Arkaitz Zubiaga, Matthew Purver, Maria Liakata

    Abstract: Dementia affects cognitive functions of adults, including memory, language, and behaviour. Standard diagnostic biomarkers such as MRI are costly, whilst neuropsychological tests suffer from sensitivity issues in detecting dementia onset. The analysis of speech and language has emerged as a promising and non-intrusive technology to diagnose and monitor dementia. Currently, most work in this directi…

    Submitted 23 December, 2023; v1 submitted 3 September, 2021; originally announced September 2021.

  15. arXiv:2107.10614  [pdf, ps, other]

    cs.CL

    Evaluation of contextual embeddings on less-resourced languages

    Authors: Matej Ulčar, Aleš Žagar, Carlos S. Armendariz, Andraž Repar, Senja Pollak, Matthew Purver, Marko Robnik-Šikonja

    Abstract: The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives. Most existing work focuses on English; in contrast, we present here the first multilingual empirical comparison of two ELMo and several monolingual and multilingual BERT models using 14 tasks in nine languages. In monolingual settings, our analysi…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: 45 pages

  16. arXiv:2106.15684  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

    Authors: Morteza Rohanian, Julian Hough, Matthew Purver

    Abstract: We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating the ADReSSo challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: INTERSPEECH 2021. arXiv admin note: substantial text overlap with arXiv:2106.09668

  17. Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech

    Authors: Morteza Rohanian, Julian Hough, Matthew Purver

    Abstract: This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge, which aims to develop methods that can assist in the automated prediction of severity of Alzheimer's Disease from speech data. We focus on acoustic and natural language features for cognitive impairment detection in spontaneous speech in the context of Alzheimer's Disease Diagnosis and…

    Submitted 17 June, 2021; originally announced June 2021.

    Journal ref: Proc. Interspeech 2020, 2187-2191

  18. arXiv:2106.01065  [pdf, other]

    cs.CL

    Towards Robustness of Text-to-SQL Models against Synonym Substitution

    Authors: Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, Pengsheng Huang

    Abstract: Recently, there has been significant progress in studying neural networks to translate text descriptions into SQL queries. Despite achieving good performance on some public benchmarks, existing text-to-SQL models typically rely on the lexical matching between words in natural language (NL) questions and tokens in table schemas, which may render the models vulnerable to attacks that break the schem…

    Submitted 19 June, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: To appear in ACL 2021

  19. arXiv:2008.13121  [pdf, other]

    cs.CL cs.IR cs.SI

    Temporal Mental Health Dynamics on Social Media

    Authors: Tom Tabak, Matthew Purver

    Abstract: We describe a set of experiments for building a temporal mental health dynamics system. We utilise a pre-existing methodology for distant-supervision of mental health data mining from social media platforms and deploy the system during the global COVID-19 pandemic as a case study. Despite the challenging nature of the task, we produce encouraging results, both explicit to the global pandemic and i…

    Submitted 2 September, 2020; v1 submitted 30 August, 2020; originally announced August 2020.

    ACM Class: I.2.7

  20. arXiv:2004.00881  [pdf, other]

    cs.CL

    How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context

    Authors: Jey Han Lau, Carlos S. Armendariz, Shalom Lappin, Matthew Purver, Chang Shu

    Abstract: We study the influence of context on sentence acceptability. First we compare the acceptability ratings of sentences judged in isolation, with a relevant context, and with an irrelevant context. Our results show that context induces a cognitive load for humans, which compresses the distribution of ratings. Moreover, in relevant contexts we observe a discourse coherence effect which uniformly raise…

    Submitted 2 April, 2020; originally announced April 2020.

    Comments: 14 pages. Author's final version, accepted for publication in Transactions of the Association for Computational Linguistics

    ACM Class: I.2.7

  21. arXiv:1912.05320  [pdf, other]

    cs.CL

    CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

    Authors: Carlos Santos Armendariz, Matthew Purver, Matej Ulčar, Senja Pollak, Nikola Ljubešić, Marko Robnik-Šikonja, Mark Granroth-Wilding, Kristiina Vaik

    Abstract: State of the art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists. Standard tasks and datasets for intrinsic evaluation of embeddings are based on judgements of similarity, but ignore context; standard tasks for word sense disambiguation take account of context but do not provide continuous…

    Submitted 29 October, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    ACM Class: I.2.7

    Journal ref: Proceedings of the 12th Language Resources and Evaluation Conference (2020) 5878-5886

  22. arXiv:1811.00614  [pdf, ps, other]

    cs.CL cs.AI

    Exploring Semantic Incrementality with Dynamic Syntax and Vector Space Semantics

    Authors: Mehrnoosh Sadrzadeh, Matthew Purver, Julian Hough, Ruth Kempson

    Abstract: One of the fundamental requirements for models of semantic processing in dialogue is incrementality: a model must reflect how people interpret and generate language at least on a word-by-word basis, and handle phenomena such as fragments, incomplete and jointly-produced utterances. We show that the incremental word-by-word parsing process of Dynamic Syntax (DS) can be assigned a compositional dist…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: Accepted at SemDial 2018: https://semdial.hypotheses.org/program/accepted-papers

    MSC Class: 03B65

    ACM Class: I.2.7

  23. Words, Concepts, and the Geometry of Analogy

    Authors: Stephen McGregor, Matthew Purver, Geraint Wiggins

    Abstract: This paper presents a geometric approach to the problem of modelling the relationship between words and concepts, focusing in particular on analogical phenomena in language and cognition. Grounded in recent theories regarding geometric conceptual spaces, we begin with an analysis of existing static distributional semantic models and move on to an exploration of a dynamic approach to using high di…

    Submitted 3 August, 2016; originally announced August 2016.

    Comments: In Proceedings SLPCS 2016, arXiv:1608.01018

    Journal ref: EPTCS 221, 2016, pp. 39-48

  24. arXiv:1408.6788  [pdf, ps, other]

    cs.CL

    Strongly Incremental Repair Detection

    Authors: Julian Hough, Matthew Purver

    Abstract: We present STIR (STrongly Incremental Repair detection), a system that detects speech repairs and edit terms on transcripts incrementally with minimal latency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the different stages of repairs. Results on the Switchboard disfluency tagged corpus show utterance-final…

    Submitted 29 August, 2014; v1 submitted 28 August, 2014; originally announced August 2014.

    Comments: 12 pages, 6 figures, EMNLP conference long paper 2014

  25. arXiv:1408.6179  [pdf, ps, other]

    cs.CL

    Evaluating Neural Word Representations in Tensor-Based Compositional Settings

    Authors: Dmitrijs Milajevs, Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Matthew Purver

    Abstract: We provide a comparative study between neural word representations and traditional vector spaces based on co-occurrence counts, in a number of compositional tasks. We use three different semantic spaces and implement seven tensor-based compositional models, which we then test (together with simpler additive and multiplicative approaches) in tasks involving verb disambiguation and sentence similari…

    Submitted 26 August, 2014; originally announced August 2014.

    Comments: To be published in EMNLP 2014

  26. arXiv:1312.6635  [pdf, other]

    cs.SI physics.soc-ph

    Topic and Sentiment Analysis on OSNs: a Case Study of Advertising Strategies on Twitter

    Authors: Shana Dacres, Hamed Haddadi, Matthew Purver

    Abstract: Social media have substantially altered the way brands and businesses advertise: Online Social Networks provide brands with more versatile and dynamic channels for advertisement than traditional media (e.g., TV and radio). Levels of engagement in such media are usually measured in terms of content adoption (e.g., likes and retweets) and sentiment, around a given topic. However, sentiment analysis…

    Submitted 23 December, 2013; originally announced December 2013.

    ACM Class: H.3.1; I.2.7