Search | arXiv e-print repository

arXiv:2408.01866 [pdf, other]

Efficient Solutions For An Intriguing Failure of LLMs: Long Context Window Does Not Mean LLMs Can Analyze Long Sequences Flawlessly

Authors: Peyman Hosseini, Ignacio Castro, Iacopo Ghinassi, Matthew Purver

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in comprehending and analyzing lengthy sequential inputs, owing to their extensive context windows that allow processing millions of tokens in a single forward pass. However, this paper uncovers a surprising limitation: LLMs fall short when handling long input sequences. We investigate this issue using three datasets and two tasks (sentiment analysis and news categorization) across various LLMs, including Claude 3, Gemini Pro, GPT 3.5 Turbo, Llama 3 Instruct, and Mistral Instruct models. To address this limitation, we propose and evaluate ad-hoc solutions that substantially enhance LLMs' performance on long input sequences by up to 50%, while reducing API cost and latency by up to 93% and 50%, respectively.

Submitted 3 August, 2024; originally announced August 2024.

Comments: 11 pages, 5 figures, 6 tables

ACM Class: I.2.7

arXiv:2407.16804 [pdf]

Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges

Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver

Abstract: The application of machine learning (ML) in detecting, diagnosing, and treating mental health disorders is garnering increasing attention. Traditionally, research has focused on single modalities, such as text from clinical notes, audio from speech samples, or video of interaction patterns. Recently, multimodal ML, which combines information from multiple modalities, has demonstrated significant promise in offering novel insights into human behavior patterns and recognizing mental health symptoms and risk factors. Despite its potential, multimodal ML in mental health remains an emerging field, facing several complex challenges before practical applications can be effectively developed. This survey provides a comprehensive overview of the data availability and current state-of-the-art multimodal ML applications for mental health. It discusses key challenges that must be addressed to advance the field. The insights from this survey aim to deepen the understanding of the potential and limitations of multimodal ML in mental health, guiding future research and development in this evolving domain.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2406.09070 [pdf, other]

EquiPrompt: Debiasing Diffusion Models via Iterative Bootstrapping in Chain of Thoughts

Authors: Zahraa Al Sahili, Ioannis Patras, Matthew Purver

Abstract: In the domain of text-to-image generative models, the inadvertent propagation of biases inherent in training datasets poses significant ethical challenges, particularly in the generation of socially sensitive content. This paper introduces EquiPrompt, a novel method employing Chain of Thought (CoT) reasoning to reduce biases in text-to-image generative models. EquiPrompt uses iterative bootstrapping and bias-aware exemplar selection to balance creativity and ethical responsibility. It integrates iterative reasoning refinement with controlled evaluation techniques, addressing zero-shot CoT issues in sensitive contexts. Experiments on several generation tasks show EquiPrompt effectively lowers bias while maintaining generative quality, advancing ethical AI and socially responsible creative processes. Code will be publicly available.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2404.07036 [pdf, other]

A Computational Analysis of the Dehumanisation of Migrants from Syria and Ukraine in Slovene News Media

Authors: Jaya Caporusso, Damar Hoogland, Mojca Brglez, Boshko Koloski, Matthew Purver, Senja Pollak

Abstract: Dehumanisation involves the perception and/or treatment of a social group's members as less than human. This phenomenon is rarely addressed with computational linguistic techniques. We adapt a recently proposed approach for English, making it easier to transfer to other languages and to evaluate, introducing a new sentiment resource, the use of zero-shot cross-lingual valence and arousal detection, and a new method for statistical significance testing. We then apply it to study attitudes to migration expressed in Slovene newspapers, to examine changes in the Slovene discourse on migration between the 2015-16 migration crisis following the war in Syria and the 2022-23 period following the war in Ukraine. We find that while this discourse became more negative and more intense over time, it is less dehumanising when specifically addressing Ukrainian migrants compared to others.

Submitted 10 April, 2024; originally announced April 2024.

Comments: The first authors have contributed equally. Accepted at LREC-COLING

arXiv:2311.13442 [pdf, other]

Temporal Network Analysis of Email Communication Patterns in a Long Standing Hierarchy

Authors: Matthew Russell Barnes, Mladen Karan, Stephen McQuistin, Colin Perkins, Gareth Tyson, Matthew Purver, Ignacio Castro, Richard G. Clegg

Abstract: An important concept in organisational behaviour is how hierarchy affects the voice of individuals, whereby members of a given organisation exhibit differing power relations based on their hierarchical position. Although there have been prior studies of the relationship between hierarchy and voice, they tend to focus on more qualitative small-scale methods and do not account for structural aspects of the organisation. This paper develops large-scale computational techniques utilising temporal network analysis to measure the effect that organisational hierarchy has on communication patterns within an organisation, focusing on the structure of pairwise interactions between individuals. We focus on one major organisation as a case study - the Internet Engineering Task Force (IETF) - a major technical standards development organisation for the Internet. A particularly useful feature of the IETF is a transparent hierarchy, where participants take on explicit roles (e.g. Area Directors, Working Group Chairs). Its processes are also open, so we have visibility into the communication of people at different hierarchy levels over a long time period. We utilise a temporal network dataset of 989,911 email interactions among 23,741 participants to study how hierarchy impacts communication patterns. We show that the middle levels of the IETF are growing in terms of their dominance in communications. Higher levels consistently experience a higher proportion of incoming communication than lower levels, with higher levels initiating more communications too. We find that communication tends to flow "up" the hierarchy more than "down". Finally, we find that communication with higher-levels is associated with future communication more than for lower-levels, which we interpret as "facilitation". We conclude by discussing the implications this has on patterns within the wider IETF and for other organisations.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2310.09897 [pdf, other]

Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia

Authors: Dimitris Gkoumas, Matthew Purver, Maria Liakata

Abstract: Dementia is associated with language disorders which impede communication. Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. Our experiments show that NLP tasks that encapsulate contextual information and enhance the gradient signal with linguistic patterns benefit performance. We then use the probability estimates from the best model to construct digital linguistic markers measuring the overall quality in communication and the intensity of a variety of language disorders. We investigate how the digital markers characterize dementia speech from a longitudinal perspective. We find that our proposed communication marker is able to robustly and reliably characterize the language of people with dementia, outperforming existing linguistic approaches; and shows external validity via significant correlation with clinical markers of behaviour. Finally, our proposed linguistic disorder markers provide useful insights into gradual language impairment associated with disease progression.

Submitted 15 October, 2023; originally announced October 2023.

Comments: Accepted to appear at EMNLP 2023

arXiv:2303.02468 [pdf, other]

doi 10.18653/v1/2023.semeval-1.185

Lon-ea at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction

Authors: Peyman Hosseini, Mehran Hosseini, Sana Sabah Al-Azzawi, Marcus Liwicki, Ignacio Castro, Matthew Purver

Abstract: We study the influence of different activation functions in the output layer of deep neural network models for soft and hard label prediction in the learning with disagreement task. In this task, the goal is to quantify the amount of disagreement via predicting soft labels. To predict the soft labels, we use BERT-based preprocessors and encoders and vary the activation function used in the output layer, while keeping other parameters constant. The soft labels are then used for the hard label prediction. The activation functions considered are sigmoid as well as a step-function that is added to the model post-training and a sinusoidal activation function, which is introduced for the first time in this paper.

Submitted 3 January, 2024; v1 submitted 4 March, 2023; originally announced March 2023.

Comments: Accepted in ACL 2023 SemEval Workshop as selected task paper

ACM Class: I.2.7

arXiv:2211.06053 [pdf, other]

CoRAL: a Context-aware Croatian Abusive Language Dataset

Authors: Ravi Shekhar, Mladen Karan, Matthew Purver

Abstract: In light of unprecedented increases in the popularity of the internet and social media, comment moderation has never been a more relevant task. Semi-automated comment moderation systems greatly aid human moderators by either automatically classifying the examples or allowing the moderators to prioritize which comments to consider first. However, the concept of inappropriate content is often subjective, and such content can be conveyed in many subtle and indirect ways. In this work, we propose CoRAL -- a language and culturally aware Croatian Abusive dataset covering phenomena of implicitness and reliance on local and global context. We show experimentally that current models degrade when comments are not explicit and further degrade when language skill and context knowledge are required to interpret the comment.

Submitted 11 November, 2022; originally announced November 2022.

Comments: Findings of the ACL: AACL-IJCNLP, 2022

arXiv:2206.09680 [pdf, other]

Misspelling Semantics In Thai

Authors: Pakawat Nakwijit, Matthew Purver

Abstract: User-generated content is full of misspellings. Rather than being just random noise, we hypothesise that many misspellings contain hidden semantics that can be leveraged for language understanding tasks. This paper presents a fine-grained annotated corpus of misspelling in Thai, together with an analysis of misspelling intention and its possible semantics to get a better understanding of the misspelling patterns observed in the corpus. In addition, we introduce two approaches to incorporate the semantics of misspelling: Misspelling Average Embedding (MAE) and Misspelling Semantic Tokens (MST). Experiments on a sentiment analysis task confirm our overall hypothesis: additional semantics from misspelling can boost the micro F1 score by 0.4-2%, while blindly normalising misspelling is harmful and suboptimal.

Submitted 20 June, 2022; originally announced June 2022.

Comments: To be published in LREC2022

arXiv:2205.02054 [pdf, other]

Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment

Authors: Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver

Abstract: In text-to-SQL tasks -- as in much of NLP -- compositional generalization is a major challenge: neural networks struggle with compositional generalization where training and test distributions differ. However, most recent attempts to improve this are based on word-level synthetic data or specific dataset splits to generate compositional biases. In this work, we propose a clause-level compositional example generation method. We first split the sentences in the Spider text-to-SQL dataset into sub-sentences, annotating each sub-sentence with its corresponding SQL clause, resulting in a new dataset Spider-SS. We then construct a further dataset, Spider-CG, by composing Spider-SS sub-sentences in different combinations, to test the ability of models to generalize compositionally. Experiments show that existing models suffer significant performance degradation when evaluated on Spider-CG, even though every sub-sentence is seen during training. To deal with this problem, we modify a number of state-of-the-art models to train on the segmented data of Spider-SS, and we show that this method improves the generalization performance.

Submitted 4 May, 2022; originally announced May 2022.

Comments: To appear in Findings of NAACL 2022

arXiv:2109.10033 [pdf, other]

Not All Comments are Equal: Insights into Comment Moderation from a Topic-Aware Model

Authors: Elaine Zosa, Ravi Shekhar, Mladen Karan, Matthew Purver

Abstract: Moderation of reader comments is a significant problem for online news platforms. Here, we experiment with models for automatic moderation, using a dataset of comments from a popular Croatian newspaper. Our analysis shows that while comments that violate the moderation rules mostly share common linguistic and thematic features, their content varies across the different sections of the newspaper. We therefore make our models topic-aware, incorporating semantic features from a topic model into the classification decision. Our results show that topic information improves the performance of the model, increases its confidence in correct outputs, and helps us understand the model's outputs.

Submitted 21 September, 2021; originally announced September 2021.

Comments: Accepted to RANLP 2021

arXiv:2109.05157 [pdf, other]

Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization

Authors: Yujian Gan, Xinyun Chen, Matthew Purver

Abstract: Recently, there has been significant progress in studying neural networks for translating text descriptions into SQL queries under the zero-shot cross-domain setting. Despite achieving good performance on some public benchmarks, we observe that existing text-to-SQL models do not generalize when facing domain knowledge that does not frequently appear in the training data, which may lead to worse prediction performance on unseen domains. In this work, we investigate the robustness of text-to-SQL models when the questions require rarely observed domain knowledge. In particular, we define five types of domain knowledge and introduce Spider-DK (DK is the abbreviation of domain knowledge), a human-curated dataset based on the Spider benchmark for text-to-SQL translation. NL questions in Spider-DK are selected from Spider, and we modify some samples by adding domain knowledge that reflects real-world question paraphrases. We demonstrate that the prediction accuracy dramatically drops on samples that require such domain knowledge, even if the domain knowledge appears in the training set, and the model provides the correct predictions for related training samples.

Submitted 10 September, 2021; originally announced September 2021.

Comments: To appear in EMNLP 2021

arXiv:2109.05153 [pdf, other]

Natural SQL: Making SQL Easier to Infer from Natural Language Specifications

Authors: Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R. Woodward, John Drake, Qiaofu Zhang

Abstract: Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL, while it simplifies the queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, JOIN ON, which are usually hard to find counterparts for in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making schema linking easier by reducing the required number of schema items. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs, and significantly improves the performance of several previous SOTA models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries, and achieves the new state-of-the-art execution accuracy.

Submitted 10 September, 2021; originally announced September 2021.

Comments: To appear in EMNLP Findings 2021

arXiv:2109.01537 [pdf, other]

A Longitudinal Multi-modal Dataset for Dementia Monitoring and Diagnosis

Authors: Dimitris Gkoumas, Bo Wang, Adam Tsakalidis, Maria Wolters, Arkaitz Zubiaga, Matthew Purver, Maria Liakata

Abstract: Dementia affects cognitive functions of adults, including memory, language, and behaviour. Standard diagnostic biomarkers such as MRI are costly, whilst neuropsychological tests suffer from sensitivity issues in detecting dementia onset. The analysis of speech and language has emerged as a promising and non-intrusive technology to diagnose and monitor dementia. Currently, most work in this direction ignores the multi-modal nature of human communication and interactive aspects of everyday conversational interaction. Moreover, most studies ignore changes in cognitive status over time due to the lack of consistent longitudinal data. Here we introduce a novel fine-grained longitudinal multi-modal corpus collected in a natural setting from healthy controls and people with dementia over two phases, each spanning 28 sessions. The corpus consists of spoken conversations, a subset of which are transcribed, as well as typed and written thoughts and associated extra-linguistic information such as pen strokes and keystrokes. We present the data collection process and describe the corpus in detail. Furthermore, we establish baselines for capturing longitudinal changes in language across different modalities for two cohorts, healthy controls and people with dementia, outlining future research directions enabled by the corpus.

Submitted 23 December, 2023; v1 submitted 3 September, 2021; originally announced September 2021.

arXiv:2107.10614 [pdf, ps, other]

Evaluation of contextual embeddings on less-resourced languages

Authors: Matej Ulčar, Aleš Žagar, Carlos S. Armendariz, Andraž Repar, Senja Pollak, Matthew Purver, Marko Robnik-Šikonja

Abstract: The current dominance of deep neural networks in natural language processing is based on contextual embeddings such as ELMo, BERT, and BERT derivatives. Most existing work focuses on English; in contrast, we present here the first multilingual empirical comparison of two ELMo and several monolingual and multilingual BERT models using 14 tasks in nine languages. In monolingual settings, our analysis shows that monolingual BERT models generally dominate, with a few exceptions such as the dependency parsing task, where they are not competitive with ELMo models trained on large corpora. In cross-lingual settings, BERT models trained on only a few languages mostly do best, closely followed by massively multilingual BERT models.

Submitted 22 July, 2021; originally announced July 2021.

Comments: 45 pages

arXiv:2106.15684 [pdf, ps, other]

Alzheimer's Dementia Recognition Using Acoustic, Lexical, Disfluency and Speech Pause Features Robust to Noisy Inputs

Authors: Morteza Rohanian, Julian Hough, Matthew Purver

Abstract: We present two multimodal fusion-based deep learning models that consume ASR transcribed speech and acoustic data simultaneously to classify whether a speaker in a structured diagnostic task has Alzheimer's Disease and to what degree, evaluating on the ADReSSo challenge 2021 data. Our best model, a BiLSTM with highway layers using words, word probabilities, disfluency features, pause information, and a variety of acoustic features, achieves an accuracy of 84% and an RMSE prediction error of 4.26 on MMSE cognitive scores. While predicting cognitive decline is more challenging, our models show improvement using the multimodal approach and word probabilities, disfluency and pause information over word-only models. We show considerable gains for AD classification using multimodal fusion and gating, which can effectively deal with noisy inputs from acoustic features and ASR hypotheses.

Submitted 29 June, 2021; originally announced June 2021.

Comments: INTERSPEECH 2021. arXiv admin note: substantial text overlap with arXiv:2106.09668

arXiv:2106.09668 [pdf, ps, other]

doi 10.21437/Interspeech.2020-2721

Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech

Authors: Morteza Rohanian, Julian Hough, Matthew Purver

Abstract: This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge, which aims to develop methods that can assist in the automated prediction of severity of Alzheimer's Disease from speech data. We focus on acoustic and natural language features for cognitive impairment detection in spontaneous speech in the context of Alzheimer's Disease Diagnosis and the mini-mental state examination (MMSE) score prediction. We proposed a model that obtains unimodal decisions from different LSTMs, one for each modality of text and audio, and then combines them using a gating mechanism for the final prediction. We focused on sequential modelling of text and audio and investigated whether the disfluencies present in individuals' speech relate to the extent of their cognitive impairment. Our results show that the proposed classification and regression schemes obtain very promising results on both development and test sets. This suggests Alzheimer's Disease can be detected successfully with sequence modeling of the speech data of medical sessions.

Submitted 17 June, 2021; originally announced June 2021.

Journal ref: Proc. Interspeech 2020, 2187-2191

arXiv:2106.01065 [pdf, other]

Towards Robustness of Text-to-SQL Models against Synonym Substitution

Authors: Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, Pengsheng Huang

Abstract: Recently, there has been significant progress in studying neural networks to translate text descriptions into SQL queries. Despite achieving good performance on some public benchmarks, existing text-to-SQL models typically rely on the lexical matching between words in natural language (NL) questions and tokens in table schemas, which may render the models vulnerable to attacks that break the schema linking mechanism. In this work, we investigate the robustness of text-to-SQL models to synonym substitution. In particular, we introduce Spider-Syn, a human-curated dataset based on the Spider benchmark for text-to-SQL translation. NL questions in Spider-Syn are modified from Spider, by replacing their schema-related words with manually selected synonyms that reflect real-world question paraphrases. We observe that the accuracy dramatically drops by eliminating such explicit correspondence between NL questions and table schemas, even if the synonyms are not adversarially selected to conduct worst-case adversarial attacks. Finally, we present two categories of approaches to improve the model robustness. The first category of approaches utilizes additional synonym annotations for table schemas by modifying the model input, while the second category is based on adversarial training. We demonstrate that both categories of approaches significantly outperform their counterparts without the defense, and the first category of approaches is more effective.

Submitted 19 June, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: To appear in ACL 2021

arXiv:2008.13121 [pdf, other]

Temporal Mental Health Dynamics on Social Media

Authors: Tom Tabak, Matthew Purver

Abstract: We describe a set of experiments for building a temporal mental health dynamics system. We utilise a pre-existing methodology for distant-supervision of mental health data mining from social media platforms and deploy the system during the global COVID-19 pandemic as a case study. Despite the challenging nature of the task, we produce encouraging results, both explicit to the global pandemic and implicit to a global phenomenon, Christmas Depression, supported by the literature. We propose a methodology for providing insight into temporal mental health dynamics to be utilised for strategic decision-making.

Submitted 2 September, 2020; v1 submitted 30 August, 2020; originally announced August 2020.

ACM Class: I.2.7

arXiv:2004.00881 [pdf, other]

How Furiously Can Colourless Green Ideas Sleep? Sentence Acceptability in Context

Authors: Jey Han Lau, Carlos S. Armendariz, Shalom Lappin, Matthew Purver, Chang Shu

Abstract: We study the influence of context on sentence acceptability. First we compare the acceptability ratings of sentences judged in isolation, with a relevant context, and with an irrelevant context. Our results show that context induces a cognitive load for humans, which compresses the distribution of ratings. Moreover, in relevant contexts we observe a discourse coherence effect which uniformly raises acceptability. Next, we test unidirectional and bidirectional language models in their ability to predict acceptability ratings. The bidirectional models show very promising results, with the best model achieving a new state-of-the-art for unsupervised acceptability prediction. The two sets of experiments provide insights into the cognitive aspects of sentence processing and central issues in the computational modelling of text and discourse.

Submitted 2 April, 2020; originally announced April 2020.

Comments: 14 pages. Author's final version, accepted for publication in Transactions of the Association for Computational Linguistics

ACM Class: I.2.7

arXiv:1912.05320 [pdf, other]

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

Authors: Carlos Santos Armendariz, Matthew Purver, Matej Ulčar, Senja Pollak, Nikola Ljubešić, Marko Robnik-Šikonja, Mark Granroth-Wilding, Kristiina Vaik

Abstract: State of the art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists. Standard tasks and datasets for intrinsic evaluation of embeddings are based on judgements of similarity, but ignore context; standard tasks for word sense disambiguation take account of context but do not provide continuous measures of meaning similarity. This paper describes an effort to build a new dataset, CoSimLex, intended to fill this gap. Building on the standard pairwise similarity task of SimLex-999, it provides context-dependent similarity measures; covers not only discrete differences in word sense but more subtle, graded changes in meaning; and covers not only a well-resourced language (English) but a number of less-resourced languages. We define the task and evaluation metrics, outline the dataset collection methodology, and describe the status of the dataset so far.

Submitted 29 October, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

ACM Class: I.2.7

Journal ref: Proceedings of the 12th Language Resources and Evaluation Conference (2020) 5878-5886

arXiv:1811.00614 [pdf, ps, other]

Exploring Semantic Incrementality with Dynamic Syntax and Vector Space Semantics

Authors: Mehrnoosh Sadrzadeh, Matthew Purver, Julian Hough, Ruth Kempson

Abstract: One of the fundamental requirements for models of semantic processing in dialogue is incrementality: a model must reflect how people interpret and generate language at least on a word-by-word basis, and handle phenomena such as fragments, incomplete and jointly-produced utterances. We show that the incremental word-by-word parsing process of Dynamic Syntax (DS) can be assigned a compositional distributional semantics, with the composition operator of DS corresponding to the general operation of tensor contraction from multilinear algebra. We provide abstract semantic decorations for the nodes of DS trees, in terms of vectors, tensors, and sums thereof; using the latter to model the underspecified elements crucial to assigning partial representations during incremental processing. As a working example, we give an instantiation of this theory using plausibility tensors of compositional distributional semantics, and show how our framework can incrementally assign a semantic plausibility measure as it parses phrases and sentences.

Submitted 1 November, 2018; originally announced November 2018.

Comments: accepted in SemDial 2018: https://semdial.hypotheses.org/program/accepted-papers

MSC Class: 03B65

ACM Class: I.2.7

arXiv:1608.01403 [pdf, ps, other]

DOI: 10.4204/EPTCS.221.5

Words, Concepts, and the Geometry of Analogy

Authors: Stephen McGregor, Matthew Purver, Geraint Wiggins

Abstract: This paper presents a geometric approach to the problem of modelling the relationship between words and concepts, focusing in particular on analogical phenomena in language and cognition. Grounded in recent theories regarding geometric conceptual spaces, we begin with an analysis of existing static distributional semantic models and move on to an exploration of a dynamic approach to using high dimensional spaces of word meaning to project subspaces where analogies can potentially be solved in an online, contextualised way. The crucial element of this analysis is the positioning of statistics in a geometric environment replete with opportunities for interpretation.

Submitted 3 August, 2016; originally announced August 2016.

Comments: In Proceedings SLPCS 2016, arXiv:1608.01018

Journal ref: EPTCS 221, 2016, pp. 39-48

arXiv:1408.6788 [pdf, ps, other]

Strongly Incremental Repair Detection

Authors: Julian Hough, Matthew Purver

Abstract: We present STIR (STrongly Incremental Repair detection), a system that detects speech repairs and edit terms on transcripts incrementally with minimal latency. STIR uses information-theoretic measures from n-gram models as its principal decision features in a pipeline of classifiers detecting the different stages of repairs. Results on the Switchboard disfluency tagged corpus show utterance-final accuracy on a par with state-of-the-art incremental repair detection methods, but with better incremental accuracy, faster time-to-detection and less computational overhead. We evaluate its performance using incremental metrics and propose new repair processing evaluation standards.

Submitted 29 August, 2014; v1 submitted 28 August, 2014; originally announced August 2014.

Comments: 12 pages, 6 figures, EMNLP conference long paper 2014

arXiv:1408.6179 [pdf, ps, other]

Evaluating Neural Word Representations in Tensor-Based Compositional Settings

Authors: Dmitrijs Milajevs, Dimitri Kartsaklis, Mehrnoosh Sadrzadeh, Matthew Purver

Abstract: We provide a comparative study between neural word representations and traditional vector spaces based on co-occurrence counts, in a number of compositional tasks. We use three different semantic spaces and implement seven tensor-based compositional models, which we then test (together with simpler additive and multiplicative approaches) in tasks involving verb disambiguation and sentence similarity. To check their scalability, we additionally evaluate the spaces using simple compositional methods on larger-scale tasks with less constrained language: paraphrase detection and dialogue act tagging. In the more constrained tasks, co-occurrence vectors are competitive, although choice of compositional method is important; on the larger-scale tasks, they are outperformed by neural word embeddings, which show robust, stable performance across the tasks.

Submitted 26 August, 2014; originally announced August 2014.

Comments: To be published in EMNLP 2014

arXiv:1312.6635 [pdf, other]

Topic and Sentiment Analysis on OSNs: a Case Study of Advertising Strategies on Twitter

Authors: Shana Dacres, Hamed Haddadi, Matthew Purver

Abstract: Social media have substantially altered the way brands and businesses advertise: Online Social Networks provide brands with more versatile and dynamic channels for advertisement than traditional media (e.g., TV and radio). Levels of engagement in such media are usually measured in terms of content adoption (e.g., likes and retweets) and sentiment, around a given topic. However, sentiment analysis and topic identification are both non-trivial tasks. In this paper, using data collected from Twitter as a case study, we analyze how engagement and sentiment in promoted content spread over a 10-day period. We find that promoted tweets lead to higher positive sentiment than promoted trends; although promoted trends pay off in response volume. We observe that levels of engagement for the brand and promoted content are highest on the first day of the campaign, and fall considerably thereafter. However, we show that these insights depend on the use of robust machine learning and natural language processing techniques to gather focused, relevant datasets, and to accurately gauge sentiment, rather than relying on the simple keyword- or frequency-based metrics sometimes used in social media research.

Submitted 23 December, 2013; originally announced December 2013.

ACM Class: H.3.1; I.2.7

Showing 1–26 of 26 results for author: Purver, M