Showing 1–15 of 15 results for author: Rayson, P

Searching in archive cs.
  1. arXiv:2407.13638  [pdf, other]

    cs.CL cs.AI

    A Comparative Study on Automatic Coding of Medical Letters with Explainability

    Authors: Jamie Glen, Lifeng Han, Paul Rayson, Goran Nenadic

    Abstract: This study aims to explore the implementation of Natural Language Processing (NLP) and machine learning (ML) techniques to automate the coding of medical letters with visualised explainability and light-weighted local computer settings. Currently in clinical settings, coding is a manual process that involves assigning codes to each condition, procedure, and medication in a patient's paperwork (e.g…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: working paper

  2. arXiv:2405.00997  [pdf, other]

    cs.CL

    The IgboAPI Dataset: Empowering Igbo Language Technologies through Multi-dialectal Enrichment

    Authors: Chris Chinenye Emezue, Ifeoma Okoh, Chinedu Mbonu, Chiamaka Chukwuneke, Daisy Lal, Ignatius Ezeani, Paul Rayson, Ijemma Onwuzulike, Chukwuma Okeke, Gerald Nweya, Bright Ogbonna, Chukwuebuka Oraegbunam, Esther Chidinma Awo-Ndubuisi, Akudo Amarachukwu Osuagwu, Obioha Nmezi

    Abstract: The Igbo language is facing a risk of becoming endangered, as indicated by a 2025 UNESCO study. This highlights the need to develop language technologies for Igbo to foster communication, learning and preservation. To create robust, impactful, and widely adopted language technologies for Igbo, it is essential to incorporate the multi-dialectal nature of the language. The primary obstacle in achiev…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to the LREC-COLING 2024 conference

  3. arXiv:2104.11612  [pdf, other]

    cs.CL cs.SI

    Understanding who uses Reddit: Profiling individuals with a self-reported bipolar disorder diagnosis

    Authors: Glorianna Jagfeld, Fiona Lobban, Paul Rayson, Steven H. Jones

    Abstract: Recently, research on mental health conditions using public online data, including Reddit, has surged in NLP and health research but has not reported user characteristics, which are important to judge generalisability of findings. This paper shows how existing NLP methods can yield information on clinical, demographic, and identity characteristics of almost 20K Reddit users who self-report a bipol…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: The Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access @NAACL 2021; Visual abstract on p. 14

  4. arXiv:2103.11811  [pdf]

    cs.CL cs.AI

    MasakhaNER: Named Entity Recognition for African Languages

    Authors: David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, et al. (36 additional authors not shown)

    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We…

    Submitted 5 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL 2021, pre-MIT Press publication version

  5. arXiv:2102.03324  [pdf, other]

    cs.LG stat.ML

    GIBBON: General-purpose Information-Based Bayesian OptimisatioN

    Authors: Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson

    Abstract: This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO). A novel approximation is proposed for the information gain -- an information-theoretic quantity central to solving a range of BO problems, including noisy, multi-fidelity and batch optimisations across both continuous and highly-structured discrete spaces. Previously, th…

    Submitted 26 October, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

    Journal ref: Journal of Machine Learning Research 2021

  6. arXiv:2010.05542  [pdf]

    cs.CL

    The National Corpus of Contemporary Welsh: Project Report | Y Corpws Cenedlaethol Cymraeg Cyfoes: Adroddiad y Prosiect

    Authors: Dawn Knight, Steve Morris, Tess Fitzpatrick, Paul Rayson, Irena Spasić, Enlli Môn Thomas

    Abstract: This report provides an overview of the CorCenCC project and the online corpus resource that was developed as a result of work on the project. The report lays out the theoretical underpinnings of the research, demonstrating how the project has built on and extended this theory. We also raise and discuss some of the key operational questions that arose during the course of the project, outlining th…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: English-Welsh bilingual project report

  7. arXiv:2010.00979  [pdf, other]

    cs.LG cs.AI stat.ML

    BOSS: Bayesian Optimization over String Spaces

    Authors: Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson

    Abstract: This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds…

    Submitted 2 October, 2020; originally announced October 2020.

  8. arXiv:2007.00939  [pdf, other]

    cs.LG stat.ML

    BOSH: Bayesian Optimization by Sampling Hierarchically

    Authors: Henry B. Moss, David S. Leslie, Paul Rayson

    Abstract: Deployments of Bayesian Optimization (BO) for functions with stochastic evaluations, such as parameter tuning via cross validation and simulation optimization, typically optimize an average of a fixed set of noisy realizations of the objective function. However, disregarding the true objective function in this manner finds a high-precision optimum of the wrong function. To solve this problem, we p…

    Submitted 2 July, 2020; originally announced July 2020.

  9. arXiv:2006.12093  [pdf, other]

    cs.LG stat.ML

    MUMBO: MUlti-task Max-value Bayesian Optimization

    Authors: Henry B. Moss, David S. Leslie, Paul Rayson

    Abstract: We propose MUMBO, the first high-performing yet computationally efficient acquisition function for multi-task Bayesian optimization. Here, the challenge is to perform efficient optimization by evaluating low-cost functions somehow related to our true target function. This is a broad class of problems including the popular task of multi-fidelity optimization. However, while information-theoretic ac…

    Submitted 22 June, 2020; originally announced June 2020.

  10. arXiv:2004.00648  [pdf, ps, other]

    cs.CL cs.LG

    Igbo-English Machine Translation: An Evaluation Benchmark

    Authors: Ignatius Ezeani, Paul Rayson, Ikechukwu Onyenwe, Chinedu Uchechukwu, Mark Hepple

    Abstract: Although researchers and practitioners are pushing the boundaries and enhancing the capacities of NLP tools and methods, works on African languages are lagging. A lot of focus on well resourced languages such as English, Japanese, German, French, Russian, Mandarin Chinese etc. Over 97% of the world's 7000 languages, including African languages, are low resourced for NLP i.e. they have little or no…

    Submitted 1 April, 2020; originally announced April 2020.

    Comments: 4 pages

  11. arXiv:1906.12230  [pdf, other]

    cs.LG cs.CL stat.ML

    FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms

    Authors: Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson

    Abstract: We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. Despite being known to produce unreliable comparisons, it is still common practice to compare model evaluations based on single choices of random seeds. We show that reliable model selection also…

    Submitted 28 June, 2019; originally announced June 2019.

    Comments: ACL 2019. Code available at: https://github.com/apmoore1/fiesta

  12. arXiv:1903.12271  [pdf]

    cs.CL

    In Search of Meaning: Lessons, Resources and Next Steps for Computational Analysis of Financial Discourse

    Authors: Mahmoud El-Haj, Paul Rayson, Martin Walker, Steven Young, Vasiliki Simaki

    Abstract: We critically assess mainstream accounting and finance research applying methods from computational linguistics (CL) to study financial discourse. We also review common themes and innovations in the literature and assess the incremental contributions of work applying CL methods over manual content analysis. Key conclusions emerging from our analysis are: (a) accounting and finance research is behi…

    Submitted 28 March, 2019; originally announced March 2019.

    Comments: 70 pages, 18 pages of references, Journal Article

  13. arXiv:1806.07139  [pdf, other]

    cs.CL stat.ML

    Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

    Authors: Henry B. Moss, David S. Leslie, Paul Rayson

    Abstract: K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unst…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: COLING 2018. Code available at: https://github.com/henrymoss/COLING2018

  14. arXiv:1806.05219  [pdf, other]

    cs.CL

    Bringing replication and reproduction together with generalisability in NLP: Three reproduction studies for Target Dependent Sentiment Analysis

    Authors: Andrew Moore, Paul Rayson

    Abstract: Lack of repeatability and generalisability are two significant threats to continuing scientific development in Natural Language Processing. Language models and learning methods are so complex that scientific conference papers no longer contain enough space for the technical depth required for replication or reproduction. Taking Target Dependent Sentiment Analysis as a case study, we show how recen…

    Submitted 6 August, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: COLING 2018. Code available at: https://github.com/apmoore1/Bella

  15. Lancaster A at SemEval-2017 Task 5: Evaluation metrics matter: predicting sentiment from financial news headlines

    Authors: Andrew Moore, Paul Rayson

    Abstract: This paper describes our participation in Task 5 track 2 of SemEval 2017 to predict the sentiment of financial news headlines for a specific company on a continuous scale between -1 and 1. We tackled the problem using a number of approaches, utilising a Support Vector Regression (SVR) and a Bidirectional Long Short-Term Memory (BLSTM). We found an improvement of 4-6% using the LSTM model over the…

    Submitted 1 May, 2017; originally announced May 2017.

    Comments: 5 pages, to appear in the Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval 2017), August 2017, Vancouver, BC