Showing 1–21 of 21 results for author: Diethe, T

Searching in archive cs.
  1. arXiv:2404.05980  [pdf, other]

    cs.CV cs.AI cs.LG

    Tackling Structural Hallucination in Image Translation with Local Diffusion

    Authors: Seunghoi Kim, Chen Jin, Tom Diethe, Matteo Figini, Henry F. J. Tregidgo, Asher Mullokandov, Philip Teare, Daniel C. Alexander

    Abstract: Recent developments in diffusion models have advanced conditioned image generation, yet they struggle with reconstructing out-of-distribution (OOD) images, such as unseen tumors in medical images, causing "image hallucination" and risking misdiagnosis. We hypothesize such hallucinations result from local OOD regions in the conditional images. We verify that partitioning the OOD region and conducti…

    Submitted 17 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  2. arXiv:2401.14442  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Improving Antibody Humanness Prediction using Patent Data

    Authors: Talip Ucar, Aubin Ramon, Dino Oglic, Rebecca Croasdale-Wood, Tom Diethe, Pietro Sormanni

    Abstract: We investigate the potential of patent data for improving antibody humanness prediction using a multi-stage, multi-loss training process. Humanness serves as a proxy for the immunogenic response to antibody therapeutics, one of the major causes of attrition in drug discovery and a challenging obstacle for their use in clinical settings. We pose the initial learning stage as a weakly-supervised…

    Submitted 8 June, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: ICML 2024, 14 pages, 6 figures, Code: https://github.com/AstraZeneca/SelfPAD

  3. arXiv:2310.12274  [pdf, other]

    cs.CV cs.AI cs.CL cs.GR cs.LG

    An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning

    Authors: Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare

    Abstract: Textual Inversion, a prompt learning method, learns a singular text embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying multiple unknown object-level concepts within one scene remains a complex challenge. While recent methods have resorted to cropping or masking…

    Submitted 24 May, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: ICML 2024; project page: https://astrazeneca.github.io/mcpl.github.io

  4. arXiv:2309.11899  [pdf, other]

    cs.CV cs.AI

    Unlocking the Heart Using Adaptive Locked Agnostic Networks

    Authors: Sylwia Majchrowska, Anders Hildeman, Philip Teare, Tom Diethe

    Abstract: Supervised training of deep learning models for medical imaging applications requires a significant amount of labeled data. This poses a challenge, as the images must be annotated by medical professionals. To address this limitation, we introduce the Adaptive Locked Agnostic Network (ALAN), a concept involving self-supervised visual feature extraction using a large backbone model to…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: The article was accepted to ICCV 2023 workshop PerDream: PERception, Decision making and REAsoning through Multimodal foundational modeling

  5. arXiv:2103.05276  [pdf, other]

    stat.ML cs.LG

    Continual Density Ratio Estimation in an Online Setting

    Authors: Yu Chen, Song Liu, Tom Diethe, Peter Flach

    Abstract: In online applications with streaming data, awareness of how far the training or test set has shifted away from the original dataset can be crucial to the performance of the model. However, we may not have access to historical samples in the data stream. To cope with such situations, we propose a novel method, Continual Density Ratio Estimation (CDRE), for estimating density ratios between the ini…

    Submitted 9 March, 2021; originally announced March 2021.

  6. arXiv:2008.01505  [pdf, other]

    cs.LG stat.ML

    Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams

    Authors: Charlie Dickens, Eric Meissner, Pablo G. Moreno, Tom Diethe

    Abstract: Anomaly detection at scale is an extremely challenging problem of great practicality. When data is large and high-dimensional, it can be difficult to detect which observations do not fit the expected behaviour. Recent work has coalesced on variations of (random) $k$d-trees to summarise data for anomaly detection. However, these methods rely on ad-hoc score functions that are not easy to int…

    Submitted 4 August, 2020; originally announced August 2020.

  7. arXiv:2006.11234  [pdf, other]

    stat.ML cs.LG

    Semi-Discriminative Representation Loss for Online Continual Learning

    Authors: Yu Chen, Tom Diethe, Peter Flach

    Abstract: The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tas…

    Submitted 14 April, 2022; v1 submitted 19 June, 2020; originally announced June 2020.

  8. arXiv:2006.05188  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimal Continual Learning has Perfect Memory and is NP-hard

    Authors: Jeremias Knoblauch, Hisham Husain, Tom Diethe

    Abstract: Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms wo…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: Accepted for publication at ICML (International Conference on Machine Learning) 2020; 13 pages, 8 Figures

  9. arXiv:2003.11498  [pdf, other]

    cs.LG stat.ML

    Similarity of Neural Networks with Gradients

    Authors: Shuai Tang, Wesley J. Maddox, Charlie Dickens, Tom Diethe, Andreas Damianou

    Abstract: A suitable similarity index for comparing learnt neural networks plays an important role in understanding the behaviour of the highly-nonlinear functions, and can provide insights on further theoretical analysis and empirical studies. We define two key steps when comparing models: firstly, the representation abstracted from the learnt model, where we propose to leverage both feature vectors and gr…

    Submitted 25 March, 2020; originally announced March 2020.

  10. arXiv:1910.08917  [pdf, other]

    cs.LG cs.CL cs.CR stat.ML

    Leveraging Hierarchical Representations for Preserving Privacy and Utility in Text

    Authors: Oluwaseyi Feyisetan, Tom Diethe, Thomas Drake

    Abstract: Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dx-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted at ICDM 2019

  11. arXiv:1910.08902  [pdf, ps, other]

    cs.LG cs.CL cs.CR stat.ML

    Privacy- and Utility-Preserving Textual Analysis via Calibrated Multivariate Perturbations

    Authors: Oluwaseyi Feyisetan, Borja Balle, Thomas Drake, Tom Diethe

    Abstract: Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy preserving text perturbation using the notion of dx-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to v…

    Submitted 20 October, 2019; originally announced October 2019.

    Comments: Accepted at WSDM 2020

  12. arXiv:1908.02858  [pdf, other]

    cs.LG eess.SY stat.ML

    HyperStream: a Workflow Engine for Streaming Data

    Authors: Tom Diethe, Meelis Kull, Niall Twomey, Kacper Sokol, Hao Song, Miquel Perello-Nieto, Emma Tonkin, Peter Flach

    Abstract: This paper describes HyperStream, a large-scale, flexible and robust software package, written in the Python language, for processing streaming data with workflow creation capabilities. HyperStream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments.…

    Submitted 7 August, 2019; originally announced August 2019.

  13. arXiv:1905.10862  [pdf, other]

    stat.ML cs.LG

    Automatic Discovery of Privacy-Utility Pareto Fronts

    Authors: Brendan Avent, Javier Gonzalez, Tom Diethe, Andrei Paleyes, Borja Balle

    Abstract: Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while maintaining acceptable…

    Submitted 21 July, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Proceedings on Privacy Enhancing Technologies 2020

  14. arXiv:1905.06023  [pdf, other]

    stat.ML cs.AI cs.LG

    Distribution Calibration for Regression

    Authors: Hao Song, Tom Diethe, Meelis Kull, Peter Flach

    Abstract: We are concerned with obtaining well-calibrated output distributions from regression models. Such distributions allow us to quantify the uncertainty that the model has regarding the predicted target value. We introduce the novel concept of distribution calibration, and demonstrate its advantages over the existing definition of quantile calibration. We further propose a post-hoc approach to improvi…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: ICML 2019, 10 pages

  15. arXiv:1904.10644  [pdf, other]

    cs.LG cs.AI stat.ML

    Facilitating Bayesian Continual Learning by Natural Gradients and Stein Gradients

    Authors: Yu Chen, Tom Diethe, Neil Lawrence

    Abstract: Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: 1). post…

    Submitted 24 April, 2019; originally announced April 2019.

    Journal ref: Continual Learning Workshop of 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)

  16. arXiv:1903.11112  [pdf, other]

    cs.LG cs.CL stat.ML

    Privacy-preserving Active Learning on Sensitive Data for User Intent Classification

    Authors: Oluwaseyi Feyisetan, Thomas Drake, Borja Balle, Tom Diethe

    Abstract: Active learning holds promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling. This presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy preserving active learning with quantifiable guarantees. W…

    Submitted 26 March, 2019; originally announced March 2019.

    Comments: To appear at PAL: Privacy-Enhancing Artificial Intelligence and Language Technologies as part of the AAAI Spring Symposium Series (AAAI-SSS 2019)

  17. arXiv:1903.05202  [pdf, other]

    stat.ML cs.LG

    Continual Learning in Practice

    Authors: Tom Diethe, Tom Borchert, Eno Thereska, Borja Balle, Neil Lawrence

    Abstract: This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive M…

    Submitted 18 March, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: Presented at the NeurIPS 2018 workshop on Continual Learning https://sites.google.com/view/continual2018/home

  18. arXiv:1903.04016  [pdf, other]

    stat.ML cs.LG

    $β^3$-IRT: A New Item Response Model and its Applications

    Authors: Yu Chen, Telmo Silva Filho, Ricardo B. C. Prudêncio, Tom Diethe, Peter Flach

    Abstract: Item Response Theory (IRT) aims to assess latent abilities of respondents based on the correctness of their answers in aptitude test items with different difficulty levels. In this paper, we propose the $β^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curve (ICC). In experiments we applied the proposed model to data from an online ex…

    Submitted 3 June, 2019; v1 submitted 10 March, 2019; originally announced March 2019.

    Journal ref: AISTATS 2019

  19. arXiv:1702.01209  [pdf, other]

    stat.ML cs.HC

    Probabilistic Sensor Fusion for Ambient Assisted Living

    Authors: Tom Diethe, Niall Twomey, Meelis Kull, Peter Flach, Ian Craddock

    Abstract: There is a widely-accepted need to revise current forms of health-care provision, with particular interest in sensing systems in the home. Given a multiple-modality sensor platform with heterogeneous network connectivity, as is under development in the Sensor Platform for HEalthcare in Residential Environment (SPHERE) Interdisciplinary Research Collaboration (IRC), we face specific challenges rela…

    Submitted 3 February, 2017; originally announced February 2017.

    Comments: Journal article. 19 pages; 7 figures

  20. arXiv:1603.00797  [pdf, other]

    cs.CY cs.HC

    The SPHERE Challenge: Activity Recognition with Multimodal Sensor Data

    Authors: Niall Twomey, Tom Diethe, Meelis Kull, Hao Song, Massimo Camplani, Sion Hannuna, Xenofon Fafoutis, Ni Zhu, Pete Woznowski, Peter Flach, Ian Craddock

    Abstract: This paper outlines the Sensor Platform for HEalthcare in Residential Environment (SPHERE) project and details the SPHERE challenge that will take place in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML-PKDD) between March and July 2016. The SPHERE challenge is an activity recognition competition where predictions are made from vid…

    Submitted 17 March, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

    Comments: Paper describing dataset. 11 pages; 4 figures

  21. arXiv:1110.4416  [pdf, other]

    cs.LG

    Data-dependent kernels in nearly-linear time

    Authors: Guy Lever, Tom Diethe, John Shawe-Taylor

    Abstract: We propose a method to efficiently construct data-dependent kernels which can make use of large quantities of (unlabeled) data. Our construction makes an approximation in the standard construction of semi-supervised kernels in Sindhwani et al. 2005. In typical cases these kernels can be computed in nearly-linear time (in the amount of data), improving on the cubic time of the standard construction…

    Submitted 19 October, 2011; originally announced October 2011.