Showing 1–23 of 23 results for author: Seedat, N

  1. arXiv:2406.13733

    cs.LG cs.AI

    You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling

    Authors: Nabeel Seedat, Nicolas Huynh, Fergus Imrie, Mihaela van der Schaar

    Abstract: Pseudo-labeling is a popular semi-supervised learning technique to leverage unlabeled data when labeled samples are scarce. The generation and selection of pseudo-labels heavily rely on labeled data. Existing approaches implicitly assume that the labeled data is gold standard and 'perfect'. However, this can be violated in reality with issues such as mislabeling or ambiguity. We address this overl…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Published in the Journal of Data-centric Machine Learning Research (DMLR) *Seedat & Huynh contributed equally

  2. arXiv:2406.03258

    stat.ML cs.LG

    Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise

    Authors: Thomas Pouplin, Alan Jeffares, Nabeel Seedat, Mihaela van der Schaar

    Abstract: Constructing valid prediction intervals rather than point estimates is a well-established approach for uncertainty quantification in the regression setting. Models equipped with this capacity output an interval of values in which the ground truth target will fall with some prespecified probability. This is an essential requirement in many real-world applications where simple point predictions' ina…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted at International Conference on Machine Learning (ICML) 2024

  3. arXiv:2403.04551

    cs.LG

    Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI

    Authors: Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar

    Abstract: Characterizing samples that are difficult to learn from is crucial to developing highly performant ML models. This has led to numerous Hardness Characterization Methods (HCMs) that aim to identify "hard" samples. However, there is a lack of consensus regarding the definition and evaluation of "hardness". Unfortunately, current HCMs have only been evaluated on specific types of hardness and often o…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Published at International Conference on Learning Representations (ICLR) 2024

  4. arXiv:2402.17599

    cs.LG cs.AI stat.ML

    DAGnosis: Localized Identification of Data Inconsistencies using Structures

    Authors: Nicolas Huynh, Jeroen Berrevoets, Nabeel Seedat, Jonathan Crabbé, Zhaozhi Qian, Mihaela van der Schaar

    Abstract: Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models. While recent data-centric methods are able to identify such inconsistencies with respect to the training set, they suffer from two key limitations: (1) suboptimality in settings where features exhibit statistical independencies, due to their usage of compressive…

    Submitted 28 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: AISTATS 2024; added correspondence email

  5. arXiv:2402.03921

    cs.LG cs.AI

    Large Language Models to Enhance Bayesian Optimization

    Authors: Tennison Liu, Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar

    Abstract: Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance remains a delicate process.…

    Submitted 8 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted as Poster at ICLR 2024

  6. arXiv:2312.12112

    cs.LG cs.AI

    Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes

    Authors: Nabeel Seedat, Nicolas Huynh, Boris van Breugel, Mihaela van der Schaar

    Abstract: Machine Learning (ML) in low-data settings remains an underappreciated yet crucial problem. Hence, data augmentation methods to increase the sample size of datasets needed for ML are key to unlocking the transformative potential of ML in data-deprived regions and domains. Unfortunately, the limited training set constrains traditional tabular synthetic data generators in their ability to generate a…

    Submitted 30 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Presented at the 41st International Conference on Machine Learning (ICML) 2024. *Seedat & Huynh contributed equally

  7. arXiv:2311.14110

    cs.LG cs.AI

    When is Off-Policy Evaluation Useful? A Data-Centric Perspective

    Authors: Hao Sun, Alex J. Chan, Nabeel Seedat, Alihan Hüyük, Mihaela van der Schaar

    Abstract: Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it brings opportunities for safe policy improvement under high-stakes scenarios like clinical guidelines. On the other hand, such opportunities raise a need for precise off-policy evaluation (OPE). While previous work on OPE focused on improving the algorithm in value esti…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: Off-Policy Evaluation, Data-Centric AI, Data-Centric Reinforcement Learning, Reinforcement Learning

  8. arXiv:2310.18970

    cs.LG

    TRIAGE: Characterizing and auditing training data for improved regression

    Authors: Nabeel Seedat, Jonathan Crabbé, Zhaozhi Qian, Mihaela van der Schaar

    Abstract: Data quality is crucial for robust machine learning algorithms, with the recent interest in data-centric AI emphasizing the importance of training data characterization. However, current data characterization methods are largely focused on classification settings, with regression settings largely understudied. To address this, we introduce TRIAGE, a novel data characterization framework tailored t…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Presented at NeurIPS 2023

  9. arXiv:2310.16981

    cs.LG

    Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark

    Authors: Lasse Hansen, Nabeel Seedat, Mihaela van der Schaar, Andrija Petrovic

    Abstract: Synthetic data serves as an alternative in training machine learning models, particularly when real-world data is limited or inaccessible. However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task. This paper addresses this issue by exploring the potential of integrating data-centric AI techniques which profile the data to guide the synthetic data g…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Presented at NeurIPS 2023 (Datasets & Benchmarks). *Hansen & Seedat contributed equally

  10. arXiv:2310.16524

    cs.LG

    Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data

    Authors: Boris van Breugel, Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar

    Abstract: Evaluating the performance of machine learning models on diverse and underrepresented subgroups is essential for ensuring fairness and reliability in real-world applications. However, accurately assessing model performance becomes challenging due to two main issues: (1) a scarcity of test data, especially for small subgroups, and (2) possible distributional shifts in the model's deployment setting…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Van Breugel & Seedat contributed equally

  11. arXiv:2306.04663

    eess.SP cs.LG

    U-PASS: an Uncertainty-guided deep learning Pipeline for Automated Sleep Staging

    Authors: Elisabeth R. M. Heremans, Nabeel Seedat, Bertien Buyse, Dries Testelmans, Mihaela van der Schaar, Maarten De Vos

    Abstract: As machine learning becomes increasingly prevalent in critical fields such as healthcare, ensuring the safety and reliability of machine learning systems becomes paramount. A key component of reliability is the ability to estimate uncertainty, which enables the identification of areas of high and low confidence and helps to minimize the risk of error. In this study, we propose a machine learning p…

    Submitted 7 June, 2023; originally announced June 2023.

  12. arXiv:2302.12238

    cs.LG stat.ML

    Improving Adaptive Conformal Prediction Using Self-Supervised Learning

    Authors: Nabeel Seedat, Alan Jeffares, Fergus Imrie, Mihaela van der Schaar

    Abstract: Conformal prediction is a powerful distribution-free tool for uncertainty quantification, establishing valid prediction intervals with finite-sample guarantees. To produce valid intervals which are also adaptive to the difficulty of each instance, a common approach is to compute normalized nonconformity scores on a separate calibration set. Self-supervised learning has been effectively utilized in…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted to the International Conference on Artificial Intelligence and Statistics (AISTATS 2023). *Seedat & Jeffares contributed equally

  13. arXiv:2211.05764

    cs.LG cs.AI cs.CY cs.SE stat.ML

    DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems

    Authors: Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar

    Abstract: While there have been a number of remarkable breakthroughs in machine learning (ML), much of the focus has been placed on model development. However, to truly realize the potential of machine learning in real-world settings, additional aspects must be considered across the ML pipeline. Data-centric AI is emerging as a unifying paradigm that could enable such reliable end-to-end pipelines. However,…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: Main paper: 11 pages, supplementary & case studies follow

    Journal ref: IEEE Transactions on Artificial Intelligence, 2023

  14. arXiv:2210.13043

    cs.LG cs.AI

    Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data

    Authors: Nabeel Seedat, Jonathan Crabbé, Ioana Bica, Mihaela van der Schaar

    Abstract: High model performance, on average, can hide that models may systematically underperform on subgroups of the data. We consider the tabular setting, which surfaces the unique issue of outcome heterogeneity - this is prevalent in areas such as healthcare, where patients with similar features can have different outcomes, thus making reliable predictions challenging. To tackle this, we propose Data-IQ…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Presented at NeurIPS 2022

  15. arXiv:2207.05161

    cs.LG cs.AI

    What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization

    Authors: Hao Sun, Boris van Breugel, Jonathan Crabbe, Nabeel Seedat, Mihaela van der Schaar

    Abstract: Uncertainty Quantification (UQ) is essential for creating trustworthy machine learning models. Recent years have seen a steep rise in UQ methods that can flag suspicious examples; however, it is often unclear what exactly these methods identify. In this work, we propose a framework for categorizing uncertain examples flagged by UQ methods in classification tasks. We introduce the confusion density…

    Submitted 27 October, 2023; v1 submitted 11 July, 2022; originally announced July 2022.

  16. arXiv:2206.08311

    cs.LG stat.ML

    Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations

    Authors: Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, Mihaela van der Schaar

    Abstract: Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer "what-if" questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice.…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: Presented at the International Conference on Machine Learning (ICML) 2022

  17. arXiv:2206.06354

    cs.LG stat.ML

    Differentiable and Transportable Structure Learning

    Authors: Jeroen Berrevoets, Nabeel Seedat, Fergus Imrie, Mihaela van der Schaar

    Abstract: Directed acyclic graphs (DAGs) encode a lot of information about a particular distribution in their structure. However, the compute required to infer these structures is typically super-exponential in the number of variables, as inference requires a sweep of a combinatorially large space of potential structures. That is, until recent advances made it possible to search this space using a differentiabl…

    Submitted 12 June, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Accepted at the International Conference on Machine Learning (ICML) 2023

  18. arXiv:2205.14761

    cs.LG cs.CL stat.ML

    Modeling Disagreement in Automatic Data Labelling for Semi-Supervised Learning in Clinical Natural Language Processing

    Authors: Hongshu Liu, Nabeel Seedat, Julia Ive

    Abstract: Computational models providing accurate estimates of their uncertainty are crucial for risk management associated with decision making in healthcare contexts. This is especially true since many state-of-the-art systems are trained using data that has been labelled automatically (self-supervised mode) and tend to overfit. In this work, we investigate the quality of uncertainty estimates from a…

    Submitted 7 June, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

    Comments: 7 pages, *Equal contribution

  19. arXiv:2202.08836

    cs.LG cs.AI

    Data-SUITE: Data-centric identification of in-distribution incongruous examples

    Authors: Nabeel Seedat, Jonathan Crabbé, Mihaela van der Schaar

    Abstract: Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric AI…

    Submitted 13 June, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: Presented at the International Conference on Machine Learning (ICML) 2022

  20. arXiv:2007.03995

    cs.LG cs.CV stat.ML

    MCU-Net: A framework towards uncertainty representations for decision support system patient referrals in healthcare contexts

    Authors: Nabeel Seedat

    Abstract: Incorporating a human-in-the-loop system when deploying automated decision support is critical in healthcare contexts to create trust, as well as provide reliable performance on a patient-to-patient basis. Deep learning methods, while having high performance, do not allow for this patient-centered approach due to the lack of uncertainty representation. Thus, we present a framework of uncertainty re…

    Submitted 25 August, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: 4 pages, 4 figures, Spotlight Talk at KDD 2020 - Applied Data Science for Healthcare Workshop & presented at ICML 2020: Uncertainty and Robustness in Deep Learning

  21. arXiv:2006.12121

    cs.CV cs.LG

    Automated machine vision enabled detection of movement disorders from hand drawn spirals

    Authors: Nabeel Seedat, Vered Aharonson, Ilana Schlesinger

    Abstract: A widely used test for the diagnosis of Parkinson's disease (PD) and Essential tremor (ET) is hand-drawn shapes, where the analysis is observationally performed by the examining neurologist. This method is subjective and is prone to bias amongst different physicians. Due to the similarities in the symptoms of the two diseases, they are often misdiagnosed. Studies which attempt to automate the proces…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted IEEE International Conference on Healthcare Informatics 2020 (ICHI 2020), Upcoming Dec 2020, Copyright IEEE - 978-1-5386-5541-2/18/$31.00 Copyright, 2020 IEEE

  22. arXiv:2006.12094

    cs.LG eess.SP stat.ML

    Machine learning discrimination of Parkinson's Disease stages from walker-mounted sensors data

    Authors: Nabeel Seedat, Vered Aharonson

    Abstract: Clinical methods that assess gait in Parkinson's Disease (PD) are mostly qualitative. Quantitative methods necessitate costly instrumentation or cumbersome wearable devices, which limits their usability. Only a few of these methods can discriminate different stages in PD progression. This study applies machine learning methods to discriminate six stages of PD. The data was acquired by low cost walke…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted to AAAI 2020 (New York) - International Workshop on Health Intelligence

  23. arXiv:1911.00104

    cs.LG cs.CV stat.ML

    Towards calibrated and scalable uncertainty representations for neural networks

    Authors: Nabeel Seedat, Christopher Kanan

    Abstract: For many applications it is critical to know the uncertainty of a neural network's predictions. While a variety of neural network parameter estimation methods have been proposed for uncertainty estimation, they have not been rigorously compared across uncertainty measures. We assess four of these parameter estimation methods to calibrate uncertainty estimation using four different uncertainty meas…

    Submitted 3 December, 2019; v1 submitted 27 October, 2019; originally announced November 2019.

    Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019): 4th workshop on Bayesian Deep Learning, Vancouver, Canada