
Showing 1–28 of 28 results for author: Izbicki, R

Searching in archive cs.
  1. arXiv:2408.15458 [pdf, other]

    cs.LG eess.IV stat.ML

    PersonalizedUS: Interpretable Breast Cancer Risk Assessment with Local Coverage Uncertainty Quantification

    Authors: Alek Fröhlich, Thiago Ramos, Gustavo Cabello, Isabela Buzatto, Rafael Izbicki, Daniel Tiezzi

    Abstract: Correctly assessing the malignancy of breast lesions identified during ultrasound examinations is crucial for effective clinical decision-making. However, the current "golden standard" relies on manual BI-RADS scoring by clinicians, often leading to unnecessary biopsies and a significant mental health burden on patients and their families. In this paper, we introduce PersonalizedUS, an interpretab…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures, 2 tables

  2. arXiv:2402.07357 [pdf, other]

    stat.ML cs.LG

    Regression Trees for Fast and Adaptive Prediction Intervals

    Authors: Luben M. C. Cabezas, Mateus P. Otto, Rafael Izbicki, Rafael B. Stern

    Abstract: Predictive models make mistakes. Hence, there is a need to quantify the uncertainty associated with their predictions. Conformal inference has emerged as a powerful tool to create statistically valid prediction regions around point predictions, but its naive application to regression problems yields non-adaptive regions. New conformal scores, often relying upon quantile regressors or conditional d…

    Submitted 13 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  3. arXiv:2402.05330 [pdf, other]

    stat.ML cs.LG

    Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

    Authors: Luca Masserano, Alex Shen, Michele Doro, Tommaso Dorigo, Rafael Izbicki, Ann B. Lee

    Abstract: An open scientific challenge is how to classify events with reliable measures of uncertainty, when we have a mechanistic model of the data-generating process but the distribution over both labels and latent nuisance parameters is different between train and target data. We refer to this type of distributional shift as generalized label shift (GLS). Direct classification using observed data…

    Submitted 1 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: 26 pages, 19 figures, code available at https://github.com/lee-group-cmu/lf2i

  4. arXiv:2401.04612 [pdf, other]

    cs.LG

    Distribution-Free Conformal Joint Prediction Regions for Neural Marked Temporal Point Processes

    Authors: Victor Dheur, Tanguy Bosser, Rafael Izbicki, Souhaib Ben Taieb

    Abstract: Sequences of labeled events observed at irregular intervals in continuous time are ubiquitous across various fields. Temporal Point Processes (TPPs) provide a mathematical framework for modeling these sequences, enabling inferences such as predicting the arrival time of future events and their associated label, called mark. However, due to model misspecification or lack of training data, these pro…

    Submitted 5 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  5. arXiv:2305.07430 [pdf, ps, other]

    stat.ML cs.LG

    Expertise-based Weighting for Regression Models with Noisy Labels

    Authors: Milene Regina dos Santos, Rafael Izbicki

    Abstract: Regression methods assume that accurate labels are available for training. However, in certain scenarios, obtaining accurate labels may not be feasible, and relying on multiple specialists with differing opinions becomes necessary. Existing approaches addressing noisy labels often impose restrictive assumptions on the regression function. In contrast, this paper presents a novel, more flexible app…

    Submitted 12 May, 2023; originally announced May 2023.

  6. arXiv:2304.10283 [pdf, other]

    cs.CL stat.ML

    Is augmentation effective to improve prediction in imbalanced text datasets?

    Authors: Gabriel O. Assunção, Rafael Izbicki, Marcos O. Prates

    Abstract: Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new samples for the minority class. However, in this paper, we challenge the common assumption that data augmentation is always necessary to improve predictions on i…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 21 pages, 5 figures

  7. arXiv:2301.09671 [pdf, other]

    stat.ME cs.LG stat.ML

    Flexible conditional density estimation for time series

    Authors: Gustavo Grivol, Rafael Izbicki, Alex A. Okuno, Rafael B. Stern

    Abstract: This paper introduces FlexCodeTS, a new conditional density estimator for time series. FlexCodeTS is a flexible nonparametric conditional density estimator, which can be based on an arbitrary regression method. It is shown that FlexCodeTS inherits the rate of convergence of the chosen regression method. Hence, FlexCodeTS can adapt its convergence by employing the regression method that best fits t…

    Submitted 23 January, 2023; originally announced January 2023.

    Comments: 19 pages, 7 figures

    MSC Class: 00-01; 99-00

  8. arXiv:2211.06410 [pdf, other]

    stat.ML cs.LG

    RFFNet: Large-Scale Interpretable Kernel Methods via Random Fourier Features

    Authors: Mateus P. Otto, Rafael Izbicki

    Abstract: Kernel methods provide a flexible and theoretically grounded approach to nonlinear and nonparametric learning. While memory and run-time requirements hinder their applicability to large datasets, many low-rank kernel approximations, such as random Fourier features, were recently developed to scale up such kernel methods. However, these scalable approaches are based on approximations of isotropic k…

    Submitted 12 April, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: New datasets, ablation studies, and discussion of the method's components. 45 pages, 11 figures

  9. arXiv:2209.05371 [pdf, other]

    stat.ML cs.LG

    Model interpretation using improved local regression with variable importance

    Authors: Gilson Y. Shimizu, Rafael Izbicki, Andre C. P. L. F. de Carvalho

    Abstract: A fundamental question on the use of ML models concerns the explanation of their predictions for increasing transparency in decision-making. Although several interpretability methods have emerged, some gaps regarding the reliability of their explanations have been identified. For instance, most methods are unstable (meaning that they give very different explanations with small changes in the data)…

    Submitted 12 September, 2022; originally announced September 2022.

  10. arXiv:2205.15680 [pdf, other]

    stat.ML cs.LG

    Simulator-Based Inference with Waldo: Confidence Regions by Leveraging Prediction Algorithms and Posterior Estimators for Inverse Problems

    Authors: Luca Masserano, Tommaso Dorigo, Rafael Izbicki, Mikael Kuusela, Ann B. Lee

    Abstract: Prediction algorithms, such as deep neural networks (DNNs), are used in many domain sciences to directly estimate internal parameters of interest in simulator-based models, especially in settings where the observations include images or complex high-dimensional data. In parallel, modern neural density estimators, such as normalizing flows, are becoming increasingly popular for uncertainty quantifi…

    Submitted 13 November, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 15 pages, 10 figures, code available at https://github.com/lee-group-cmu/lf2i

  11. arXiv:2205.14568 [pdf, other]

    stat.ML astro-ph.IM cs.LG stat.ME

    Conditionally Calibrated Predictive Distributions by Probability-Probability Map: Application to Galaxy Redshift Estimation and Probabilistic Forecasting

    Authors: Biprateep Dey, David Zhao, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee

    Abstract: Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. Much research has been devoted to describing the predictive distribution (PD) $F(y|\mathbf{x})$ of a target variable $y \in \mathbb{R}$ given complex input features $\mathbf{x} \in \mathcal{X}$. However, off-the-shelf PDs (from, e.g., normalizing flows and Bayesian neural networks) often lack conditional c…

    Submitted 17 July, 2023; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: 21 pages, 11 figures. Under review. Code available as a Python package https://github.com/lee-group-cmu/Cal-PIT

  12. arXiv:2205.08340 [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    A unified framework for dataset shift diagnostics

    Authors: Felipe Maia Polo, Rafael Izbicki, Evanildo Gomes Lacerda Jr, Juan Pablo Ibieta-Jimenez, Renato Vicente

    Abstract: Supervised learning techniques typically assume training data originates from the target population. Yet, in reality, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their predictors. In this work, we propose a novel and flexible framework called DetectShift that quantifies and tests for multiple dataset shifts, encompassing shifts in t…

    Submitted 12 September, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Journal ref: Information Sciences (2023): 119612

  13. arXiv:2202.11527 [pdf]

    cs.IR cs.LG stat.ME stat.ML

    A new LDA formulation with covariates

    Authors: Gilson Shimizu, Rafael Izbicki, Denis Valle

    Abstract: The Latent Dirichlet Allocation (LDA) model is a popular method for creating mixed-membership clusters. Despite having been originally developed for text analysis, LDA has been used for a wide range of other applications. We propose a new formulation for the LDA model which incorporates covariates. In this model, a negative binomial regression is embedded within LDA, enabling straightforward inte…

    Submitted 18 February, 2022; originally announced February 2022.

  14. arXiv:2112.01372 [pdf, other]

    stat.ME cs.LG

    Hierarchical clustering: visualization, feature importance and model selection

    Authors: Luben M. C. Cabezas, Rafael Izbicki, Rafael B. Stern

    Abstract: We propose methods for the analysis of hierarchical clustering that fully use the multi-resolution structure provided by a dendrogram. Specifically, we propose a loss for choosing between clustering methods, a feature importance score and a graphical tool for visualizing the segmentation of features in a dendrogram. Current approaches to these tasks lead to loss of information since they require t…

    Submitted 27 January, 2023; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: 29 pages, 9 figures, 3 tables

    ACM Class: I.5.3

  15. arXiv:2110.15209 [pdf, other]

    astro-ph.IM cs.LG stat.ME stat.ML

    Re-calibrating Photometric Redshift Probability Distributions Using Feature-space Regression

    Authors: Biprateep Dey, Jeffrey A. Newman, Brett H. Andrews, Rafael Izbicki, Ann B. Lee, David Zhao, Markus Michael Rau, Alex I. Malz

    Abstract: Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF)…

    Submitted 27 January, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)

  16. arXiv:2109.12029 [pdf, other]

    stat.ML cs.LG stat.AP

    Identifying Distributional Differences in Convective Evolution Prior to Rapid Intensification in Tropical Cyclones

    Authors: Trey McNeely, Galen Vincent, Rafael Izbicki, Kimberly M. Wood, Ann B. Lee

    Abstract: Tropical cyclone (TC) intensity forecasts are issued by human forecasters who evaluate spatio-temporal observations (e.g., satellite imagery) and model output (e.g., numerical weather prediction, statistical models) to produce forecasts every 6 hours. Within these time constraints, it can be challenging to draw insight from such data. While high-capacity machine learning methods are well suited fo…

    Submitted 30 November, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 7 pages, 4 figures, Tackling Climate Change with Machine Learning: workshop at NeurIPS 2021

  17. arXiv:2107.03920 [pdf, other]

    stat.ML cs.LG

    Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference

    Authors: Niccolò Dalmasso, Luca Masserano, David Zhao, Rafael Izbicki, Ann B. Lee

    Abstract: Many areas of science make extensive use of computer simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, traditional LFI methods - such as Approximate Bayesian Computation or mor…

    Submitted 19 November, 2023; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: 45 pages, 6 figures, code available at https://github.com/lee-group-cmu/lf2i, supplementary material available at https://lucamasserano.github.io/data/LF2I_supplementary_material.pdf

  18. arXiv:2009.05818 [pdf, other]

    cs.LG cs.AI

    MeLIME: Meaningful Local Explanation for Machine Learning Models

    Authors: Tiago Botari, Frederik Hvilshøj, Rafael Izbicki, Andre C. P. L. F. de Carvalho

    Abstract: Most state-of-the-art machine learning algorithms induce black-box models, preventing their application in many sensitive domains. Hence, many methodologies for explaining machine learning models have been proposed to address this problem. In this work, we introduce strategies to improve local explanations taking into account the distribution of the data used to train the black-box models. We show…

    Submitted 12 September, 2020; originally announced September 2020.

  19. arXiv:2007.12778 [pdf, other]

    stat.ML cs.LG stat.ME

    CD-split and HPD-split: efficient conformal regions in high dimensions

    Authors: Rafael Izbicki, Gilson Shimizu, Rafael B. Stern

    Abstract: Conformal methods create prediction bands that control average coverage assuming solely i.i.d. data. Although the literature has mostly focused on prediction intervals, more general regions can often better represent uncertainty. For instance, a bimodal target is better represented by the union of two intervals. Such prediction regions are obtained by CD-split, which combines the split method and…

    Submitted 4 October, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: 34 pages, 15 figures

    MSC Class: 62G15

  20. arXiv:2002.10399 [pdf, other]

    stat.ME cs.LG stat.ML

    Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting

    Authors: Niccolò Dalmasso, Rafael Izbicki, Ann B. Lee

    Abstract: Parameter estimation, statistical tests and confidence sets are the cornerstones of classical statistics that allow scientists to make inferences about the underlying process that generated the observed data. A key question is whether one can still construct hypothesis tests and confidence sets with proper coverage and high power in a so-called likelihood-free inference (LFI) setting; that is, a s…

    Submitted 13 August, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: 20 pages, 8 figures, 6 tables, 4 algorithm boxes

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2323-2334, 2020

  21. arXiv:1910.05575 [pdf, other]

    stat.ME cs.LG stat.ML

    Flexible distribution-free conditional predictive bands using density estimators

    Authors: Rafael Izbicki, Gilson T. Shimizu, Rafael B. Stern

    Abstract: Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Besides average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotical cond…

    Submitted 9 December, 2019; v1 submitted 12 October, 2019; originally announced October 2019.

  22. arXiv:1910.05206 [pdf, other]

    stat.ML cs.LG stat.ME

    NLS: an accurate and yet easy-to-interpret regression method

    Authors: Victor Coscrato, Marco Henrique de Almeida Inácio, Tiago Botari, Rafael Izbicki

    Abstract: An important feature of successful supervised machine learning applications is to be able to explain the predictions given by the regression or classification model being used. However, most state-of-the-art models that have good predictive power lead to predictions that are hard to interpret. Thus, several model-agnostic interpreters have been developed recently as a way of explaining black-box c…

    Submitted 11 October, 2019; originally announced October 2019.

  23. arXiv:1909.07182 [pdf, other]

    stat.ML cs.LG

    Distance Assessment and Hypothesis Testing of High-Dimensional Samples using Variational Autoencoders

    Authors: Marco Henrique de Almeida Inácio, Rafael Izbicki, Bálint Gyires-Tóth

    Abstract: Given two distinct datasets, an important question is if they have arisen from the same data generating function or alternatively how their data generating functions diverge from one another. In this paper, we introduce an approach for measuring the distance between two datasets with high dimensionality using variational autoencoders. This approach is augmented by a permutation hypothesis test…

    Submitted 16 September, 2019; originally announced September 2019.

  24. arXiv:1908.00105 [pdf, other]

    stat.ML cs.LG

    Conditional independence testing: a predictive perspective

    Authors: Marco Henrique de Almeida Inácio, Rafael Izbicki, Rafael Bassi Stern

    Abstract: Conditional independence testing is a key problem required by many machine learning and statistics tools. In particular, it is one way of evaluating the usefulness of some features on a supervised prediction problem. We propose a novel conditional independence test in a predictive setting, and show that it achieves better power than competing approaches in several settings. Our approach consists i…

    Submitted 31 July, 2019; originally announced August 2019.

  25. arXiv:1907.13525 [pdf, other]

    cs.LG cs.AI stat.ML

    Local Interpretation Methods to Machine Learning Using the Domain of the Feature Space

    Authors: Tiago Botari, Rafael Izbicki, Andre C. P. L. F. de Carvalho

    Abstract: As machine learning becomes an important part of many real world applications affecting human lives, new requirements, besides high predictive accuracy, become important. One important requirement is transparency, which has been associated with model interpretability. Many machine learning algorithms induce models difficult to interpret, named black box. Moreover, people have difficulty to trust m…

    Submitted 31 July, 2019; originally announced July 2019.

  26. arXiv:1906.09735 [pdf, other]

    cs.LG stat.ME stat.ML

    The NN-Stacking: Feature weighted linear stacking through neural networks

    Authors: Victor Coscrato, Marco Henrique de Almeida Inácio, Rafael Izbicki

    Abstract: Stacking methods improve the prediction performance of regression models. A simple way to stack base regression estimators is by combining them linearly, as done by Breiman (1996). Even though this approach is useful from an interpretative perspective, it often does not lead to high predictive power. We propose the NN-Stacking method (NNS), which generalizes Breiman's method by allowi…

    Submitted 24 June, 2019; originally announced June 2019.

    MSC Class: 62Jxx

  27. arXiv:1807.03929 [pdf, other]

    stat.ML cs.LG

    Quantification under prior probability shift: the ratio estimator and its extensions

    Authors: Afonso Fernandes Vaz, Rafael Izbicki, Rafael Bassi Stern

    Abstract: The quantification problem consists of determining the prevalence of a given label in a target population. However, one often has access to the labels in a sample from the training population but not in the target population. A common assumption in this situation is that of prior probability shift, that is, once the labels are known, the distribution of the features is the same in the training and…

    Submitted 5 April, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: 33 pages, 15 figures

    MSC Class: 62F12; 62G05; 62G08

  28. arXiv:1405.3292 [pdf, other]

    stat.ME cs.LG

    Learning with many experts: model selection and sparsity

    Authors: Rafael Izbicki, Rafael Bassi Stern

    Abstract: Experts classifying data are often imprecise. Recently, several models have been proposed to train classifiers using the noisy labels generated by these experts. How to choose between these models? In such situations, the true labels are unavailable. Thus, one cannot perform model selection using the standard versions of methods such as empirical risk minimization and cross validation. In order to…

    Submitted 13 May, 2014; originally announced May 2014.

    Comments: This is the pre-peer-reviewed version

    Journal ref: Izbicki, R., Stern, R. B. "Learning with many experts: Model selection and sparsity." Statistical Analysis and Data Mining 6.6 (2013): 565-577
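Entries 2, 19 and 21 build on split conformal inference, which turns any point predictor into prediction intervals with marginal (average) coverage of at least 1 - alpha, assuming only i.i.d. data. The following is a minimal sketch of the standard split-conformal baseline with absolute residuals as the conformity score, i.e. the kind of "naive application" to regression that entry 2 describes as non-adaptive. It is not the tree-based, CD-split or HPD-split method of those papers. The function names, the toy data and the assumption that `predict` was fitted on a separate training split are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample corrected quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only an unbounded interval is valid.
        return math.inf
    return sorted(residuals)[k - 1]


def split_conformal_interval(
    predict: Callable[[float], float],
    x_cal: Sequence[float],
    y_cal: Sequence[float],
    x_new: float,
    alpha: float = 0.1,
) -> Tuple[float, float]:
    """Split-conformal interval; assumes `predict` was fitted on data disjoint from (x_cal, y_cal)."""
    # Conformity scores on the held-out calibration split.
    q = conformal_quantile([abs(y - predict(x)) for x, y in zip(x_cal, y_cal)], alpha)
    y_hat = predict(x_new)
    return (y_hat - q, y_hat + q)


def empirical_coverage(predict, x_cal, y_cal, x_test, y_test, alpha: float = 0.1) -> float:
    """Fraction of test responses falling inside their split-conformal intervals."""
    q = conformal_quantile([abs(y - predict(x)) for x, y in zip(x_cal, y_cal)], alpha)
    hits = sum(predict(x) - q <= y <= predict(x) + q for x, y in zip(x_test, y_test))
    return hits / len(x_test)


if __name__ == "__main__":
    rng = random.Random(0)

    def predictor(x: float) -> float:
        # Stand-in for a model fitted on a separate training split.
        return 2.0 * x

    def sample(n: int) -> Tuple[List[float], List[float]]:
        xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
        # Heteroscedastic noise: the spread grows with x.
        ys = [2.0 * x + rng.gauss(0.0, 0.1 + x) for x in xs]
        return xs, ys

    x_cal, y_cal = sample(1000)
    x_test, y_test = sample(2000)
    print("interval at x=0.5:", split_conformal_interval(predictor, x_cal, y_cal, 0.5))
    print("empirical coverage:", empirical_coverage(predictor, x_cal, y_cal, x_test, y_test))
```

Because the half-width `q` is the same for every `x_new`, the interval cannot widen where the noise is larger; that lack of adaptivity is what new conformal scores aim to fix.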
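Entry 8 concerns random Fourier features, which approximate a shift-invariant kernel by an explicit finite-dimensional feature map, so that downstream models become linear in those features. The following is a minimal sketch of the general technique for a Gaussian kernel with one length scale per input dimension; the isotropic kernel mentioned in the abstract is the special case where all length scales are equal. It is not the RFFNet method itself, and the names `rbf_kernel`, `RandomFourierFeatures` and `approx_kernel` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], lengthscales: Sequence[float]) -> float:
    """Exact Gaussian kernel exp(-0.5 * sum_j ((x_j - y_j) / l_j) ** 2)."""
    sq = sum(((a - b) / ls) ** 2 for a, b, ls in zip(x, y, lengthscales))
    return math.exp(-0.5 * sq)


class RandomFourierFeatures:
    """Random feature map z(x) whose inner products approximate rbf_kernel."""

    def __init__(self, lengthscales: Sequence[float], num_features: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_features = num_features
        # Frequencies drawn from the kernel's spectral density: N(0, 1 / l_j ** 2) per dimension.
        self.omegas = [[rng.gauss(0.0, 1.0 / ls) for ls in lengthscales] for _ in range(num_features)]
        # Random phases, uniform on [0, 2 * pi).
        self.phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]

    def transform(self, x: Sequence[float]) -> List[float]:
        scale = math.sqrt(2.0 / self.num_features)
        return [
            scale * math.cos(sum(w * xi for w, xi in zip(omega, x)) + b)
            for omega, b in zip(self.omegas, self.phases)
        ]


def approx_kernel(rff: RandomFourierFeatures, x: Sequence[float], y: Sequence[float]) -> float:
    """Monte Carlo estimate of rbf_kernel(x, y) as the dot product z(x) . z(y)."""
    return sum(a * b for a, b in zip(rff.transform(x), rff.transform(y)))


if __name__ == "__main__":
    lengthscales = [1.0, 5.0]
    x, y = [0.3, -1.2], [0.8, 0.4]
    rff = RandomFourierFeatures(lengthscales, num_features=5000, seed=0)
    print("exact :", rbf_kernel(x, y, lengthscales))
    print("approx:", approx_kernel(rff, x, y))
```

The approximation is a Monte Carlo average, so its error shrinks roughly like 1 / sqrt(num_features). With per-dimension length scales, a large length scale makes the kernel nearly insensitive to that input, which is one common way such kernels are read as feature-relevance scores.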
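Entries 11 and 15 deal with calibrating predictive distributions. A common diagnostic is the probability integral transform (PIT): if a predictive CDF F(y|x) is calibrated, the values F(y_i|x_i) on held-out data are uniform on [0, 1]. The following sketch assumes Gaussian predictive distributions and measures departure from uniformity with the Kolmogorov-Smirnov distance. It only checks marginal calibration, which is weaker than the conditional calibration those papers target, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gaussian_cdf(y: float, mean: float, std: float) -> float:
    """CDF of N(mean, std ** 2) evaluated at y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def pit_values(ys: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """PIT values F(y_i | x_i) for Gaussian predictive distributions."""
    return [gaussian_cdf(y, m, s) for y, m, s in zip(ys, means, stds)]


def ks_distance_to_uniform(u: Sequence[float]) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF of u and Uniform(0, 1)."""
    u_sorted = sorted(u)
    n = len(u_sorted)
    # The largest gap occurs just before or at one of the jumps of the empirical CDF.
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u_sorted))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0.0, 1.0) for _ in range(2000)]
    ys = [rng.gauss(x, 1.0) for x in xs]  # true model: y | x ~ N(x, 1)
    calibrated = pit_values(ys, xs, [1.0] * len(xs))
    overconfident = pit_values(ys, xs, [0.5] * len(xs))  # predictive spread too small
    print("KS, calibrated   :", ks_distance_to_uniform(calibrated))
    print("KS, overconfident:", ks_distance_to_uniform(overconfident))
```

For the overconfident model the PIT values pile up near 0 and 1 (a U-shaped histogram), so its KS distance is clearly larger than that of the calibrated model.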
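Entry 23 augments a VAE-based distance between datasets with a permutation hypothesis test. The generic two-sample permutation test is sketched here with a plain absolute difference in means standing in for the paper's learned distance (a deliberate swap for brevity); any statistic where larger values mean "more different" can be passed as `statistic`. The function names are hypothetical.

```python
import random
from typing import Callable, Sequence


def permutation_test(
    a: Sequence[float],
    b: Sequence[float],
    statistic: Callable[[Sequence[float], Sequence[float]], float],
    num_permutations: int = 999,
    seed: int = 0,
) -> float:
    """P-value for H0: both samples come from the same distribution."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    exceed = 0
    for _ in range(num_permutations):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling, giving a valid finite-sample p-value.
    return (exceed + 1) / (num_permutations + 1)


def abs_mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    return abs(sum(a) / len(a) - sum(b) / len(b))


if __name__ == "__main__":
    rng = random.Random(3)
    same = [rng.gauss(0.0, 1.0) for _ in range(200)]
    shifted = [rng.gauss(1.0, 1.0) for _ in range(200)]
    print("p-value:", permutation_test(same, shifted, abs_mean_difference, num_permutations=499))
```

The smallest attainable p-value is 1 / (num_permutations + 1), so the number of permutations bounds how much evidence the test can report.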
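Entry 27 studies quantification under prior probability shift, where the distribution of the features given the label is assumed identical in the labeled and target samples. A classic estimator in this setting is adjusted classify-and-count: estimate a classifier's true- and false-positive rates on labeled data, then solve observed = theta * tpr + (1 - theta) * fpr for the target prevalence theta. This sketch is that textbook baseline, not necessarily the ratio estimator proposed in the paper; the function name and the numbers are hypothetical, and in practice tpr and fpr should be estimated on labeled data not used to fit the classifier.

```python
from typing import Sequence


def adjusted_classify_and_count(
    train_preds: Sequence[int],
    train_labels: Sequence[int],
    target_preds: Sequence[int],
) -> float:
    """Estimate the prevalence of label 1 in an unlabeled target sample.

    Relies on prior probability shift: P(X | Y) is the same in both samples, so the
    classifier's true- and false-positive rates carry over from the labeled data.
    """
    pos = [p for p, y in zip(train_preds, train_labels) if y == 1]
    neg = [p for p, y in zip(train_preds, train_labels) if y == 0]
    tpr = sum(pos) / len(pos)
    fpr = sum(neg) / len(neg)
    if tpr == fpr:
        raise ValueError("classifier has no discriminative power (tpr == fpr)")
    observed = sum(target_preds) / len(target_preds)
    # Solve observed = theta * tpr + (1 - theta) * fpr for theta, then clip to [0, 1].
    theta = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, theta))


if __name__ == "__main__":
    # Labeled data: tpr = 80 / 100 = 0.8, fpr = 10 / 100 = 0.1.
    train_labels = [1] * 100 + [0] * 100
    train_preds = [1] * 80 + [0] * 20 + [1] * 10 + [0] * 90
    # Target sample with true prevalence 0.1: 100 * 0.8 + 900 * 0.1 = 170 predicted positives.
    target_preds = [1] * 170 + [0] * 830
    naive = sum(target_preds) / len(target_preds)
    adjusted = adjusted_classify_and_count(train_preds, train_labels, target_preds)
    print(f"classify-and-count: {naive:.3f}")  # prints 0.170
    print(f"adjusted:           {adjusted:.3f}")  # prints 0.100
```

Simply counting predicted positives (0.170) is biased because the false positives from the large negative class inflate the count; inverting the mixing equation removes that bias when the prior-probability-shift assumption holds.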