-
Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction
Authors:
Drew T. Nguyen,
Reese Pathak,
Anastasios N. Angelopoulos,
Stephen Bates,
Michael I. Jordan
Abstract:
Decision-making pipelines are generally characterized by tradeoffs among various risk functions. It is often desirable to manage such tradeoffs in a data-adaptive manner. As we demonstrate, if this is done naively, state-of-the art uncertainty quantification methods can lead to significant violations of putative risk guarantees.
To address this issue, we develop methods that permit valid control…
▽ More
Decision-making pipelines are generally characterized by tradeoffs among various risk functions. It is often desirable to manage such tradeoffs in a data-adaptive manner. As we demonstrate, if this is done naively, state-of-the art uncertainty quantification methods can lead to significant violations of putative risk guarantees.
To address this issue, we develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively. Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
To illustrate the benefits of our approach, we carry out numerical experiments on synthetic data and the large-scale vision dataset MS-COCO.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
Online conformal prediction with decaying step sizes
Authors:
Anastasios N. Angelopoulos,
Rina Foygel Barber,
Stephen Bates
Abstract:
We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distributio…
▽ More
We introduce a method for online conformal prediction with decaying step sizes. Like previous methods, ours possesses a retrospective guarantee of coverage for arbitrary sequences. However, unlike previous methods, we can simultaneously estimate a population quantile when it exists. Our theory and experiments indicate substantially improved practical properties: in particular, when the distribution is stable, the coverage is close to the desired level for every time point, not just on average over the observed sequence.
△ Less
Submitted 28 May, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Delegating Data Collection in Decentralized Machine Learning
Authors:
Nivasini Ananthakrishnan,
Stephen Bates,
Michael I. Jordan,
Nika Haghtalab
Abstract:
Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental information asymmetries that arise in decentralized ML: uncertainty in the assessment of model quality and uncertainty regarding the optimal pe…
▽ More
Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental information asymmetries that arise in decentralized ML: uncertainty in the assessment of model quality and uncertainty regarding the optimal performance of any model. We show that a principal can cope with such asymmetry via simple linear contracts that achieve 1-1/e fraction of the optimal utility. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract. We also study linear contracts and derive the optimal utility in the more complex setting of multiple interactions.
△ Less
Submitted 2 May, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Incentive-Theoretic Bayesian Inference for Collaborative Science
Authors:
Stephen Bates,
Michael I. Jordan,
Michael Sklar,
Jake A. Soloff
Abstract:
Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing whe…
▽ More
Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elucidate partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.
△ Less
Submitted 8 February, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Class-Conditional Conformal Prediction with Many Classes
Authors:
Tiffany Ding,
Anastasios N. Angelopoulos,
Stephen Bates,
Michael I. Jordan,
Ryan J. Tibshirani
Abstract:
Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-specified probability. In many classification problems, we would like to obtain a stronger guarantee--that for test points of a specific class, the prediction set contains the true label with the same user-chosen pro…
▽ More
Standard conformal prediction methods provide a marginal coverage guarantee, which means that for a random test point, the conformal prediction set contains the true label with a user-specified probability. In many classification problems, we would like to obtain a stronger guarantee--that for test points of a specific class, the prediction set contains the true label with the same user-chosen probability. For the latter goal, existing conformal prediction methods do not work well when there is a limited amount of labeled data per class, as is often the case in real applications where the number of classes is large. We propose a method called clustered conformal prediction that clusters together classes having "similar" conformal scores and performs conformal prediction at the cluster level. Based on empirical evaluation across four image data sets with many (up to 1000) classes, we find that clustered conformal typically outperforms existing methods in terms of class-conditional coverage and set size metrics.
△ Less
Submitted 27 October, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Operationalizing Counterfactual Metrics: Incentives, Ranking, and Information Asymmetry
Authors:
Serena Wang,
Stephen Bates,
P. M. Aronow,
Michael I. Jordan
Abstract:
From the social sciences to machine learning, it has been well documented that metrics to be optimized are not always aligned with social welfare. In healthcare, Dranove et al. (2003) showed that publishing surgery mortality metrics actually harmed the welfare of sicker patients by increasing provider selection behavior. We analyze the incentive misalignments that arise from such average treated o…
▽ More
From the social sciences to machine learning, it has been well documented that metrics to be optimized are not always aligned with social welfare. In healthcare, Dranove et al. (2003) showed that publishing surgery mortality metrics actually harmed the welfare of sicker patients by increasing provider selection behavior. We analyze the incentive misalignments that arise from such average treated outcome metrics, and show that the incentives driving treatment decisions would align with maximizing total patient welfare if the metrics (i) accounted for counterfactual untreated outcomes and (ii) considered total welfare instead of averaging over treated patients. Operationalizing this, we show how counterfactual metrics can be modified to behave reasonably in patient-facing ranking systems. Extending to realistic settings when providers observe more about patients than the regulatory agencies do, we bound the decay in performance by the degree of information asymmetry between principal and agent. In doing so, our model connects principal-agent information asymmetry with unobserved heterogeneity in causal inference.
△ Less
Submitted 29 November, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Web and Mobile Platforms for Managing Elections based on IoT And Machine Learning Algorithms
Authors:
G. M. I. K. Galagoda,
W. M. C. A. Karunarathne,
R. S. Bates,
K. M. H. V. P. Gangathilaka,
Kanishka Yapa,
Erandika Gamage
Abstract:
The global pandemic situation has severely affected all countries. As a result, almost all countries had to adjust to online technologies to continue their processes. In addition, Sri Lanka is yearly spending ten billion on elections. We have examined a proper way of minimizing the cost of hosting these events online. To solve the existing problems and increase the time potency and cost reduction…
▽ More
The global pandemic situation has severely affected all countries. As a result, almost all countries had to adjust to online technologies to continue their processes. In addition, Sri Lanka is yearly spending ten billion on elections. We have examined a proper way of minimizing the cost of hosting these events online. To solve the existing problems and increase the time potency and cost reduction we have used IoT and ML-based technologies. IoT-based data will identify, register, and be used to secure from fraud, while ML algorithms manipulate the election data and produce winning predictions, weather-based voters attendance, and election violence. All the data will be saved in cloud computing and a standard database to store and access the data. This study mainly focuses on four aspects of an E-voting system. The most frequent problems across the world in E-voting are the security, accuracy, and reliability of the systems. E-government systems must be secured against various cyber-attacks and ensure that only authorized users can access valuable, and sometimes sensitive information. Being able to access a system without passwords but using biometric details has been there for a while now, however, our proposed system has a different approach to taking the credentials, processing, and combining the images, reformatting and producing the output, and tracking. In addition, we ensure to enhance e-voting safety. While ML-based algorithms use different data sets and provide predictions in advance.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Prediction-Powered Inference
Authors:
Anastasios N. Angelopoulos,
Stephen Bates,
Clara Fannjiang,
Michael I. Jordan,
Tijana Zrnic
Abstract:
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients, without making any assumptions on the ma…
▽ More
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients, without making any assumptions on the machine-learning algorithm that supplies the predictions. Furthermore, more accurate predictions translate to smaller confidence intervals. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning. The benefits of prediction-powered inference are demonstrated with datasets from proteomics, astronomy, genomics, remote sensing, census analysis, and ecology.
△ Less
Submitted 9 November, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
The Sample Complexity of Online Contract Design
Authors:
Banghua Zhu,
Stephen Bates,
Zhuoran Yang,
Yixin Wang,
Jiantao Jiao,
Michael I. Jordan
Abstract:
We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent based on each outcome. The agent then makes a strategic choice of action that maximizes her own utility, but the action is not directly observable by the principal. The principal observes the outcome and receives utility from the agent's cho…
▽ More
We study the hidden-action principal-agent problem in an online setting. In each round, the principal posts a contract that specifies the payment to the agent based on each outcome. The agent then makes a strategic choice of action that maximizes her own utility, but the action is not directly observable by the principal. The principal observes the outcome and receives utility from the agent's choice of action. Based on past observations, the principal dynamically adjusts the contracts with the goal of maximizing her utility.
We introduce an online learning algorithm and provide an upper bound on its Stackelberg regret. We show that when the contract space is $[0,1]^m$, the Stackelberg regret is upper bounded by $\widetilde O(\sqrt{m} \cdot T^{1-1/(2m+1)})$, and lower bounded by $Ω(T^{1-1/(m+2)})$, where $\widetilde O$ omits logarithmic factors. This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design. Moreover, when contracts are restricted to some subset $\mathcal{F} \subset [0,1]^m$, we define an intrinsic dimension of $\mathcal{F}$ that depends on the covering number of the spherical code in the space and bound the regret in terms of this intrinsic dimension. When $\mathcal{F}$ is the family of linear contracts, we show that the Stackelberg regret grows exactly as $Θ(T^{2/3})$.
The contract design problem is challenging because the utility function is discontinuous. Bounding the discretization error in this setting has been an open problem. In this paper, we identify a limited set of directions in which the utility function is continuous, allowing us to design a new discretization method and bound its error. This approach enables the first upper bound with no restrictions on the contract and action space.
△ Less
Submitted 19 May, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Conformal Prediction is Robust to Dispersive Label Noise
Authors:
Shai Feldman,
Bat-Sheva Einbinder,
Stephen Bates,
Anastasios N. Angelopoulos,
Asaf Gendler,
Yaniv Romano
Abstract:
We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. We further extend our theory and formulate the requirements for correctly control…
▽ More
We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. We further extend our theory and formulate the requirements for correctly controlling a general loss function, such as the false negative proportion, with noisy labels. Our theory and experiments suggest that conformal prediction and risk-controlling techniques with noisy labels attain conservative risk over the clean ground truth labels except in adversarial cases. In such cases, we can also correct for noise of bounded size in the conformal prediction algorithm in order to ensure achieving the correct risk of the ground truth labels without score or data regularity.
△ Less
Submitted 19 September, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Conformal Risk Control
Authors:
Anastasios N. Angelopoulos,
Stephen Bates,
Adam Fisch,
Lihua Lei,
Tal Schuster
Abstract:
We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversar…
▽ More
We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. We also introduce extensions of the idea to distribution shift, quantile risk control, multiple and adversarial risk control, and expectations of U-statistics. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.
△ Less
Submitted 29 April, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Semantic uncertainty intervals for disentangled latent spaces
Authors:
Swami Sankaranarayanan,
Anastasios N. Angelopoulos,
Stephen Bates,
Yaniv Romano,
Phillip Isola
Abstract:
Meaningful uncertainty quantification in computer vision requires reasoning about semantic information -- say, the hair color of the person in a photo or the location of a car on the street. To this end, recent breakthroughs in generative modeling allow us to represent semantic information in disentangled latent spaces, but providing uncertainties on the semantic latent variables has remained chal…
▽ More
Meaningful uncertainty quantification in computer vision requires reasoning about semantic information -- say, the hair color of the person in a photo or the location of a car on the street. To this end, recent breakthroughs in generative modeling allow us to represent semantic information in disentangled latent spaces, but providing uncertainties on the semantic latent variables has remained challenging. In this work, we provide principled uncertainty intervals that are guaranteed to contain the true semantic factors for any underlying generative model. The method does the following: (1) it uses quantile regression to output a heuristic uncertainty interval for each element in the latent space (2) calibrates these uncertainties such that they contain the true value of the latent for a new, unseen input. The endpoints of these calibrated intervals can then be propagated through the generator to produce interpretable uncertainty visualizations for each semantic factor. This technique reliably communicates semantically meaningful, principled, and instance-adaptive uncertainty in inverse problems like image super-resolution and image completion.
△ Less
Submitted 30 November, 2022; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Recommendation Systems with Distribution-Free Reliability Guarantees
Authors:
Anastasios N. Angelopoulos,
Karl Krauth,
Stephen Bates,
Yixin Wang,
Michael I. Jordan
Abstract:
When building recommendation systems, we seek to output a helpful set of items to the user. Under the hood, a ranking model predicts which of two candidate items is better, and we must distill these pairwise comparisons into the user-facing output. However, a learned ranking model is never perfect, so taking its predictions at face value gives no guarantee that the user-facing output is reliable.…
▽ More
When building recommendation systems, we seek to output a helpful set of items to the user. Under the hood, a ranking model predicts which of two candidate items is better, and we must distill these pairwise comparisons into the user-facing output. However, a learned ranking model is never perfect, so taking its predictions at face value gives no guarantee that the user-facing output is reliable. Building from a pre-trained ranking model, we show how to return a set of items that is rigorously guaranteed to contain mostly good items. Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate (FDR), regardless of the (unknown) data distribution. Moreover, our calibration algorithm enables the easy and principled integration of multiple objectives in recommender systems. As an example, we show how to optimize for recommendation diversity subject to a user-specified level of FDR control, circumventing the need to specify ad hoc weights of a diversity loss against an accuracy loss. Throughout, we focus on the problem of learning to rank a set of possible recommendations, evaluating our methods on the Yahoo! Learning to Rank and MSMarco datasets.
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
Robust Calibration with Multi-domain Temperature Scaling
Authors:
Yaodong Yu,
Stephen Bates,
Yi Ma,
Michael I. Jordan
Abstract:
Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains. Uncertainty quantification is all the more challenging when training distribution and test distribution are different, even the distribution shifts are mild. Despite the ubiquity of distribution shifts in real-world applications, existing uncertainty quantification app…
▽ More
Uncertainty quantification is essential for the reliable deployment of machine learning models to high-stakes application domains. Uncertainty quantification is all the more challenging when training distribution and test distribution are different, even the distribution shifts are mild. Despite the ubiquity of distribution shifts in real-world applications, existing uncertainty quantification approaches mainly study the in-distribution setting where the train and test distributions are the same. In this paper, we develop a systematic calibration model to handle distribution shifts by leveraging data from multiple domains. Our proposed method -- multi-domain temperature scaling -- uses the heterogeneity in the domains to improve calibration robustness under distribution shift. Through experiments on three benchmark data sets, we find our proposed method outperforms existing methods as measured on both in-distribution and out-of-distribution test sets.
△ Less
Submitted 6 June, 2022;
originally announced June 2022.
-
Achieving Risk Control in Online Learning Settings
Authors:
Shai Feldman,
Liran Ringel,
Stephen Bates,
Yaniv Romano
Abstract:
To provide rigorous uncertainty quantification for online learning models, we develop a framework for constructing uncertainty sets that provably control risk -- such as coverage of confidence intervals, false negative rate, or F1 score -- in the online setting. This extends conformal prediction to apply to a larger class of online learning problems. Our method guarantees risk control at any user-…
▽ More
To provide rigorous uncertainty quantification for online learning models, we develop a framework for constructing uncertainty sets that provably control risk -- such as coverage of confidence intervals, false negative rate, or F1 score -- in the online setting. This extends conformal prediction to apply to a larger class of online learning problems. Our method guarantees risk control at any user-specified level even when the underlying data distribution shifts drastically, even adversarially, over time in an unknown fashion. The technique we propose is highly flexible as it can be applied with any base online learning algorithm (e.g., a deep neural network trained online), requiring minimal implementation effort and essentially zero additional computational cost. We further extend our approach to control multiple risks simultaneously, so the prediction sets we generate are valid for all given risks. To demonstrate the utility of our method, we conduct experiments on real-world tabular time-series data sets showing that the proposed method rigorously controls various natural risks. Furthermore, we show how to construct valid intervals for an online image-depth estimation problem that previous sequential calibration schemes cannot handle.
△ Less
Submitted 27 January, 2023; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Principal-Agent Hypothesis Testing
Authors:
Stephen Bates,
Michael I. Jordan,
Michael Sklar,
Jake A. Soloff
Abstract:
Consider the relationship between a regulator (the principal) and an experimenter (the agent) such as a pharmaceutical company. The pharmaceutical company wishes to sell a drug for profit, whereas the regulator wishes to allow only efficacious drugs to be marketed. The efficacy of the drug is not known to the regulator, so the pharmaceutical company must run a costly trial to prove efficacy to the…
▽ More
Consider the relationship between a regulator (the principal) and an experimenter (the agent) such as a pharmaceutical company. The pharmaceutical company wishes to sell a drug for profit, whereas the regulator wishes to allow only efficacious drugs to be marketed. The efficacy of the drug is not known to the regulator, so the pharmaceutical company must run a costly trial to prove efficacy to the regulator. Critically, the statistical protocol used to establish efficacy affects the behavior of a strategic, self-interested agent; a lower standard of statistical evidence incentivizes the agent to run more trials that are less likely to be effective. The interaction between the statistical protocol and the incentives of the pharmaceutical company is crucial for understanding this system and designing protocols with high social utility. In this work, we discuss how the regulator can set up a protocol with payoffs based on statistical evidence. We show how to design protocols that are robust to an agent's strategic actions, and derive the optimal protocol in the presence of strategic entrants.
△ Less
Submitted 15 April, 2024; v1 submitted 13 May, 2022;
originally announced May 2022.
-
Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging
Authors:
Anastasios N Angelopoulos,
Amit P Kohli,
Stephen Bates,
Michael I Jordan,
Jitendra Malik,
Thayer Alshaabi,
Srigokul Upadhyayula,
Yaniv Romano
Abstract:
Image-to-image regression is an important learning task, used frequently in biological imaging. Current algorithms, however, do not generally offer statistical guarantees that protect against a model's mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems. In particular, we show how…
▽ More
Image-to-image regression is an important learning task, used frequently in biological imaging. Current algorithms, however, do not generally offer statistical guarantees that protect against a model's mistakes and hallucinations. To address this, we develop uncertainty quantification techniques with rigorous statistical guarantees for image-to-image regression problems. In particular, we show how to derive uncertainty intervals around each pixel that are guaranteed to contain the true value with a user-specified confidence probability. Our methods work in conjunction with any base machine learning model, such as a neural network, and endow it with formal mathematical guarantees -- regardless of the true unknown data distribution or choice of model. Furthermore, they are simple to implement and computationally inexpensive. We evaluate our procedure on three image-to-image regression tasks: quantitative phase microscopy, accelerated magnetic resonance imaging, and super-resolution transmission electron microscopy of a Drosophila melanogaster brain.
△ Less
Submitted 10 February, 2022;
originally announced February 2022.
-
Conformal prediction for the design problem
Authors:
Clara Fannjiang,
Stephen Bates,
Anastasios N. Angelopoulos,
Jennifer Listgarten,
Michael I. Jordan
Abstract:
Many applications of machine learning methods involve an iterative protocol in which data are collected, a model is trained, and then outputs of that model are used to choose what data to consider next. For example, one data-driven approach for designing proteins is to train a regression model to predict the fitness of protein sequences, then use it to propose new sequences believed to exhibit gre…
▽ More
Many applications of machine learning methods involve an iterative protocol in which data are collected, a model is trained, and then outputs of that model are used to choose what data to consider next. For example, one data-driven approach for designing proteins is to train a regression model to predict the fitness of protein sequences, then use it to propose new sequences believed to exhibit greater fitness than observed in the training data. Since validating designed sequences in the wet lab is typically costly, it is important to quantify the uncertainty in the model's predictions. This is challenging because of a characteristic type of distribution shift between the training and test data in the design setting -- one in which the training and test data are statistically dependent, as the latter is chosen based on the former. Consequently, the model's error on the test data -- that is, the designed sequences -- has an unknown and possibly complex relationship with its error on the training data. We introduce a method to quantify predictive uncertainty in such settings. We do so by constructing confidence sets for predictions that account for the dependence between the training and test data. The confidence sets we construct have finite-sample guarantees that hold for any prediction algorithm, even when a trained model chooses the test-time input distribution. As a motivating use case, we demonstrate with several real data sets how our method quantifies uncertainty for the predicted fitness of designed proteins, and can therefore be used to select design algorithms that achieve acceptable trade-offs between high predicted fitness and low predictive uncertainty.
△ Less
Submitted 31 May, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Optimal Data Selection: An Online Distributed View
Authors:
Mariel Werner,
Anastasios Angelopoulos,
Stephen Bates,
Michael I. Jordan
Abstract:
The blessing of ubiquitous data also comes with a curse: the communication, storage, and labeling of massive, mostly redundant datasets. We seek to solve this problem at its core, collecting only valuable data and throwing out the rest via submodular maximization. Specifically, we develop algorithms for the online and distributed version of the problem, where data selection occurs in an uncoordina…
▽ More
The blessing of ubiquitous data also comes with a curse: the communication, storage, and labeling of massive, mostly redundant datasets. We seek to solve this problem at its core, collecting only valuable data and throwing out the rest via submodular maximization. Specifically, we develop algorithms for the online and distributed version of the problem, where data selection occurs in an uncoordinated fashion across multiple data streams. We design a general and flexible core selection routine for our algorithms which, given any stream of data, any assessment of its value, and any formulation of its selection cost, extracts the most valuable subset of the stream up to a constant factor while using minimal memory. Notably, our methods have the same theoretical guarantees as their offline counterparts, and, as far as we know, provide the first guarantees for online distributed submodular optimization in the literature. Finally, in learning tasks on ImageNet and MNIST, we show that our selection methods outperform random selection by $5-20\%$.
△ Less
Submitted 14 December, 2023; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Authors:
Anastasios N. Angelopoulos,
Stephen Bates,
Emmanuel J. Candès,
Michael I. Jordan,
Lihua Lei
Abstract:
We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersect…
▽ More
We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use the framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision and tabular medical data.
△ Less
Submitted 29 September, 2022; v1 submitted 3 October, 2021;
originally announced October 2021.
-
Calibrated Multiple-Output Quantile Regression with Representation Learning
Authors:
Shai Feldman,
Stephen Bates,
Yaniv Romano
Abstract:
We develop a method to generate predictive regions that cover a multivariate response variable with a user-specified probability. Our work is composed of two components. First, we use a deep generative model to learn a representation of the response that has a unimodal distribution. Existing multiple-output quantile regression approaches are effective in such cases, so we apply them on the learned…
▽ More
We develop a method to generate predictive regions that cover a multivariate response variable with a user-specified probability. Our work is composed of two components. First, we use a deep generative model to learn a representation of the response that has a unimodal distribution. Existing multiple-output quantile regression approaches are effective in such cases, so we apply them on the learned representation, and then transform the solution to the original space of the response. This process results in a flexible and informative region that can have an arbitrary shape, a property that existing methods lack. Second, we propose an extension of conformal prediction to the multivariate response setting that modifies any method to return sets with a pre-specified coverage level. The desired coverage is theoretically guaranteed in the finite-sample case for any distribution. Experiments conducted on both real and synthetic data show that our method constructs regions that are significantly smaller compared to existing techniques.
△ Less
Submitted 23 December, 2022; v1 submitted 2 October, 2021;
originally announced October 2021.
-
Discriminative Attribution from Counterfactuals
Authors:
Nils Eckstein,
Alexander S. Bates,
Gregory S. X. E. Jefferis,
Jan Funke
Abstract:
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations to generate attribution maps that highlight the most discriminative features between pairs of classes. We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner, thus preventing potential observer bias.…
▽ More
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations to generate attribution maps that highlight the most discriminative features between pairs of classes. We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner, thus preventing potential observer bias. We evaluate the proposed method on three diverse datasets, including a challenging artificial dataset and real-world biological data. We show quantitatively and qualitatively that the highlighted features are substantially more discriminative than those extracted using conventional attribution methods and argue that this type of explanation is better suited for understanding fine grained class differences as learned by a deep neural network.
△ Less
Submitted 27 September, 2021;
originally announced September 2021.
-
A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
Authors:
Anastasios N. Angelopoulos,
Stephen Bates
Abstract:
Black-box machine learning models are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Conformal prediction is a user-friendly paradigm for creating statistically rigorous uncertainty sets/intervals for the predictions of such models. Critically, the sets are valid in a distribution-free sense: they p…
▽ More
Black-box machine learning models are now routinely used in high-risk settings, like medical diagnostics, which demand uncertainty quantification to avoid consequential model failures. Conformal prediction is a user-friendly paradigm for creating statistically rigorous uncertainty sets/intervals for the predictions of such models. Critically, the sets are valid in a distribution-free sense: they possess explicit, non-asymptotic guarantees even without distributional assumptions or model assumptions. One can use conformal prediction with any pre-trained model, such as a neural network, to produce sets that are guaranteed to contain the ground truth with a user-specified probability, such as 90%. It is easy-to-understand, easy-to-use, and general, applying naturally to problems arising in the fields of computer vision, natural language processing, deep reinforcement learning, and so on.
This hands-on introduction is aimed to provide the reader a working understanding of conformal prediction and related distribution-free uncertainty quantification techniques with one self-contained document. We lead the reader through practical theory for and examples of conformal prediction and describe its extensions to complex machine learning tasks involving structured outputs, distribution shift, time-series, outliers, models that abstain, and more. Throughout, there are many explanatory illustrations, examples, and code samples in Python. With each code sample comes a Jupyter notebook implementing the method on a real-data example; the notebooks can be accessed and easily run using our codebase.
△ Less
Submitted 7 December, 2022; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Test-time Collective Prediction
Authors:
Celestine Mendler-Dünner,
Wenshuo Guo,
Stephen Bates,
Michael I. Jordan
Abstract:
An increasingly common setting in machine learning involves multiple parties, each with their own data, who want to jointly make predictions on future test points. Agents wish to benefit from the collective expertise of the full set of agents to make better predictions than they would individually, but may not be willing to release their data or model parameters. In this work, we explore a decentr…
▽ More
An increasingly common setting in machine learning involves multiple parties, each with their own data, who want to jointly make predictions on future test points. Agents wish to benefit from the collective expertise of the full set of agents to make better predictions than they would individually, but may not be willing to release their data or model parameters. In this work, we explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model without relying on external validation, model retraining, or data pooling. Our approach takes inspiration from the literature in social science on human consensus-making. We analyze our mechanism theoretically, showing that it converges to inverse meansquared-error (MSE) weighting in the large-sample limit. To compute error bars on the collective predictions we propose a decentralized Jackknife procedure that evaluates the sensitivity of our mechanism to a single agent's prediction. Empirically, we demonstrate that our scheme effectively combines models with differing quality across the input space. The proposed consensus prediction achieves significant gains over classical model averaging, and even outperforms weighted averaging schemes that have access to additional validation data.
△ Less
Submitted 22 June, 2021;
originally announced June 2021.
-
Improving Conditional Coverage via Orthogonal Quantile Regression
Authors:
Shai Feldman,
Stephen Bates,
Yaniv Romano
Abstract:
We develop a method to generate prediction intervals that have a user-specified coverage level across all regions of feature-space, a property called conditional coverage. A typical approach to this task is to estimate the conditional quantiles with quantile regression -- it is well-known that this leads to correct coverage in the large-sample limit, although it may not be accurate in finite sampl…
▽ More
We develop a method to generate prediction intervals that have a user-specified coverage level across all regions of feature-space, a property called conditional coverage. A typical approach to this task is to estimate the conditional quantiles with quantile regression -- it is well-known that this leads to correct coverage in the large-sample limit, although it may not be accurate in finite samples. We find in experiments that traditional quantile regression can have poor conditional coverage. To remedy this, we modify the loss function to promote independence between the size of the intervals and the indicator of a miscoverage event. For the true conditional quantiles, these two quantities are independent (orthogonal), so the modified loss function continues to be valid. Moreover, we empirically show that the modified loss function leads to improved conditional coverage, as evaluated by several metrics. We also introduce two new metrics that check conditional coverage by looking at the strength of the dependence between the interval size and the indicator of miscoverage.
△ Less
Submitted 2 October, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Private Prediction Sets
Authors:
Anastasios N. Angelopoulos,
Stephen Bates,
Tijana Zrnic,
Michael I. Jordan
Abstract:
In real-world settings involving consequential decision-making, the deployment of machine learning systems generally requires both reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that pro…
▽ More
In real-world settings involving consequential decision-making, the deployment of machine learning systems generally requires both reliable uncertainty quantification and protection of individuals' privacy. We present a framework that treats these two desiderata jointly. Our framework is based on conformal prediction, a methodology that augments predictive models to return prediction sets that provide uncertainty quantification -- they provably cover the true response with a user-specified probability, such as 90%. One might hope that when used with privately-trained models, conformal prediction would yield privacy guarantees for the resulting prediction sets; unfortunately, this is not the case. To remedy this key problem, we develop a method that takes any pre-trained predictive model and outputs differentially private prediction sets. Our method follows the general approach of split conformal prediction; we use holdout data to calibrate the size of the prediction sets but preserve privacy by using a privatized quantile subroutine. This subroutine compensates for the noise introduced to preserve privacy in order to guarantee correct coverage. We evaluate the method on large-scale computer vision datasets.
△ Less
Submitted 3 March, 2024; v1 submitted 11 February, 2021;
originally announced February 2021.
-
Distribution-Free, Risk-Controlling Prediction Sets
Authors:
Stephen Bates,
Anastasios Angelopoulos,
Lihua Lei,
Jitendra Malik,
Michael I. Jordan
Abstract:
While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box…
▽ More
While improving prediction accuracy has been the focus of machine learning in recent years, this alone does not suffice for reliable decision-making. Deploying learning systems in consequential settings also requires calibrating and communicating the uncertainty of predictions. To convey instance-wise uncertainty for prediction tasks, we show how to generate set-valued predictions from a black-box predictor that control the expected loss on future test points at a user-specified level. Our approach provides explicit finite-sample guarantees for any dataset by using a holdout set to calibrate the size of the prediction sets. This framework enables simple, distribution-free, rigorous error control for many tasks, and we demonstrate it in five large-scale machine learning problems: (1) classification problems where some mistakes are more costly than others; (2) multi-label classification, where each observation has multiple associated labels; (3) classification problems where the labels have a hierarchical structure; (4) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (5) protein structure prediction. Lastly, we discuss extensions to uncertainty quantification for ranking, metric learning and distributionally robust learning.
△ Less
Submitted 4 August, 2021; v1 submitted 7 January, 2021;
originally announced January 2021.
-
Uncertainty Sets for Image Classifiers using Conformal Prediction
Authors:
Anastasios Angelopoulos,
Stephen Bates,
Jitendra Malik,
Michael I. Jordan
Abstract:
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies an…
▽ More
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%. The algorithm is simple and fast like Platt scaling, but provides a formal finite-sample coverage guarantee for every model and dataset. Our method modifies an existing conformal prediction algorithm to give more stable predictive sets by regularizing the small scores of unlikely classes after Platt scaling. In experiments on both Imagenet and Imagenet-V2 with ResNet-152 and other classifiers, our scheme outperforms existing approaches, achieving coverage with sets that are often factors of 5 to 10 smaller than a stand-alone Platt scaling baseline.
△ Less
Submitted 3 September, 2022; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Achieving Equalized Odds by Resampling Sensitive Attributes
Authors:
Yaniv Romano,
Stephen Bates,
Emmanuel J. Candès
Abstract:
We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we dev…
▽ More
We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we develop a formal hypothesis test to detect whether a prediction rule violates this property, the first such test in the literature. Both the model fitting and hypothesis testing leverage a resampled version of the sensitive attribute obeying equalized odds, by construction. We demonstrate the applicability and validity of the proposed framework both in regression and multi-class classification problems, reporting improved performance over state-of-the-art methods. Lastly, we show how to incorporate techniques for equitable uncertainty quantification---unbiased for each group under study---to communicate the results of the data analysis in exact terms.
△ Less
Submitted 7 June, 2020;
originally announced June 2020.
-
In-Datacenter Performance Analysis of a Tensor Processing Unit
Authors:
Norman P. Jouppi,
Cliff Young,
Nishant Patil,
David Patterson,
Gaurav Agrawal,
Raminder Bajwa,
Sarah Bates,
Suresh Bhatia,
Nan Boden,
Al Borchers,
Rick Boyle,
Pierre-luc Cantin,
Clifford Chao,
Chris Clark,
Jeremy Coriell,
Mike Daley,
Matt Dau,
Jeffrey Dean,
Ben Gelb,
Tara Vazir Ghaemmaghami,
Rajendra Gottipati,
William Gulland,
Robert Hagmann,
C. Richard Ho,
Doug Hogberg
, et al. (50 additional authors not shown)
Abstract:
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOp…
▽ More
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
△ Less
Submitted 16 April, 2017;
originally announced April 2017.