Showing 1–50 of 89 results for author: Kallus, N

Searching in archive cs.
  1. arXiv:2408.12004  [pdf, other]

    cs.LG stat.ME stat.ML

    CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies

    Authors: Brian M Cho, Ana-Roxana Pop, Kyra Gan, Sam Corbett-Davies, Israel Nir, Ariel Evnine, Nathan Kallus

    Abstract: When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least a pre-specified probability. We focus on…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2406.06452  [pdf, other]

    stat.ME cs.LG stat.ML

    Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data

    Authors: Miruna Oprescu, Nathan Kallus

    Abstract: Accurately predicting conditional average treatment effects (CATEs) is crucial in personalized medicine and digital platform analytics. Since often the treatments of interest cannot be directly randomized, observational data is leveraged to learn CATEs, but this approach can incur significant bias from unobserved confounding. One strategy to overcome these limitations is to seek latent quasi-exper…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 20 pages, 3 figures

  3. arXiv:2405.16564  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Contextual Linear Optimization with Bandit Feedback

    Authors: Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

    Abstract: Contextual linear optimization (CLO) uses predictive observations to reduce uncertainty in random cost coefficients and thereby improve average-cost performance. An example is a stochastic shortest path with random edge costs (e.g., traffic) and predictive features (e.g., lagged traffic, weather). Existing work on CLO assumes the data has fully observed cost coefficient vectors, but in many applic…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2405.12119  [pdf, other]

    cs.IR cs.AI cs.CL

    Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation

    Authors: Zhankui He, Zhouhang Xie, Harald Steck, Dawen Liang, Rahul Jha, Nathan Kallus, Julian McAuley

    Abstract: Large language models (LLMs) are revolutionizing conversational recommender systems by adeptly indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge. This leads to suboptimal performance due to the failure to capture rapidly changing data distributions, such as item p…

    Submitted 20 May, 2024; originally announced May 2024.

  5. arXiv:2404.00099  [pdf, other]

    cs.AI stat.ML

    Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

    Authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang

    Abstract: We study evaluating a policy under best- and worst-case perturbations to a Markov decision process (MDP), given transition observations from the original MDP, whether under the same or different policy. This is an important problem when there is the possibility of a shift between historical and future environments, due to e.g. unmeasured confounding, distributional shift, or an adversarial environ…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 40 pages, 1 figure

  6. arXiv:2403.10671  [pdf, other]

    stat.ML cs.LG

    Hessian-Free Laplace in Bayesian Deep Learning

    Authors: James McInerney, Nathan Kallus

    Abstract: The Laplace approximation (LA) of the Bayesian posterior is a Gaussian distribution centered at the maximum a posteriori estimate. Its appeal in Bayesian deep learning stems from the ability to quantify uncertainty post-hoc (i.e., after standard network parameter optimization), the ease of sampling from the approximate posterior, and the analytic form of model evidence. However, an important compu…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures

  7. arXiv:2403.06323  [pdf, other]

    cs.LG

    Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL

    Authors: Kaiwen Wang, Dawen Liang, Nathan Kallus, Wen Sun

    Abstract: We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. Using an augmented Markov Decision Process (MDP), we propose two general meta-algorithms via reductions to standard RL: one based on optimistic algorithms and another based on policy optimization.…

    Submitted 10 March, 2024; originally announced March 2024.

  8. Is Cosine-Similarity of Embeddings Really About Similarity?

    Authors: Harald Steck, Chaitanya Ekanadham, Nathan Kallus

    Abstract: Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 9 pages

    Journal ref: ACM Web Conference 2024 (WWW 2024 Companion)

  9. arXiv:2403.05385  [pdf, other]

    cs.LG

    Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning

    Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári

    Abstract: We propose training fitted Q-iteration with log-loss (FQI-log) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-log scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bo…

    Submitted 1 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  10. arXiv:2403.02467  [pdf]

    econ.EM cs.LG stat.ME stat.ML

    Applied Causal Inference Powered by ML and AI

    Authors: Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

    Abstract: An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

    Submitted 4 March, 2024; originally announced March 2024.

  11. arXiv:2402.07198  [pdf, other]

    cs.LG

    More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

    Authors: Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun

    Abstract: In this paper, we prove that Distributional Reinforcement Learning (DistRL), which learns the return distribution, can obtain second-order bounds in both online and offline RL in general settings with function approximation. Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributio…

    Submitted 11 February, 2024; originally announced February 2024.

  12. arXiv:2402.06122  [pdf, other]

    stat.ME cs.LG stat.ML

    Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams

    Authors: Brian Cho, Kyra Gan, Nathan Kallus

    Abstract: We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, peeking with expectation-based averaged capital (PEAK), builds upon the testing-by-betting framework and provides a non-asymptotic $α$-level test across any stopping time. Our contributions are two-fold: (1) we propose a novel betting scheme and provide theoreti…

    Submitted 2 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: To appear at the Forty-first International Conference on Machine Learning (ICML 2024)

  13. arXiv:2402.01845  [pdf, other]

    cs.LG stat.ML

    Multi-Armed Bandits with Interference

    Authors: Su Jia, Peter Frazier, Nathan Kallus

    Abstract: Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy. The cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of Multi-armed Bandits with Interference (MABI), where the learner assig…

    Submitted 15 July, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  14. arXiv:2312.15574  [pdf, other]

    math.ST cs.LG

    Clustered Switchback Experiments: Near-Optimal Rates Under Spatiotemporal Interference

    Authors: Su Jia, Nathan Kallus, Christina Lee Yu

    Abstract: We consider experimentation in the presence of non-stationarity, inter-unit (spatial) interference, and carry-over effects (temporal interference), where we wish to estimate the global average treatment effect (GATE), the difference between average outcomes having exposed all units at all times to treatment or to control. We suppose spatial interference is described by a graph, where a unit's outc…

    Submitted 23 June, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  15. arXiv:2311.03564  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Low-Rank MDPs with Continuous Action Spaces

    Authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu

    Abstract: Low-Rank Markov Decision Processes (MDPs) have recently emerged as a promising framework within the domain of reinforcement learning (RL), as they allow for provably approximately correct (PAC) learning guarantees while also incorporating ML algorithms for representation learning. However, current methods for low-rank MDPs are limited in that they only consider finite action spaces, and give vacuo…

    Submitted 1 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: 25 pages, AISTATS 2024

    Journal ref: PMLR, Volume 238, 2024

  16. arXiv:2310.15433  [pdf, other]

    cs.LG cs.IR

    Off-Policy Evaluation for Large Action Spaces via Policy Convolution

    Authors: Noveen Sachdeva, Lequn Wang, Dawen Liang, Nathan Kallus, Julian McAuley

    Abstract: Developing accurate off-policy estimators is crucial for both evaluating and optimizing for new policies. The main challenge in off-policy estimation is the distribution shift between the logging policy that generates data and the target policy that we aim to evaluate. Typically, techniques for correcting distribution shift involve some form of importance sampling. This approach results in unbiase…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Under review. 36 pages, 31 figures

  17. Large Language Models as Zero-Shot Conversational Recommenders

    Authors: Zhankui He, Zhouhang Xie, Rahul Jha, Harald Steck, Dawen Liang, Yesu Feng, Bodhisattwa Prasad Majumder, Nathan Kallus, Julian McAuley

    Abstract: In this paper, we present empirical studies on conversational recommendation tasks using representative large language models in a zero-shot setting with three primary contributions. (1) Data: To gain insights into model behavior in "in-the-wild" conversational recommendation scenarios, we construct a new dataset of recommendation-related conversations by scraping a popular discussion website. Thi…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted as CIKM 2023 long paper. Longer version is coming soon (e.g., more details about dataset)

  18. arXiv:2307.13793  [pdf, ps, other]

    stat.ME cs.LG econ.EM math.ST stat.ML

    Source Condition Double Robust Inference on Functionals of Inverse Problems

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought of as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ens…

    Submitted 25 July, 2023; originally announced July 2023.

  19. arXiv:2307.11704  [pdf, other]

    cs.LG

    JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

    Authors: Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun

    Abstract: Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost and it is the core NP-hard combinatorial optimization problem of query optimization. In this paper, we present JoinGym, a lightweight and easy-to-use query optimization environment for reinforcement learning (RL) that captures both the left-deep and bushy variants of the JOS problem. Compar…

    Submitted 17 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: JoinGym is available at https://github.com/kaiwenw/JoinGym!

  20. arXiv:2305.15703  [pdf, ps, other]

    cs.LG cs.AI math.OC math.ST stat.ML

    The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

    Authors: Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun

    Abstract: While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with optimal achievable cost. Particularly, our bounds converge much faster than th…

    Submitted 22 September, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023

  21. arXiv:2305.14816  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Provable Offline Preference-Based Reinforcement Learning

    Authors: Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offl…

    Submitted 29 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: The first two authors contribute equally

  22. arXiv:2304.10577  [pdf, other]

    cs.LG stat.ML

    B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding

    Authors: Miruna Oprescu, Jacob Dorn, Marah Ghoummaid, Andrew Jesson, Nathan Kallus, Uri Shalit

    Abstract: Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitraril…

    Submitted 13 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 20 pages, 4 figures, ICML 2023

    Journal ref: PMLR 202 (2023) 26599-26618

  23. arXiv:2302.05404  [pdf, ps, other]

    stat.ML cs.LG econ.EM math.ST stat.ME

    Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. Recently, many flexible machine learning methods have been developed for instrumental variable estimation. However, these methods have at least one of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) only obtaining estimation error rates in terms of pseudometrics (…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Under review

  24. arXiv:2302.03201  [pdf, ps, other]

    cs.LG math.OC math.ST stat.ML

    Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

    Authors: Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $τ$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is $Ω(\sqrt{τ^{-1}AK})$, where $A$ is the number of actions and $K$ is the number of episodes, and that it is achieved by an Upper Confidence Bound algorithm with a nov…

    Submitted 24 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  25. arXiv:2302.02392  [pdf, ps, other]

    cs.LG stat.ML

    Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

    Authors: Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage…

    Submitted 13 November, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: The original title of this paper was "Refined Value-Based Offline RL under Realizability and Partial Coverage," but it was later changed. This paper has been accepted for NeurIPS 2023

  26. arXiv:2301.12366  [pdf, other]

    cs.LG cs.AI math.OC math.ST

    Smooth Non-Stationary Bandits

    Authors: Su Jia, Qian Xie, Nathan Kallus, Peter I. Frazier

    Abstract: In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee $\tilde Θ(T^{2/3})$ regret. However, in practice environments are often changing…

    Submitted 7 June, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted by ICML 2023

  27. arXiv:2212.06355  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    A Review of Off-Policy Evaluation in Reinforcement Learning

    Authors: Masatoshi Uehara, Chengchun Shi, Nathan Kallus

    Abstract: Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Still under revision

  28. arXiv:2211.06457  [pdf, other]

    stat.ML cs.LG

    The Implicit Delta Method

    Authors: Nathan Kallus, James McInerney

    Abstract: Epistemic uncertainty quantification is a crucial part of drawing credible conclusions from predictive models, whether concerned about the prediction at a given point or any downstream evaluation that uses the model as input. When the predictive model is simple and its evaluation differentiable, this task is solved by the delta method, where we propagate the asymptotically-normal uncertainty in th…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: 18 pages, NeurIPS 2022

  29. arXiv:2210.14492  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Safe Reinforcement Learning with Binary Feedback

    Authors: Andrew Bennett, Dipendra Misra, Nathan Kallus

    Abstract: Safety is a crucial necessity in many applications of reinforcement learning (RL), whether robotic, automotive, or medical. Many existing approaches to safe RL rely on receiving numeric safety feedback, but in many cases this feedback can only take binary values; that is, whether an action in a given state is safe or unsafe. This is particularly true when feedback comes from human experts. We ther…

    Submitted 26 October, 2022; originally announced October 2022.

  30. arXiv:2207.13081  [pdf, other]

    cs.LG stat.ML

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

    Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs.…

    Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: This paper was accepted in NeurIPS 2023

  31. arXiv:2207.05837  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Learning Bellman Complete Representations for Offline Policy Evaluation

    Authors: Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

    Abstract: We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted for Long Talk at ICML 2022

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:2938-2971, 2022

  32. arXiv:2206.12081  [pdf, other]

    cs.LG stat.ME stat.ML

    Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

    Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

    Abstract: We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert space embeddings of POMDP where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emis…

    Submitted 24 June, 2022; originally announced June 2022.

  33. arXiv:2206.12020  [pdf, ps, other]

    cs.LG math.ST stat.ME stat.ML

    Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

    Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

    Abstract: We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework, that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as we…

    Submitted 23 June, 2022; originally announced June 2022.

  34. arXiv:2205.11486  [pdf, other]

    stat.ML cs.LG econ.EM stat.ME

    Robust and Agnostic Learning of Conditional Distributional Treatment Effects

    Authors: Nathan Kallus, Miruna Oprescu

    Abstract: The conditional average treatment effect (CATE) is the best measure of individual causal effects given baseline covariates. However, the CATE only captures the (conditional) average, and can overlook risks and tail events, which are important to treatment choice. In aggregate analyses, this is usually addressed by measuring the distributional treatment effect (DTE), such as differences in quantile…

    Submitted 24 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: 24 pages, 6 figures, AISTATS 2023

    Journal ref: PMLR 206 (2023) 6037-6060

  35. arXiv:2205.10327  [pdf, other]

    stat.ME cs.LG econ.EM stat.ML

    What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment

    Authors: Nathan Kallus

    Abstract: The fundamental problem of causal inference -- that we never observe counterfactuals -- prevents us from identifying how many might be negatively affected by a proposed intervention. If, in an A/B test, half of users click (or buy, or watch, or renew, etc.), whether exposed to the standard experience A or a new one B, hypothetically it could be because the change affects no one, because the change…

    Submitted 20 November, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

  36. arXiv:2204.06562  [pdf]

    cs.CV cs.AI cs.LG

    Estimating Structural Disparities for Face Models

    Authors: Shervin Ardeshir, Cristina Segalin, Nathan Kallus

    Abstract: In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations (groups) of datapoints. Thus, the inputs to disparity quantification consist of a model's predictions $\hat{y}$, the ground-truth labels for the predictions $y$, and group labels $g$ for the data points. Performance of the model for each gr…

    Submitted 13 April, 2022; originally announced April 2022.

    Journal ref: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  37. arXiv:2202.09667  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

    Authors: Nathan Kallus, Xiaojie Mao, Kaiwen Wang, Zhengyuan Zhou

    Abstract: Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts -- discrepancies between the data-generating environment and that where policies are deployed. \citet{si2020distributional} prop…

    Submitted 18 July, 2022; v1 submitted 19 February, 2022; originally announced February 2022.

    Comments: Short Talk at ICML 2022

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:10598-10632, 2022

  38. arXiv:2112.11449  [pdf, other]

    stat.ME cs.LG econ.EM math.OC stat.ML

    Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding

    Authors: Jacob Dorn, Kevin Guo, Nathan Kallus

    Abstract: We consider the problem of constructing bounds on the average treatment effect (ATE) when unmeasured confounders exist but have bounded influence. Specifically, we assume that omitted confounders could not change the odds of treatment for any unit by more than a fixed factor. We derive the sharp partial identification bounds implied by this assumption by leveraging distributionally robust optimiza…

    Submitted 22 July, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

  39. arXiv:2111.08664  [pdf, other]

    stat.AP cs.CY econ.EM

    An Empirical Evaluation of the Impact of New York's Bail Reform on Crime Using Synthetic Controls

    Authors: Angela Zhou, Andrew Koo, Nathan Kallus, Rene Ropac, Richard Peterson, Stephen Koppel, Tiffany Bergin

    Abstract: We conduct an empirical evaluation of the impact of New York's bail reform on crime. New York State's Bail Elimination Act went into effect on January 1, 2020, eliminating money bail and pretrial detention for nearly all misdemeanor and nonviolent felony defendants. Our analysis of effects on aggregate crime rates after the reform informs the understanding of bail reform and general deterrence. We…

    Submitted 25 June, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: text edits, removed San Francisco/Houston due to bail reform overlap in study period

  40. arXiv:2110.15332  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

    Authors: Andrew Bennett, Nathan Kallus

    Abstract: In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we tackle this by considering off-policy evaluation in a partially observed MDP…

    Submitted 22 March, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

  41. arXiv:2110.10081  [pdf, other]

    cs.LG stat.ML

    Stateful Offline Contextual Policy Evaluation and Learning

    Authors: Nathan Kallus, Angela Zhou

    Abstract: We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts, which generate unknown individual-level responses to agent actions. This model can be thought of as an offline generalization of contextual bandits with resource constraints. We formalize the…

    Submitted 19 October, 2021; originally announced October 2021.

  42. arXiv:2110.02919  [pdf, other]

    cs.LG stat.ML

    Residual Overfit Method of Exploration

    Authors: James McInerney, Nathan Kallus

    Abstract: Exploration is a crucial aspect of bandit and reinforcement learning algorithms. The uncertainty quantification necessary for exploration often comes from either closed-form expressions based on simple models or resampling and posterior approximations that are computationally intensive. We propose instead an approximate exploration methodology based on fitting only two point estimates, one tuned a…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: 13 pages, 16 figures

  43. arXiv:2106.07914  [pdf, other]

    cs.LG stat.ME

    Control Variates for Slate Off-Policy Evaluation

    Authors: Nikos Vlassis, Ashok Chandrashekar, Fernando Amat Gil, Nathan Kallus

    Abstract: We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions, often termed slates. The problem is common to recommender systems and user-interface optimization, and it is particularly challenging because of the combinatorially-sized action space. Swaminathan et al. (2017) have proposed the pseudoinverse (PI) estimator under the assumption that the…

    Submitted 2 November, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Journal ref: NeurIPS 2021

  44. arXiv:2106.01723  [pdf, other]

    stat.ML cs.LG math.ST

    Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

    Authors: Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan

    Abstract: Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimiz…

    Submitted 3 June, 2021; originally announced June 2021.

  45. arXiv:2106.00418  [pdf, other]

    stat.ML cs.LG math.ST

    Post-Contextual-Bandit Inference

    Authors: Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan

    Abstract: Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. To support credible inference on novel interventions at the end of the study, nonetheless, we still want to construct valid confidence intervals on…

    Submitted 1 June, 2021; originally announced June 2021.

  46. arXiv:2103.14029  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

    Authors: Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

    Abstract: We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions.…

    Submitted 9 October, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

  47. arXiv:2102.02981  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

    Authors: Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

    Abstract: We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach enables us to achieve a fast rate of convergence for weights…

    Submitted 24 July, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Under Review

  48. arXiv:2102.00479  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Fast Rates for the Regret of Offline Reinforcement Learning

    Authors: Yichun Hu, Nathan Kallus, Masatoshi Uehara

    Abstract: We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted $Q$-iteration (FQI), suggest a $O(1/\sqrt{n})$ convergence for regret, empirical behavior exhibits \emph{much} faster convergence. In this paper, we present a finer regret a…

    Submitted 12 July, 2023; v1 submitted 31 January, 2021; originally announced February 2021.

  49. arXiv:2012.11066  [pdf, other]

    cs.LG cs.CY stat.ML

    Fairness, Welfare, and Equity in Personalized Pricing

    Authors: Nathan Kallus, Angela Zhou

    Abstract: We study the interplay of fairness, welfare, and equity considerations in personalized pricing based on customer features. Sellers are increasingly able to conduct price personalization based on predictive modeling of demand conditional on covariates: setting customized interest rates, targeted discounts of consumer goods, and personalized subsidies of scarce resources with positive externalities…

    Submitted 27 December, 2020; v1 submitted 20 December, 2020; originally announced December 2020.

    Comments: Accepted at FAccT 2021

  50. arXiv:2012.09422  [pdf, ps, other]

    cs.LG econ.EM math.ST stat.ML

    The Variational Method of Moments

    Authors: Andrew Bennett, Nathan Kallus

    Abstract: The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach reduces the problem to a finite set of marginal moment conditions and applies the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying…

    Submitted 22 March, 2023; v1 submitted 17 December, 2020; originally announced December 2020.