-
Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty
Authors:
Kaizhao Liu,
Jose Blanchet,
Lexing Ying,
Yiping Lu
Abstract:
Bootstrap is a popular methodology for simulating input uncertainty. However, it can be computationally expensive when the number of samples is large. We propose a new approach called \textbf{Orthogonal Bootstrap} that reduces the number of required Monte Carlo replications. We decompose the target being simulated into two parts: the \textit{non-orthogonal part}, which has a closed-form result known as the Infinitesimal Jackknife, and the \textit{orthogonal part}, which is easier to simulate. We theoretically and numerically show that Orthogonal Bootstrap significantly reduces the computational cost of Bootstrap while improving empirical accuracy and maintaining the same width of the constructed interval.
Submitted 30 April, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
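A minimal Python sketch of the decomposition idea, under stated assumptions: the statistic is a smooth function of the sample mean, T = g(mean(x)), and the simulated target is the bootstrap mean-squared deviation E*[(T* - T_n)^2]. The linear (infinitesimal-jackknife) part of T* - T_n has a closed-form second moment under multinomial resampling weights, so only the remainder is simulated, in the style of a control variate. The function names, the choice of statistic, and the choice of target are illustrative assumptions; this is not presented as the authors' algorithm.

```python
import numpy as np


def orthogonal_bootstrap_msd(x, g, grad_g, B, rng):
    """Estimate E*[(T* - T_n)^2] for T = g(mean(x)) by adding the closed-form
    second moment of the linear (infinitesimal-jackknife) part to a Monte Carlo
    estimate of the remainder. Illustrative sketch, not the paper's method."""
    n = len(x)
    xbar = x.mean()
    t_hat = g(xbar)
    # Empirical influence values of T = g(mean) at the empirical distribution.
    infl = grad_g(xbar) * (x - xbar)
    # Closed form: with multinomial(n, 1/n) weights and sum(infl) = 0,
    # E*[L*^2] = sum(infl^2) / n^2, the infinitesimal-jackknife variance.
    ij_part = np.sum(infl ** 2) / n ** 2
    resid = np.empty(B)
    for b in range(B):
        w = rng.multinomial(n, np.full(n, 1.0 / n))
        t_star = g(np.dot(w, x) / n)
        lin = np.dot(w - 1, infl) / n  # linear part L* of T* - T_n
        resid[b] = (t_star - t_hat) ** 2 - lin ** 2
    return ij_part + resid.mean()


def plain_bootstrap_msd(x, g, B, rng):
    """Standard Monte Carlo bootstrap estimate of E*[(T* - T_n)^2]."""
    n = len(x)
    t_hat = g(x.mean())
    draws = np.array([g(rng.choice(x, size=n, replace=True).mean()) for _ in range(B)])
    return np.mean((draws - t_hat) ** 2)
```

For a linear statistic the simulated remainder is identically zero and the estimate reduces to the infinitesimal-jackknife variance. For a nonlinear g such as exp, the remainder is small relative to the linear part, so a few hundred replications of it can land close to a plain bootstrap run with many more replications.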
-
Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls
Authors:
Yiping Lu,
Jiajin Li,
Lexing Ying,
Jose Blanchet
Abstract:
The optimal design of experiments typically involves solving an NP-hard combinatorial optimization problem. In this paper, we aim to develop a globally convergent and practically efficient optimization algorithm. Specifically, we consider a setting where pre-treatment outcome data are available and the synthetic control estimator is used. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the observed data. Under this setting, we observe, surprisingly, that the optimal experimental design problem can be reduced to a so-called \textit{phase synchronization} problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design when pre-treatment data are sampled from certain data-generating processes. Empirically, we conduct extensive experiments to demonstrate the effectiveness of our method on both the US Bureau of Labor Statistics data and the Abadie-Diamond-Hainmueller California Smoking Data. In terms of root mean square error, our algorithm outperforms random design by a large margin.
Submitted 28 November, 2022;
originally announced November 2022.
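To make the phase-synchronization step more concrete, here is a hedged Python sketch of a plain generalized power method with spectral initialization for the binary case: approximately maximize z^T C z over z in {-1, +1}^n, where the sign of each entry could be read as a treatment/control label. The matrix C, the function names, and the stopping rule are hypothetical; the paper's normalized variant and its reduction from the design problem to C are not reproduced here.

```python
import numpy as np


def sign_nonzero(v):
    """Elementwise sign, mapping exact zeros to +1 so entries stay in {-1, +1}."""
    s = np.sign(v)
    s[s == 0] = 1.0
    return s


def generalized_power_method(C, max_iter=100):
    """Approximately maximize z^T C z over z in {-1, +1}^n for symmetric C
    (plain, unnormalized iteration; illustrative only)."""
    # Spectral initialization: round the leading eigenvector to signs.
    _, vecs = np.linalg.eigh(C)  # eigenvalues returned in ascending order
    z = sign_nonzero(vecs[:, -1])
    for _ in range(max_iter):
        # Power step followed by projection back onto {-1, +1}^n.
        z_new = sign_nonzero(C @ z)
        if np.array_equal(z_new, z):
            break  # fixed point reached
        z = z_new
    return z
```

On a planted instance C = z0 z0^T + symmetric noise with a large eigengap, the iteration typically returns z0 up to a global sign flip, which is the usual identifiability ambiguity in synchronization problems.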
-
Minimax Optimal Kernel Operator Learning via Multilevel Training
Authors:
Jikai Jin,
Yiping Lu,
Jose Blanchet,
Lexing Ying
Abstract:
Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning. In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. We establish the information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization that learns the spectral components below the bias contour and ignores the ones that are above the variance contour can achieve the optimal learning rate. At the same time, the spectral components between the bias and variance contours give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal when learning linear operators between infinite-dimensional function spaces.
Submitted 24 July, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
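The idea of learning some spectral components of an operator and ignoring others can be illustrated with a spectral-truncation estimator. The Python sketch below assumes inputs and outputs are already expressed as coefficients in fixed eigenbases and regresses each output mode j on only its first input_cutoffs[j] input modes; output modes beyond the list are left at zero. The staircase of cutoffs is a hypothetical stand-in for the region kept by the paper's bias and variance contours (which depend on smoothness and sample size), and this is not the multilevel algorithm itself.

```python
import numpy as np


def staircase_operator_fit(X, Y, input_cutoffs):
    """Least-squares fit of a linear operator in fixed eigenbasis coordinates,
    keeping only a staircase-shaped set of spectral components.

    X: (N, d_in) input coefficients; Y: (N, d_out) output coefficients.
    input_cutoffs[j]: number of leading input modes used for output mode j.
    Output modes j >= len(input_cutoffs) are not learned (left at zero).
    """
    d_in, d_out = X.shape[1], Y.shape[1]
    A = np.zeros((d_out, d_in))
    for j, k in enumerate(input_cutoffs):
        if k == 0:
            continue  # this output mode is dropped entirely
        coef, *_ = np.linalg.lstsq(X[:, :k], Y[:, j], rcond=None)
        A[j, :k] = coef
    return A
```

When the true operator is supported inside the kept staircase and the data are noiseless, the fit recovers it exactly; with noise, the shape of the staircase is what trades bias against variance.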
-
Asymptotically Optimal Control of a Centralized Dynamic Matching Market with General Utilities
Authors:
Jose H. Blanchet,
Martin I. Reiman,
Virag Shah,
Lawrence M. Wein,
Linjia Wu
Abstract:
We consider a matching market where buyers and sellers arrive according to independent Poisson processes at the same rate and independently abandon the market if not matched after an exponential amount of time with the same mean. In this centralized market, the utility for the system manager from matching any buyer and any seller is a general random variable. We consider a sequence of systems indexed by $n$ where the arrivals in the $n^{\mathrm{th}}$ system are sped up by a factor of $n$. We analyze two families of one-parameter policies: the population threshold policy immediately matches an arriving agent to its best available mate only if the number of mates in the system is above a threshold, and the utility threshold policy matches an arriving agent to its best available mate only if the corresponding utility is above a threshold. Using a fluid analysis of the two-dimensional Markov process of buyers and sellers, we show that when the matching utility distribution is light-tailed, the population threshold policy with threshold $\frac{n}{\ln n}$ is asymptotically optimal among all policies that make matches only at agent arrival epochs. In the heavy-tailed case, we characterize the optimal threshold level for both policies. We also study the utility threshold policy in an unbalanced matching market with heavy-tailed matching utilities and find that the buyers and sellers have the same asymptotically optimal utility threshold. We derive optimal thresholds when the matching utility distribution is exponential, uniform, Pareto, and correlated Pareto. We find that as the right tail of the matching utility distribution gets heavier, the threshold level of each policy (and hence market thickness) increases, as does the magnitude by which the utility threshold policy outperforms the population threshold policy.
Submitted 10 June, 2021; v1 submitted 8 February, 2020;
originally announced February 2020.
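A hedged Python sketch of the population threshold policy as a discrete-event simulation. Assumptions beyond the abstract: buyer and seller arrival rates are both lam (so lam plays the role of n in the sped-up system), patience is exponential with mean mean_patience, the utilities of different buyer-seller pairs are i.i.d. draws from a user-supplied sampler (so only the counts of waiting agents need to be tracked), and threshold >= 0. The function and parameter names are hypothetical.

```python
import numpy as np


def simulate_population_threshold(lam, threshold, horizon, draw_utilities, rng,
                                  mean_patience=1.0):
    """Event-driven simulation of a balanced two-sided market under a
    population threshold policy (threshold >= 0 assumed).
    Returns (matching utility per unit time, waiting buyers, waiting sellers)."""
    buyers = sellers = 0
    t, total_utility = 0.0, 0.0
    while True:
        # Competing exponential clocks: two arrival streams plus abandonments.
        rate = 2 * lam + (buyers + sellers) / mean_patience
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            break
        u = rng.random() * rate
        if u < 2 * lam:  # an arrival
            is_buyer = u < lam
            mates = sellers if is_buyer else buyers
            if mates > threshold:
                # Match to the best of `mates` pair utilities (assumed i.i.d.).
                total_utility += draw_utilities(rng, mates).max()
                if is_buyer:
                    sellers -= 1
                else:
                    buyers -= 1
            elif is_buyer:
                buyers += 1
            else:
                sellers += 1
        else:  # an abandonment: a buyer with probability buyers / (buyers + sellers)
            if rng.random() * (buyers + sellers) < buyers:
                buyers -= 1
            else:
                sellers -= 1
    return total_utility / horizon, buyers, sellers
```

For example, with exponential matching utilities one might call it with lam = n and threshold = n / np.log(n) to mirror the light-tailed threshold in the abstract; a threshold of 0 instead matches greedily, so buyers and sellers are never waiting at the same time.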
-
Semi-parametric dynamic contextual pricing
Authors:
Virag Shah,
Jose Blanchet,
Ramesh Johari
Abstract:
Motivated by the application of real-time pricing in e-commerce platforms, we consider the problem of revenue maximization in a setting where the seller can leverage contextual information describing the customer's history and the product's type to predict her valuation of the product. However, her true valuation is unobservable to the seller; only the binary outcome of a transaction (success or failure) is observed. Unlike in usual contextual bandit settings, the optimal price/arm given a covariate in our setting is sensitive to the detailed characteristics of the residual uncertainty distribution. We develop a semi-parametric model in which the residual distribution is non-parametric and provide the first algorithm that learns both the regression parameters and the residual distribution with $\tilde O(\sqrt{n})$ regret. We empirically test a scalable implementation of our algorithm and observe good performance.
Submitted 10 August, 2019; v1 submitted 7 January, 2019;
originally announced January 2019.
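Why the optimal price depends on the residual distribution can be seen in a small Python sketch. It assumes a linear valuation model v = theta @ x + eps (the linear form is an assumption here), treats a sale as occurring when the price does not exceed v, and replaces the unknown residual distribution with the empirical distribution of a sample of residuals; the best price is then found by grid search. The names are hypothetical, and this is only the pricing step given estimates, not the paper's learning algorithm with its exploration and regret guarantee.

```python
import numpy as np


def empirical_optimal_price(theta, x, residuals, price_grid):
    """Revenue-maximizing grid price for context x when valuation is
    theta @ x + eps and eps follows the empirical distribution of `residuals`.
    A sale occurs when the price does not exceed the valuation."""
    mean_val = theta @ x
    eps = np.sort(residuals)
    # Number of residuals strictly below (p - mean_val), for each grid price p.
    below = np.searchsorted(eps, price_grid - mean_val, side="left")
    prob_sale = 1.0 - below / len(eps)  # empirical P(theta @ x + eps >= p)
    revenue = price_grid * prob_sale
    best = int(np.argmax(revenue))
    return price_grid[best], revenue[best]
```

With residuals identically zero, the best price equals the predicted valuation theta @ x; with the same mean valuation but widely spread residuals, the revenue-maximizing price moves, which is the sensitivity the abstract describes.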