Showing 1–50 of 139 results for author: Wainwright, M

Searching in archive cs.
  1. arXiv:2407.20128  [pdf, ps, other]

    math.OC cs.GT stat.ML

    Finite-Sample Guarantees for Best-Response Learning Dynamics in Zero-Sum Matrix Games

    Authors: Fathima Zarin Faizal, Asuman Ozdaglar, Martin J. Wainwright

    Abstract: We study best-response type learning dynamics for two player zero-sum matrix games. We consider two settings that are distinguished by the type of information that each player has about the game and their opponent's strategy. The first setting is the full information case, in which each player knows their own and the opponent's payoff matrices and observes the opponent's mixed strategy. The second…

    Submitted 7 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: 36 pages; under review

  2. arXiv:2404.10179  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio…

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2401.13588  [pdf]

    cs.CL cs.AI cs.SE

    Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes

    Authors: Darren Liu, Cheng Ding, Delgersuren Bold, Monique Bouvier, Jiaying Lu, Benjamin Shickel, Craig S. Jabaley, Wenhui Zhang, Soojin Park, Michael J. Young, Mark S. Wainwright, Gilles Clermont, Parisa Rashidi, Eric S. Rosenthal, Laurie Dimisko, Ran Xiao, Joo Heung Yoon, Carl Yang, Xiao Hu

    Abstract: The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in r…

    Submitted 24 January, 2024; originally announced January 2024.

  5. arXiv:2401.05233  [pdf, other]

    cs.LG cs.IT eess.SY math.OC stat.ML

    Taming "data-hungry" reinforcement learning? Stability in continuous state-action spaces

    Authors: Yaqi Duan, Martin J. Wainwright

    Abstract: We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both off-line and on-line settings. Our analysis highlights two key stability properties, relating to how changes in value functions and/or policies affect the Bellman operator and occupation measures. We argue that these properties are satisf…

    Submitted 10 January, 2024; originally announced January 2024.

  6. arXiv:2309.08634  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP stat.ME

    Doubly High-Dimensional Contextual Bandits: An Interpretable Model for Joint Assortment-Pricing

    Authors: Junhui Cai, Ran Chen, Martin J. Wainwright, Linda Zhao

    Abstract: Key challenges in running a retail business include how to select products to present to consumers (the assortment problem), and how to price products (the pricing problem) to maximize revenue or profit. Instead of considering these problems in isolation, we propose a joint approach to assortment-pricing based on contextual bandits. Our model is doubly high-dimensional, in that both context vector…

    Submitted 13 September, 2023; originally announced September 2023.

  7. arXiv:2303.12613  [pdf, other]

    math.ST cs.IT

    Noisy recovery from random linear observations: Sharp minimax rates under elliptical constraints

    Authors: Reese Pathak, Martin J. Wainwright, Lin Xiao

    Abstract: Estimation problems with constrained parameter spaces arise in various settings. In many of these problems, the observations available to the statistician can be modelled as arising from the noisy realization of the image of a random linear operator; an important special case is random design regression. We derive sharp rates of estimation for arbitrary compact elliptical parameter sets and demons…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 53 pages, 2 figures

  8. arXiv:2303.02534  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Semi-parametric inference based on adaptively collected data

    Authors: Licong Lin, Koulik Khamaru, Martin J. Wainwright

    Abstract: Many standard estimators, when applied to adaptively collected data, fail to be asymptotically normal, thereby complicating the construction of confidence intervals. We address this challenge in a semi-parametric context: estimating the parameter vector of a generalized linear regression model contaminated by a non-parametric nuisance component. We construct suitably weighted estimating equations…

    Submitted 4 March, 2023; originally announced March 2023.

  9. arXiv:2211.03899  [pdf, other]

    stat.ML cs.LG math.ST

    Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

    Authors: Yaqi Duan, Martin J. Wainwright

    Abstract: We study non-parametric estimation of the value function of an infinite-horizon $γ$-discounted Markov reward process (MRP) using observations from a single trajectory. We provide non-asymptotic guarantees for a general family of kernel-based multi-step temporal difference (TD) estimates, including canonical $K$-step look-ahead TD for $K = 1, 2, \ldots$ and the TD$(λ)$ family for $λ\in [0,1)$ as sp…

    Submitted 7 November, 2022; originally announced November 2022.

  10. arXiv:2210.11377  [pdf, other]

    stat.ML cs.LG math.OC math.ST

    Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces

    Authors: Eric Xia, Martin J. Wainwright

    Abstract: We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for policy evaluation in general state spaces. It alternates between fitting the Bellman residual using non-parametric regression (as in boosting), and estimating the value function via the least-squares temporal difference (LSTD) procedure applied with a feature set that grows adaptively over time. By exploiting the connection to…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: 40 pages, 7 figures

  11. arXiv:2210.04334  [pdf, other]

    stat.ME cs.LG eess.SP

    QuTE: decentralized multiple testing on sensor networks with false discovery rate control

    Authors: Aaditya Ramdas, Jianbo Chen, Martin J. Wainwright, Michael I. Jordan

    Abstract: This paper designs methods for decentralized multiple hypothesis testing on graphs that are equipped with provable guarantees on the false discovery rate (FDR). We consider the setting where distinct agents reside on the nodes of an undirected graph, and each agent possesses p-values corresponding to one or more hypotheses local to its node. Each agent must individually decide whether to reject on…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: This paper appeared in the IEEE CDC'17 conference proceedings. The last two sections were then developed in 2018, and it is now being put on arXiv simply for easier access

  12. arXiv:2209.13075  [pdf, other]

    math.ST cs.IT stat.ML

    Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency

    Authors: Wenlong Mou, Martin J. Wainwright, Peter L. Bartlett

    Abstract: The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds re…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 56 pages, 6 figures

  13. arXiv:2206.00796  [pdf, ps, other]

    cs.LG

    Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

    Authors: Andrea Zanette, Martin J. Wainwright

    Abstract: The $Q$-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with linear function approximation. In practice, tools such as target networks and experience replay appear to be essential, but the individual contribution…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: Appears in ICML 2022

  14. arXiv:2205.02986  [pdf, other]

    math.ST cs.LG stat.ML

    Optimally tackling covariate shift in RKHS-based nonparametric regression

    Authors: Cong Ma, Reese Pathak, Martin J. Wainwright

    Abstract: We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems defined using the likelihood ratios between the source and target distributions. When the likelihood ratios are uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chose…

    Submitted 6 June, 2023; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: to appear in the Annals of Statistics

  15. arXiv:2203.12786  [pdf, ps, other]

    cs.LG

    Bellman Residual Orthogonalization for Offline Reinforcement Learning

    Authors: Andrea Zanette, Martin J. Wainwright

    Abstract: We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed po…

    Submitted 11 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: Appears in NeurIPS 2022

  16. arXiv:2202.02837  [pdf, other]

    math.ST cs.LG stat.ML

    A new similarity measure for covariate shift with applications to nonparametric regression

    Authors: Reese Pathak, Cong Ma, Martin J. Wainwright

    Abstract: We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions that is based on the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of Hölder continuous function…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: 22 pages, 2 figures, 1 table

  17. arXiv:2201.08536  [pdf, other]

    stat.ML cs.LG

    Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

    Authors: Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

    Abstract: Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, howev…

    Submitted 20 January, 2022; originally announced January 2022.

  18. arXiv:2201.08518  [pdf, ps, other]

    math.ST cs.LG math.OC stat.ML

    Optimal variance-reduced stochastic approximation in Banach spaces

    Authors: Wenlong Mou, Koulik Khamaru, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contras…

    Submitted 29 November, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  19. arXiv:2112.12770  [pdf, ps, other]

    math.OC cs.LG math.PR math.ST stat.ML

    Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

    Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett

    Abstract: We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a…

    Submitted 11 May, 2024; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: Published at Mathematical Statistics and Learning

  20. arXiv:2109.12002  [pdf, other]

    stat.ML cs.LG math.ST

    Optimal policy evaluation using kernel-based temporal difference methods

    Authors: Yaqi Duan, Mengdi Wang, Martin J. Wainwright

    Abstract: We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD) estimate; in the population limit of infinite data, it corresponds to the fixed point of a projected Bellman operator defined by the associated reproducing kern…

    Submitted 24 September, 2021; originally announced September 2021.

  21. arXiv:2108.08812  [pdf, ps, other]

    cs.LG

    Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

    Authors: Andrea Zanette, Martin J. Wainwright, Emma Brunskill

    Abstract: Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action valu…

    Submitted 19 August, 2021; originally announced August 2021.

    Comments: Initial submission; appeared as spotlight talk in ICML 2021 Workshop on Theory of RL

  22. arXiv:2107.02266  [pdf, other]

    math.ST cs.LG stat.ML

    Near-optimal inference in adaptive linear regression

    Authors: Koulik Khamaru, Yash Deshpande, Tor Lattimore, Lester Mackey, Martin J. Wainwright

    Abstract: When data is collected in an adaptive manner, even simple methods like ordinary least squares can exhibit non-normal asymptotic behavior. As an undesirable consequence, hypothesis tests and confidence intervals based on asymptotic normality can lead to erroneous results. We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation. Our pr…

    Submitted 21 March, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

    Comments: 51 pages, 7 figures

  23. arXiv:2106.14352  [pdf, other]

    stat.ML cs.LG

    Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

    Authors: Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

    Abstract: Various algorithms in reinforcement learning exhibit dramatic variability in their convergence rates and ultimate accuracy as a function of the problem structure. Such instance-specific behavior is not captured by existing global minimax bounds, which are worst-case in nature. We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete st…

    Submitted 27 June, 2021; originally announced June 2021.

  24. arXiv:2105.01850  [pdf, other]

    cs.LG stat.ML

    Preference learning along multiple criteria: A game-theoretic perspective

    Authors: Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright

    Abstract: The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well known that any Nash equilibrium of the zero sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, howe…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 47 pages; published as a conference paper at NeurIPS 2020

  25. arXiv:2101.07781  [pdf, other]

    stat.ML cs.LG math.ST

    Minimax Off-Policy Evaluation for Multi-Armed Bandits

    Authors: Cong Ma, Banghua Zhu, Jiantao Jiao, Martin J. Wainwright

    Abstract: We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards, and develop minimax rate-optimal procedures under three settings. First, when the behavior policy is known, we show that the Switch estimator, a method that alternates between the plug-in and importance sampling estimators, is minimax rate-optimal for all sample sizes. Second, when the behavior poli…

    Submitted 19 January, 2021; originally announced January 2021.

  26. arXiv:2012.05299  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Optimal oracle inequalities for solving projected fixed-point equations

    Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright

    Abstract: Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper…

    Submitted 9 December, 2020; originally announced December 2020.

  27. arXiv:2006.10189  [pdf, other]

    cs.LG cs.IT math.ST stat.ML

    Revisiting minimum description length complexity in overparameterized models

    Authors: Raaz Dwivedi, Chandan Singh, Bin Yu, Martin J. Wainwright

    Abstract: Complexity is a fundamental concept underlying statistical learning theory that aims to inform generalization performance. Parameter count, while successful in low-dimensional settings, is not well-justified for overparameterized settings when the number of parameters is more than the number of training samples. We revisit complexity measures based on Rissanen's principle of minimum description le…

    Submitted 12 October, 2023; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: First two authors contributed equally

  28. arXiv:2005.11411  [pdf, other]

    cs.LG math.ST stat.ML

    Instability, Computational Efficiency and Statistical Accuracy

    Authors: Nhat Ho, Koulik Khamaru, Raaz Dwivedi, Martin J. Wainwright, Michael I. Jordan, Bin Yu

    Abstract: Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the properties of the population-level operator in the idealized limit of infinitely many samples. We develop a general framework that yields bounds on statistical accurac…

    Submitted 20 March, 2022; v1 submitted 22 May, 2020; originally announced May 2020.

    Comments: 68 pages, 6 Figures, 2 Tables. First three authors contributed equally

  29. arXiv:2005.05238  [pdf, other]

    cs.LG math.OC stat.ML

    FedSplit: An algorithmic framework for fast federated optimization

    Authors: Reese Pathak, Martin J. Wainwright

    Abstract: Motivated by federated learning, we consider the hub-and-spoke model of distributed optimization in which a central authority coordinates the computation of a solution among many agents while limiting communication. We first study some past procedures for federated optimization, and show that their fixed points need not correspond to stationary points of the original optimization problem, even in…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: 27 pages, 4 figures

  30. arXiv:2005.03725  [pdf, other]

    math.ST cs.LG stat.ML

    Lower bounds in multiple testing: A framework based on derandomized proxies

    Authors: Max Rabinovich, Michael I. Jordan, Martin J. Wainwright

    Abstract: The large bulk of work in multiple testing has focused on specifying procedures that control the false discovery rate (FDR), with relatively less attention being paid to the corresponding Type II error known as the false non-discovery rate (FNR). A line of more recent work in multiple testing has begun to investigate the tradeoffs between the FDR and FNR and to provide lower bounds on the performa…

    Submitted 7 May, 2020; originally announced May 2020.

  31. arXiv:2004.04719  [pdf, ps, other]

    stat.ML cs.LG math.OC math.ST

    On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration

    Authors: Wenlong Mou, Chris Junchi Li, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We undertake a precise study of the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging for solving a linear system $\bar{A} θ = \bar{b}$. When the matrix $\bar{A}$ is Hurwitz, we prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity. The CLT characterizes the exact asym…

    Submitted 9 April, 2020; originally announced April 2020.

  32. arXiv:2003.07337  [pdf, other]

    stat.ML cs.LG math.OC

    Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

    Authors: Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

    Abstract: We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations s…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: 38 pages, 3 figures

  33. arXiv:1912.05153  [pdf, other]

    stat.ML cs.DS cs.LG math.PR stat.CO

    Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing

    Authors: Wenlong Mou, Nhat Ho, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For…

    Submitted 11 December, 2019; originally announced December 2019.

  34. arXiv:1910.00551  [pdf, ps, other]

    stat.ML cs.DS cs.LG stat.CO

    An Efficient Sampling Algorithm for Non-smooth Composite Potentials

    Authors: Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

    Abstract: We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function. We propose a new algorithm based on the Metropolis-Hastings framework, and prove that it mixes to within TV distance $\varepsilon$ of…

    Submitted 1 October, 2019; originally announced October 2019.

  35. arXiv:1909.08749  [pdf, other]

    stat.ML cs.LG math.OC math.PR math.ST

    Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning

    Authors: Ashwin Pananjady, Martin J. Wainwright

    Abstract: Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without a…

    Submitted 15 September, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

    Comments: Version v2 is consistent with manuscript to appear in IEEE Transactions on Information Theory

  36. arXiv:1908.10859  [pdf, ps, other]

    stat.ML cs.DS cs.LG math.OC stat.CO

    High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

    Authors: Wenlong Mou, Yi-An Ma, Martin J. Wainwright, Peter L. Bartlett, Michael I. Jordan

    Abstract: We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generali…

    Submitted 26 May, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: Changes from v1: improved algorithm with $O (d^{1/4} / \varepsilon^{1/2})$ mixing time

  37. arXiv:1906.04697  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-reduced $Q$-learning is minimax optimal

    Authors: Martin J. Wainwright

    Abstract: We introduce and analyze a form of variance-reduced $Q$-learning. For $γ$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $ε$-accurate estimate of the optimal $Q$-function in the $\ell_\infty$-norm using $\mathcal{O} \left(\left(\frac{D}{ ε^2 (1-γ)^3} \right) \; \log \left( \frac{D}{(1-γ)} \right) \right)$ samples, where…

    Submitted 8 August, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

    Comments: Update from v1: new Proposition 1 on minimax optimality; updated referencing and discussion of related work

  38. arXiv:1905.12247  [pdf, other]

    stat.ML cs.LG stat.CO

    Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients

    Authors: Yuansi Chen, Raaz Dwivedi, Martin J. Wainwright, Bin Yu

    Abstract: Hamiltonian Monte Carlo (HMC) is a state-of-the-art Markov chain Monte Carlo sampling algorithm for drawing samples from smooth probability densities over continuous spaces. We study the variant most widely used in practice, Metropolized HMC with the Störmer-Verlet or leapfrog integrator, and make two primary contributions. First, we provide a non-asymptotic upper bound on the mixing time of the M…

    Submitted 11 January, 2021; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: 73 pages, 2 figures, fixed a mistake in the proof of Lemma 11, accepted in JMLR

  39. arXiv:1905.06265  [pdf, other]

    cs.LG math.OC stat.ML

    Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning

    Authors: Martin J. Wainwright

    Abstract: Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We prove a general sandwich relation on the iterate error at each time, and use it to derive non-asymptotic bounds on the error in terms of a cone-indu…

    Submitted 24 June, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: Changes from v1: -- Part of Lemma 1 was incorrect; corrected -- proof of Lemma 2: fixed minor typo in equation (36)

  40. arXiv:1904.02144  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

    Authors: Jianbo Chen, Michael I. Jordan, Martin J. Wainwright

    Abstract: The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks opt…

    Submitted 27 April, 2020; v1 submitted 3 April, 2019; originally announced April 2019.

  41. arXiv:1902.00194  [pdf, other]

    math.ST cs.LG stat.ML

    Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models

    Authors: Raaz Dwivedi, Nhat Ho, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan, Bin Yu

    Abstract: We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identif…

    Submitted 15 November, 2021; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 30 pages, 4 figures. The first three authors contributed equally to this work. To appear in AISTATS 2020

  42. arXiv:1812.08305  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

    Authors: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright

    Abstract: We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order eva…

    Submitted 18 May, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

    Comments: Version v3 consistent with paper appearing in JMLR

  43. arXiv:1808.02610  [pdf, other]

    cs.LG stat.ML

    L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data

    Authors: Jianbo Chen, Le Song, Martin J. Wainwright, Michael I. Jordan

    Abstract: We study instancewise feature importance scoring as a method for model interpretation. Any such method yields, for each predicted instance, a vector of importance scores associated with the feature vector. Methods based on the Shapley score have been proposed as a fair way of computing feature attributions of this kind, but incur an exponential complexity in the number of features. This combinator…

    Submitted 7 August, 2018; originally announced August 2018.

  44. arXiv:1806.09544  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Towards Optimal Estimation of Bivariate Isotonic Matrices with Unknown Permutations

    Authors: Cheng Mao, Ashwin Pananjady, Martin J. Wainwright

    Abstract: Many applications, including rank aggregation, crowd-labeling, and graphon estimation, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and/or columns. We consider the problem of estimating an unknown matrix in this class, based on noisy observations of (possibly, a subset of) its entries. We design and analyze polynomial-time algorithms that impr…

    Submitted 26 October, 2019; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: 60 pages, 1 figure. This paper is a longer version of the paper arXiv:1802.09963 v3, which appeared in part as a 4-page extended abstract at Conference on Learning Theory (COLT) 2018. This paper studies the problem in more general settings and in another error metric. This version corrects a statement in Theorem 2 of v1

  45. arXiv:1804.09629  [pdf, other]

    stat.ML cs.LG math.OC

    Convergence guarantees for a class of non-convex and non-smooth optimization problems

    Authors: Koulik Khamaru, Martin J. Wainwright

    Abstract: We consider the problem of finding critical points of functions that are non-convex and non-smooth. Studying a fairly broad class of such problems, we analyze the behavior of three gradient-based methods (gradient descent, proximal update, and Frank-Wolfe update). For each of these methods, we establish rates of convergence for general problems, and also prove faster rates for continuous sub-analy…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: 50 pages, 2 figures

  46. arXiv:1803.07763  [pdf, other]

    math.ST cs.IT

    From Gauss to Kolmogorov: Localized Measures of Complexity for Ellipses

    Authors: Yuting Wei, Billy Fang, Martin J. Wainwright

    Abstract: The Gaussian width is a fundamental quantity in probability, statistics and geometry, known to underlie the intrinsic difficulty of estimation and hypothesis testing. In this work, we show how the Gaussian width, when localized to any given point of an ellipse, can be controlled by the Kolmogorov width of a set similarly localized. This connection leads to an explicit characterization of the estim…

    Submitted 21 March, 2018; originally announced March 2018.

  47. arXiv:1802.09963  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Breaking the $1/\sqrt{n}$ Barrier: Faster Rates for Permutation-based Models in Polynomial Time

    Authors: Cheng Mao, Ashwin Pananjady, Martin J. Wainwright

    Abstract: Many applications, including rank aggregation and crowd-labeling, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and columns. We consider the problem of estimating such a matrix based on noisy observations of a subset of its entries, and design and analyze a polynomial-time algorithm that improves upon the state of the art. In particular, our re…

    Submitted 5 June, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: 30 pages, 1 figure. Accepted for presentation at Conference on Learning Theory (COLT) 2018

  48. arXiv:1802.09098  [pdf, other]

    stat.ME cs.LG math.ST

    SAFFRON: an adaptive algorithm for online control of the false discovery rate

    Authors: Aaditya Ramdas, Tijana Zrnic, Martin Wainwright, Michael Jordan

    Abstract: In the online false discovery rate (FDR) problem, one observes a possibly infinite sequence of $p$-values $P_1,P_2,\dots$, each testing a different null hypothesis, and an algorithm must pick a sequence of rejection thresholds $\alpha_1,\alpha_2,\dots$ in an online fashion, effectively rejecting the $k$-th null hypothesis whenever $P_k \leq \alpha_k$. Importantly, $\alpha_k$ must be a function of the past, and cannot…

    Submitted 10 July, 2019; v1 submitted 25 February, 2018; originally announced February 2018.

    Comments: 19 pages, 13 figures

  49. arXiv:1802.07814  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to Explain: An Information-Theoretic Perspective on Model Interpretation

    Authors: Jianbo Chen, Le Song, Martin J. Wainwright, Michael I. Jordan

    Abstract: We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given t…

    Submitted 13 June, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    Comments: Accepted to ICML 2018 as a long oral

  50. arXiv:1801.01253  [pdf, other]

    cs.LG cs.AI cs.IT stat.ML

    Approximate Ranking from Pairwise Comparisons

    Authors: Reinhard Heckel, Max Simchowitz, Kannan Ramchandran, Martin J. Wainwright

    Abstract: A common problem in machine learning is to rank a set of n items based on pairwise comparisons. Here ranking refers to partitioning the items into sets of pre-specified sizes according to their scores, which includes identification of the top-k items as the most prominent special case. The score of a given item is defined as the probability that it beats a randomly chosen other item. Finding an ex…

    Submitted 4 January, 2018; originally announced January 2018.

    Comments: AISTATS 2017