Showing 1–50 of 86 results for author: Langford, J

Searching in archive cs.
  1. arXiv:2403.13765  [pdf, other]

    cs.LG cs.AI cs.CV

    Towards Principled Representation Learning from Videos for Reinforcement Learning

    Authors: Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

    Abstract: We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing. Even though significant empirical advances have been made on this problem, a theoretical understanding remains absent. We initiate the theoretical investigation into principled approaches for representation learning and focus on learning the…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 Spotlight Conference Paper

  2. arXiv:2403.00833  [pdf, other]

    cs.AI

    Position Paper: Agent AI Towards a Holistic Intelligence

    Authors: Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

    Abstract: Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments. In leveraging the power of foundation models, it is crucial for AI research to pivot away from excessive reductionism and toward an emphasis on systems that function as cohesive wholes. Specifically, we emphasize developing Agent AI -- an embodied system that…

    Submitted 28 February, 2024; originally announced March 2024.

    Comments: 22 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2401.03568

  3. arXiv:2402.06187  [pdf, other]

    cs.LG cs.AI cs.RO

    Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

    Authors: Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang

    Abstract: We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the…

    Submitted 23 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: Accepted at Forty-first International Conference on Machine Learning (ICML 2024)

  4. arXiv:2311.03534  [pdf, other]

    cs.LG cs.AI cs.RO

    PcLast: Discovering Plannable Continuous Latent States

    Authors: Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb

    Abstract: Goal-conditioned planning benefits from learned low-dimensional representations of rich observations. While compact latent representations typically learned from variational autoencoders or inverse dynamics enable goal-conditioned decision making, they ignore state reachability, hampering their performance. In this paper, we learn a representation that associates reachable states together for effe…

    Submitted 10 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at ICML 2024

  5. arXiv:2307.15039  [pdf, other]

    cs.HC

    EyeO: Autocalibrating Gaze Output with Gaze Input

    Authors: Akanksha Saran, Jacob Alber, Cyril Zhang, Ann Paradiso, Danielle Bragg, John Langford

    Abstract: Gaze tracking devices have the potential to greatly expand interactivity, yet miscalibration remains a significant barrier to use. As devices miscalibrate, people tend to compensate by intentionally offsetting their gaze, which makes detecting miscalibration from eye signals difficult. To help address this problem, we propose a novel approach to seamless calibration based on the insight that the s…

    Submitted 31 October, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  6. arXiv:2303.02535  [pdf, other]

    cs.LG

    Streaming Active Learning with Deep Neural Networks

    Authors: Akanksha Saran, Safoora Yousefi, Akshay Krishnamurthy, John Langford, Jordan T. Ash

    Abstract: Active learning is perhaps most naturally posed as an online learning problem. However, prior active learning approaches with deep neural networks assume offline access to the entire dataset ahead of time. This paper proposes VeSSAL, a new algorithm for batch active learning with deep neural networks in streaming settings, which samples groups of points to query for labels at the moment they are e…

    Submitted 6 June, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: ICML 2023

  7. arXiv:2211.07614  [pdf, other]

    cs.LG

    Towards Data-Driven Offline Simulations for Online Reinforcement Learning

    Authors: Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman

    Abstract: Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks. Yet, it is still uncommon to deploy a dynamically learning agent (rather than a fixed policy) to a production system, as it's perceived as unsafe. Using historical data to reason about learning algorithms, similar to offline policy evaluat…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Presented at the 3rd Offline Reinforcement Learning Workshop at NeurIPS 2022

  8. arXiv:2211.00164  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

    Authors: Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

    Abstract: Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenou…

    Submitted 13 August, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

    Comments: ICML 2023

  9. arXiv:2210.14077  [pdf, other]

    cs.LG

    Eigen Memory Trees

    Authors: Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad

    Abstract: This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios. EMTs store data at the leaves of a binary tree and route new samples through the structure using the principal components of previous experiences, facilitating efficient (logarithmic) access to relevant memories. We demonstrate that EMT outperforms existing online memory approaches, and…

    Submitted 31 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: corrected an author name; corrected title plurality

  10. arXiv:2207.08229  [pdf, other]

    cs.LG cs.RO stat.ML

    Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

    Authors: Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford

    Abstract: In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex…

    Submitted 27 December, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

    Comments: Project Website: https://controllable-latent-state.github.io/

  11. arXiv:2207.05836  [pdf, other]

    cs.LG stat.ML

    Contextual Bandits with Large Action Spaces: Made Practical

    Authors: Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

    Abstract: A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decisi…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: To appear at ICML 2022

  12. arXiv:2206.08364  [pdf, other]

    cs.LG cs.AI cs.HC stat.ML

    Interaction-Grounded Learning with Action-inclusive Feedback

    Authors: Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

    Abstract: Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies. The agent observes a context vector, takes an action, and receives a feedback vector, using this information to effectively optimize a policy with respect to a latent reward function. Prior analyzed approaches f…

    Submitted 12 October, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

    Comments: Published in NeurIPS 2022

  13. arXiv:2206.04282  [pdf, ps, other]

    cs.LG

    Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

    Authors: Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford

    Abstract: In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand. Learning from high-dimensional observations has been the subject of extensive investigation in supervised learning and statistics (e.g., via sparsity), but analogous issues in reinforcement learning are not well und…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2022

  14. arXiv:2202.05318  [pdf, other]

    stat.ML cs.CR cs.LG math.OC

    Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

    Authors: Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu

    Abstract: Large-scale machine learning systems often involve data distributed across a collection of users. Federated learning algorithms leverage this structure by communicating model updates to a central server, rather than entire datasets. In this paper, we study stochastic optimization algorithms for a personalized federated learning setting involving local and global models subject to user-level (joint…

    Submitted 15 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICML

  15. arXiv:2110.08847  [pdf, other]

    cs.LG

    Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

    Authors: Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

    Abstract: Many real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations and subsequently plan efficiently. However, such approach…

    Submitted 5 March, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: ICLR 2022

  16. arXiv:2106.04887  [pdf, other]

    cs.LG cs.AI stat.ML

    Interaction-Grounded Learning

    Authors: Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

    Abstract: Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vec…

    Submitted 13 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: Published in ICML 2021

  17. arXiv:2106.04815  [pdf, other]

    cs.LG

    ChaCha for Online AutoML

    Authors: Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi

    Abstract: We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings. ChaCha handles the process of determining a champion and scheduling a set of 'live' challengers over time based on sample complexity bounds. It is guaranteed to have sublinear regret after the optimal configuration is added into consideration by an application-dependen…

    Submitted 11 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: 16 pages (including supplementary appendix). Appearing at ICML 2021

    Journal ref: ICML 2021

  18. arXiv:2011.12715  [pdf, other]

    cs.AI cs.LG cs.NI cs.SE

    Resonance: Replacing Software Constants with Context-Aware Models in Real-time Communication

    Authors: Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, et al. (1 additional author not shown)

    Abstract: Large software systems tune hundreds of 'constants' to optimize their runtime performance. These values are commonly derived through intuition, lab tests, or A/B tests. A 'one-size-fits-all' approach is often sub-optimal as the best value depends on runtime context. In this paper, we provide an experimental approach to replace constants with learned contextual functions for Skype - a widely used r…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: Workshop on ML for Systems at NeurIPS 2020, Accepted

    Journal ref: ML for Systems, NeurIPS 2020

  19. arXiv:2010.03799  [pdf, ps, other]

    cs.LG math.OC math.ST stat.ML

    Learning the Linear Quadratic Regulator from Nonlinear Observations

    Authors: Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

    Abstract: We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learn…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: To appear at NeurIPS 2020

  20. arXiv:2006.07507  [pdf, other]

    cs.LG stat.ML

    Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting

    Authors: Keyi Chen, John Langford, Francesco Orabona

    Abstract: Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on trunca…

    Submitted 3 May, 2022; v1 submitted 12 June, 2020; originally announced June 2020.

  21. arXiv:2006.06040  [pdf, other]

    cs.LG stat.ML

    Efficient Contextual Bandits with Continuous Actions

    Authors: Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins

    Abstract: We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.

    Submitted 3 December, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: To appear at NeurIPS 2020

  22. arXiv:2004.03544  [pdf, other]

    cs.CR

    PACT: Privacy Sensitive Protocols and Mechanisms for Mobile Contact Tracing

    Authors: Justin Chan, Dean Foster, Shyam Gollakota, Eric Horvitz, Joseph Jaeger, Sham Kakade, Tadayoshi Kohno, John Langford, Jonathan Larson, Puneet Sharma, Sudheesh Singanamalla, Jacob Sunshine, Stefano Tessaro

    Abstract: The global health threat from COVID-19 has been controlled in a number of instances by large-scale testing and contact tracing efforts. We created this document to suggest three functionalities on how we might best harness computing technologies to support the goals of public health organizations in minimizing morbidity and mortality associated with the spread of COVID-19, while protecting the…

    Submitted 7 May, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 22 pages, 2 figures

  23. arXiv:2003.12880  [pdf, other]

    cs.LG stat.ML

    Federated Residual Learning

    Authors: Alekh Agarwal, John Langford, Chen-Yu Wei

    Abstract: We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model. Using this new federated learning framework, the complexity of the central shared model can be minimized while still gaining all the performance benefits that joint training provides. Our framework is robust to data heterogeneity, addressing th…

    Submitted 28 March, 2020; originally announced March 2020.

  24. arXiv:1911.05815  [pdf, other]

    cs.LG stat.ML

    Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

    Authors: Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford

    Abstract: We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space. The algorithm interleaves representation learning to identify a new notion of kinematic state abstraction with strategic exploration to reach new states using the learned abstraction. The algorithm provably explores the environment with…

    Submitted 13 November, 2019; originally announced November 2019.

  25. arXiv:1906.03671  [pdf, other]

    cs.LG stat.ML

    Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

    Authors: Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal

    Abstract: We design a new algorithm for batch active learning with deep neural network models. Our algorithm, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BA…

    Submitted 23 February, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Journal ref: 2020 International Conference on Learning Representations

  26. arXiv:1906.03323  [pdf, other]

    cs.LG stat.ML

    Empirical Likelihood for Contextual Bandits

    Authors: Nikos Karampatziakis, John Langford, Paul Mineiro

    Abstract: We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that se…

    Submitted 17 October, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: Accepted at NeurIPS 2020

  27. arXiv:1905.13360  [pdf, other]

    cs.LG stat.ML

    Efficient Forward Architecture Search

    Authors: Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric Horvitz, Debadeepta Dey

    Abstract: We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers. The added shortcut connections effectively perform gradient boosting on the augmented layers. The proposed algorithm is motivated by the feature selection algorithm forward stage-wise linear regression, since we consider NAS as a generalization of feature selectio…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: preprint

  28. arXiv:1902.01520  [pdf, other]

    stat.ML cs.LG

    Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

    Authors: Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

    Abstract: We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for…

    Submitted 20 June, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: 41 pages, 1 figure, preliminary version in COLT 2019

  29. arXiv:1901.09018  [pdf, other]

    cs.LG stat.ML

    Provably efficient RL with Rich Observations via Latent State Decoding

    Authors: Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

    Abstract: We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps -- where previously decoded latent states provide labels for later regression problems --…

    Submitted 9 September, 2021; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: The ICML 2019 version omitted the second constraint on $ε$ in Theorem 4.1. We thank Yonathan Efroni for calling this to our attention

  30. arXiv:1901.00301  [pdf, other]

    cs.LG stat.ML

    Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

    Authors: Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N Negahban

    Abstract: We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may be different between these two data sources. Theoretically, we state and prove no-regret algorithms for learning that is robust to misaligned cost distributions between the two sources. Empirically, we…

    Submitted 21 June, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

    Comments: 42 pages, 21 figures, ICML 2019

  31. arXiv:1811.08540  [pdf, other]

    cs.LG stat.ML

    Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

    Authors: Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

    Abstract: We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy. We design new algorithms for RL with a generic model class and analyze their statistical properties. Our algorithms have sample complexity governed by a new structural parameter called the witness rank, which…

    Submitted 30 May, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: COLT 2019

  32. arXiv:1807.06473  [pdf, other]

    cs.LG stat.ML

    Contextual Memory Trees

    Authors: Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

    Abstract: We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size. It is designed to efficiently query for memories from that store, supporting logarithmic time insertion and retrieval operations. Hence CMT can be integrated into existing statistical learning algorithms as an augmented memory unit without substanti…

    Submitted 2 June, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

    Comments: ICML 2019

  33. arXiv:1803.02453  [pdf, other]

    cs.LG

    A Reductions Approach to Fair Classification

    Authors: Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach

    Abstract: We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier wit…

    Submitted 16 July, 2018; v1 submitted 6 March, 2018; originally announced March 2018.

  34. arXiv:1803.00606  [pdf, other]

    cs.LG stat.ML

    On Oracle-Efficient PAC RL with Rich Observations

    Authors: Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and…

    Submitted 16 January, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Comments: appeared at NeurIPS 18; full paper including appendix; updated style file

  35. arXiv:1802.04064  [pdf, other]

    stat.ML cs.LG

    A Contextual Bandit Bake-off

    Authors: Alberto Bietti, Alekh Agarwal, John Langford

    Abstract: Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithm…

    Submitted 4 June, 2021; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: JMLR

  36. arXiv:1708.01799  [pdf, ps, other]

    cs.LG stat.ML

    Efficient Contextual Bandits in Non-stationary Worlds

    Authors: Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

    Abstract: Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a ch…

    Submitted 3 April, 2019; v1 submitted 5 August, 2017; originally announced August 2017.

  37. arXiv:1706.04964  [pdf, ps, other]

    cs.LG

    Learning Deep ResNet Blocks Sequentially using Boosting Theory

    Authors: Furong Huang, Jordan Ash, John Langford, Robert Schapire

    Abstract: Deep neural networks are known to be difficult to train due to the instability of back-propagation. A deep residual network (ResNet) with identity loops remedies this by stabilizing gradient computations. We prove a boosting theory for the ResNet architecture. We construct $T$ weak module classifiers, each contains two of the $T$ layers, such that the combined strong learner is a ResNet. Th…

    Submitted 14 June, 2018; v1 submitted 15 June, 2017; originally announced June 2017.

    Comments: Accepted to ICML 2018

  38. arXiv:1704.08795  [pdf, other]

    cs.CL

    Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

    Authors: Dipendra Misra, John Langford, Yoav Artzi

    Abstract: We propose to directly map raw visual observations and text input to actions for instruction execution. While existing approaches assume access to structured environment representations or use a pipeline of separately trained models, we learn a single model to jointly reason about linguistic and visual input. We use reinforcement learning in a contextual bandit setting to train a neural network ag…

    Submitted 22 July, 2017; v1 submitted 27 April, 2017; originally announced April 2017.

    Comments: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017

  39. arXiv:1703.01014  [pdf, other]

    cs.LG stat.ML

    Active Learning for Cost-Sensitive Classification

    Authors: Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daume III, John Langford

    Abstract: We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs. Our algorithm, COAL, makes predictions by regressing to each label's cost and predicting the smallest. On a new example, it uses a set of regressors that perform well on past data to estimate possible costs for each label. It queries only the labels that could…

    Submitted 11 October, 2021; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: Fixed typos in Appendix A

    Journal ref: Journal of Machine Learning Research, 2019

  40. arXiv:1610.09512  [pdf, other]

    cs.LG stat.ML

    Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

    Authors: Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

    Abstract: This paper studies systematic exploration for reinforcement learning with rich observations and function approximation. We introduce a new model called contextual decision processes, that unifies and generalizes most prior settings. Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally…

    Submitted 1 December, 2016; v1 submitted 29 October, 2016; originally announced October 2016.

    Comments: 42 pages, 1 figure

  41. arXiv:1606.04988  [pdf, other]

    stat.ML cs.LG

    Logarithmic Time One-Against-Some

    Authors: Hal Daume III, Nikos Karampatziakis, John Langford, Paul Mineiro

    Abstract: We create a new online reduction of multiclass classification to binary classification for which training and prediction time scale logarithmically with the number of classes. Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an…

    Submitted 30 November, 2016; v1 submitted 15 June, 2016; originally announced June 2016.

  42. arXiv:1606.03966  [pdf, other]

    cs.LG cs.DC

    Making Contextual Decisions with Low Technical Debt

    Authors: Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins

    Abstract: Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Reinforcement-based learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice is fraught with technical debt, and no general system exists that supports them completely. We address this and create the fi…

    Submitted 9 May, 2017; v1 submitted 13 June, 2016; originally announced June 2016.

  43. arXiv:1605.04812  [pdf, other]

    cs.LG cs.AI stat.ML

    Off-policy evaluation for slate recommendation

    Authors: Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

    Abstract: This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our…

    Submitted 6 November, 2017; v1 submitted 16 May, 2016; originally announced May 2016.

    Comments: 31 pages (9 main paper, 20 supplementary), 12 figures (2 main paper, 10 supplementary)

  44. arXiv:1602.07265  [pdf, ps, other]

    cs.LG stat.ML

    Search Improves Label for Active Learning

    Authors: Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang

    Abstract: We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not). The Search oracle models the situation where a human searches a database to seed or counterexample an existing solution. Search is stronger than Label while being natural to implement in many situations. We show that an algorithm using both oracles can provide exponentially larg…

    Submitted 24 October, 2016; v1 submitted 23 February, 2016; originally announced February 2016.

    Comments: 32 pages; NIPS 2016

  45. arXiv:1602.02722  [pdf, other]

    cs.LG stat.ML

    PAC Reinforcement Learning with Rich Observations

    Authors: Akshay Krishnamurthy, Alekh Agarwal, John Langford

    Abstract: We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation sp…

    Submitted 28 October, 2016; v1 submitted 8 February, 2016; originally announced February 2016.

  46. arXiv:1602.02202  [pdf, other]

    cs.LG

    Efficient Second Order Online Learning by Sketching

    Authors: Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford

    Abstract: We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data. SON is an enhanced version of the Online Newton Step, which, via sketching techniques enjoys a running time linear in the dimension and sketch size. We further develop sparse forms of the sketching methods (such as Oja's rule), making the…

    Submitted 17 October, 2017; v1 submitted 5 February, 2016; originally announced February 2016.

  47. arXiv:1506.08669  [pdf, other]

    cs.LG stat.ML

    Efficient and Parsimonious Agnostic Active Learning

    Authors: Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

    Abstract: We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise. 2) It is efficiently implementable with an ERM oracle. 3) It is more aggressive than all previous approaches satisfying 1 and 2. To do this we create an algorithm based on a n…

    Submitted 7 January, 2016; v1 submitted 29 June, 2015; originally announced June 2015.

  48. arXiv:1503.05615  [pdf, other]

    cs.CL cs.LG

    Learning to Search for Dependencies

    Authors: Kai-Wei Chang, He He, Hal Daumé III, John Langford

    Abstract: We demonstrate that a dependency parser can be built using a credit assignment compiler which removes the burden of worrying about low-level machine learning details from the parser implementation. The result is a simple parser which robustly applies to many languages that provides similar statistical and computational performance with best-to-date transition-based parsing approaches, while avoidi…

    Submitted 7 May, 2015; v1 submitted 18 March, 2015; originally announced March 2015.

  49. arXiv:1503.02834  [pdf, ps, other]

    stat.ME cs.AI

    Doubly Robust Policy Evaluation and Optimization

    Authors: Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

    Abstract: We study sequential decision making in environments where rewards are only partially observed, but can be modeled as a function of observed contexts and the chosen action by the decision maker. This setting, known as contextual bandits, encompasses a wide variety of applications such as health care, content recommendation and Internet advertising. A central task is evaluation of a new policy given…

    Submitted 10 March, 2015; originally announced March 2015.

    Comments: Published at http://dx.doi.org/10.1214/14-STS500 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS500

    Journal ref: Statistical Science 2014, Vol. 29, No. 4, 485-511

  50. arXiv:1502.02704  [pdf, other]

    cs.LG

    Learning Reductions that Really Work

    Authors: Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

    Abstract: We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.

    Submitted 9 February, 2015; originally announced February 2015.