Showing 1–14 of 14 results for author: Cheung, W C

Searching in archive cs.
  1. arXiv:2405.02594

    cs.LG stat.ML

    Leveraging (Biased) Information: Multi-armed Bandits with Offline Data

    Authors: Wang Chi Cheung, Lixing Lyu

    Abstract: We leverage offline data to facilitate online learning in stochastic multi-armed bandits. The probability distributions that govern the offline data and the online rewards can be different. Without any non-trivial upper bound on their difference, we show that no non-anticipatory policy can outperform the UCB policy of Auer et al. (2002), even in the presence of offline data. In complement, we prop…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 24 pages, 5 figures. Accepted to ICML 2024

  2. arXiv:2402.19090

    cs.LG

    Best Arm Identification with Resource Constraints

    Authors: Zitian Li, Wang Chi Cheung

    Abstract: Motivated by the cost heterogeneity in experimentation across different alternatives, we study the Best Arm Identification with Resource Constraints (BAIwRC) problem. The agent aims to identify the best arm under resource constraints, where resources are consumed for each arm pull. We make two novel contributions. We design and analyze the Successive Halving with Resource Rationing algorithm (SH-R…

    Submitted 29 February, 2024; originally announced February 2024.

  3. arXiv:2302.04182

    cs.LG math.OC

    Online Resource Allocation: Bandits feedback and Advice on Time-varying Demands

    Authors: Lixing Lyu, Wang Chi Cheung

    Abstract: We consider a general online resource allocation model with bandit feedback and time-varying demands. While online resource allocation has been well studied in the literature, most existing works make the strong assumption that the demand arrival process is stationary. In practical applications, such as online advertisement and revenue management, however, this process may be exogenous and non-sta…

    Submitted 12 June, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 74 pages. A preliminary short version entitled "Non-Stationary Bandits with Knapsack Problems with Advice" is accepted to ICML 2023

  4. arXiv:2110.08627

    cs.LG cs.AI cs.IT stat.ML

    Achieving the Pareto Frontier of Regret Minimization and Best Arm Identification in Multi-Armed Bandits

    Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

    Abstract: We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To this end, we design and analy…

    Submitted 9 June, 2023; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: 43 pages, 10 figures

  5. arXiv:2010.07904

    cs.LG cs.IT

    Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions

    Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

    Abstract: We consider a best arm identification (BAI) problem for stochastic bandits with adversarial corruptions in the fixed-budget setting of T steps. We design a novel randomized algorithm, Probabilistic Sequential Shrinking($u$) (PSS($u$)), which is agnostic to the amount of corruptions. When the amount of corruptions per step (CPS) is below a threshold, PSS($u$) identifies the best arm or item with pr…

    Submitted 18 June, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: 22 pages, 9 figures

  6. arXiv:2006.14389

    cs.LG stat.ML

    Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the Sliding Window Upper-Confidence bound for Reinf…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: To appear in proceedings of the 37th International Conference on Machine Learning. Shortened conference version of its journal version (available at: arXiv:1906.02922)

  7. arXiv:2001.08655

    cs.LG cs.IT stat.ML

    Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting

    Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

    Abstract: We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items, also called an arm, within the framework of cascading bandits. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge, namely, that of probabilistically estimating the amount of available feedback at each step. To do so, we define a new class of random variable…

    Submitted 15 June, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: 39 pages, 25 figures. Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria, PMLR 108, 2020

  8. arXiv:1906.02922

    cs.LG stat.ML

    Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under temporal drifts, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. This setting captures the endogeneity, exogeneity, uncertainty, and partial feed…

    Submitted 18 May, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

  9. arXiv:1905.06466

    cs.LG math.OC stat.ML

    Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards

    Authors: Wang Chi Cheung

    Abstract: We consider an agent who is involved in a Markov decision process and receives a vector of outcomes every round. Her objective is to maximize a global concave reward function on the average vectorial outcome. The problem models applications such as multi-objective optimization, maximum entropy exploration, and constrained optimization in Markovian environments. In our general setting where a stati…

    Submitted 15 May, 2019; originally announced May 2019.

    Comments: 54 pages, 1 figure

  10. arXiv:1903.01461

    cs.LG stat.ML

    Hedging the Drift: Learning to Optimize under Non-Stationarity

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) non-stationarity can…

    Submitted 17 March, 2021; v1 submitted 4 March, 2019; originally announced March 2019.

    Comments: Journal version of the AISTATS 2019 version (available at arXiv:1810.03024). This version fixed an error in the proof of Theorem 2 with Assumption 4 of arXiv:2103.05750

  11. arXiv:1810.05640

    cs.AI cs.LG stat.ML

    Inventory Balancing with Online Learning

    Authors: Wang Chi Cheung, Will Ma, David Simchi-Levi, Xinshang Wang

    Abstract: We study a general problem of allocating limited resources to heterogeneous customers over time under model uncertainty. Each type of customer can be serviced using different actions, each of which stochastically consumes some combination of resources, and returns different rewards for the resources consumed. We consider a general model where the resource consumption distribution associated with e…

    Submitted 30 August, 2021; v1 submitted 11 October, 2018; originally announced October 2018.

  12. arXiv:1810.03024

    cs.LG stat.ML

    Learning to Optimize under Non-Stationarity

    Authors: Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

    Abstract: We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. It captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandits learning algorithms. Defi…

    Submitted 17 July, 2021; v1 submitted 6 October, 2018; originally announced October 2018.

    Comments: This version fixed an error in the proof of Lemma 1 with Assumption 4 of arXiv:2103.05750

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019)

  13. arXiv:1810.01187

    cs.LG stat.ML

    Thompson Sampling Algorithms for Cascading Bandits

    Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan

    Abstract: Motivated by the pressing need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015). While Thompson sampling (TS) algorithms have been shown to be empirically superior to Upper Confidence Bound (UCB) algorithms for cascading bandits, theoretical guarantees are only known for the latter. In this paper, we first provide a pr…

    Submitted 15 May, 2021; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: 62 pages, 6 figures

  14. arXiv:1704.00108

    cs.LG

    Assortment Optimization under Unknown MultiNomial Logit Choice Models

    Authors: Wang Chi Cheung, David Simchi-Levi

    Abstract: Motivated by e-commerce, we study the online assortment optimization problem. The seller offers an assortment, i.e., a subset of products, to each arriving customer, who then purchases one or no product from her offered assortment. A customer's purchase decision is governed by the underlying MultiNomial Logit (MNL) choice model. The seller aims to maximize the total revenue in a finite sales horizo…

    Submitted 31 March, 2017; originally announced April 2017.

    Comments: 16 pages, 2 figures
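
Entry 1 above benchmarks against the UCB policy of Auer et al. (2002). For reference, here is a minimal sketch of that UCB1 index rule, assuming rewards in [0, 1]: play each arm once, then always play the arm maximizing its empirical mean plus sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j the plays of arm j. The Bernoulli simulator, the arm means, and the names `ucb1_index` and `run_ucb1` are hypothetical illustrations; this is not the offline-data algorithm proposed in that paper.

```python
import math
import random


def ucb1_index(mean, pulls, total_pulls):
    """UCB1 index (Auer et al., 2002): empirical mean plus sqrt(2 ln n / n_j)."""
    return mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return how often each arm was played.

    The Bernoulli environment is a hypothetical test bed, not part of any paper.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k    # n_j: number of times arm j has been played
    sums = [0.0] * k   # cumulative reward collected from arm j
    for t in range(horizon):
        if t < k:
            arm = t    # initialization: play each arm once
        else:
            n = t      # total number of plays completed so far
            arm = max(range(k),
                      key=lambda j: ucb1_index(sums[j] / pulls[j], pulls[j], n))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


pulls = run_ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)
```

Because each suboptimal arm is played only on the order of log(horizon) / gap² times, most of the 5000 plays on this toy instance should go to the arm with mean 0.8.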