Search | arXiv e-print repository

Near-Optimal Mechanisms for Resource Allocation Without Monetary Transfers

Authors: Moise Blanchard, Patrick Jaillet

Abstract: We study the problem in which a central planner sequentially allocates a single resource to multiple strategic agents using their utility reports at each round, but without using any monetary transfers. We consider general agent utility distributions and two standard settings: a finite horizon $T$ and an infinite horizon with $γ$ discounts. We provide general tools to characterize the convergence… ▽ More We study the problem in which a central planner sequentially allocates a single resource to multiple strategic agents using their utility reports at each round, but without using any monetary transfers. We consider general agent utility distributions and two standard settings: a finite horizon $T$ and an infinite horizon with $γ$ discounts. We provide general tools to characterize the convergence rate between the optimal mechanism for the central planner and the first-best allocation if true agent utilities were available. This heavily depends on the utility distributions, yielding rates anywhere between $1/\sqrt T$ and $1/T$ for the finite-horizon setting, and rates faster than $\sqrt{1-γ}$, including exponential rates for the infinite-horizon setting as agents are more patient $γ\to 1$. On the algorithmic side, we design mechanisms based on the promised-utility framework to achieve these rates and leverage structure on the utility distributions. Intuitively, the more flexibility the central planner has to reward or penalize any agent while incurring little social welfare cost, the faster the convergence rate. In particular, discrete utility distributions typically yield the slower rates $1/\sqrt T$ and $\sqrt{1-γ}$, while smooth distributions with density typically yield faster rates $1/T$ (up to logarithmic factors) and $1-γ$. △ Less

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2407.17112 [pdf, other]

Neural Dueling Bandits

Authors: Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, Patrick Jaillet, Bryan Kian Hsiang Low

Abstract: Contextual dueling bandit is used to model the bandit problems, where a learner's goal is to find the best arm for a given context using observed noisy preference feedback over the selected arms for the past contexts. However, existing algorithms assume the reward function is linear, which can be complex and non-linear in many real-life applications like online recommendations or ranking web searc… ▽ More Contextual dueling bandit is used to model the bandit problems, where a learner's goal is to find the best arm for a given context using observed noisy preference feedback over the selected arms for the past contexts. However, existing algorithms assume the reward function is linear, which can be complex and non-linear in many real-life applications like online recommendations or ranking web search results. To overcome this challenge, we use a neural network to estimate the reward function using preference feedback for the previously selected arms. We propose upper confidence bound- and Thompson sampling-based algorithms with sub-linear regret guarantees that efficiently select arms in each round. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution. Experimental results on the problem instances derived from synthetic datasets corroborate our theoretical results. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: Accepted at ICML 2024 Workshop on Foundations of Reinforcement Learning and Control

arXiv:2406.03682 [pdf, other]

A Universal Class of Sharpness-Aware Minimization Algorithms

Authors: Behrooz Tahmasebi, Ashkan Soleymani, Dara Bahri, Stefanie Jegelka, Patrick Jaillet

Abstract: Recently, there has been a surge in interest in developing optimization algorithms for overparameterized models as achieving generalization is believed to require algorithms with suitable biases. This interest centers on minimizing sharpness of the original loss function; the Sharpness-Aware Minimization (SAM) algorithm has proven effective. However, most literature only considers a few sharpness… ▽ More Recently, there has been a surge in interest in developing optimization algorithms for overparameterized models as achieving generalization is believed to require algorithms with suitable biases. This interest centers on minimizing sharpness of the original loss function; the Sharpness-Aware Minimization (SAM) algorithm has proven effective. However, most literature only considers a few sharpness measures, such as the maximum eigenvalue or trace of the training loss Hessian, which may not yield meaningful insights for non-convex optimization scenarios like neural networks. Additionally, many sharpness measures are sensitive to parameter invariances in neural networks, magnifying significantly under rescaling parameters. Motivated by these challenges, we introduce a new class of sharpness measures in this paper, leading to new sharpness-aware objective functions. We prove that these measures are \textit{universally expressive}, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters. Furthermore, we show that the proposed objective functions explicitly bias towards minimizing their corresponding sharpness measures, and how they allow meaningful applications to models with parameter invariances (such as scale-invariances). Finally, as instances of our proposed general framework, we present \textit{Frob-SAM} and \textit{Det-SAM}, which are specifically designed to minimize the Frobenius norm and the determinant of the Hessian of the training loss, respectively. We also demonstrate the advantages of our general framework through extensive experiments. △ Less

Submitted 10 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: ICML 2024. Code is available at http://github.com/dbahri/universal_sam

arXiv:2405.17346 [pdf, other]

Prompt Optimization with Human Feedback

Authors: Xiaoqiang Lin, Zhongxiang Dai, Arun Verma, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

Abstract: Large language models (LLMs) have demonstrated remarkable performances in various tasks. However, the performance of LLMs heavily depends on the input prompt, which has given rise to a number of recent works on prompt optimization. However, previous works often require the availability of a numeric score to assess the quality of every prompt. Unfortunately, when a human user interacts with a black… ▽ More Large language models (LLMs) have demonstrated remarkable performances in various tasks. However, the performance of LLMs heavily depends on the input prompt, which has given rise to a number of recent works on prompt optimization. However, previous works often require the availability of a numeric score to assess the quality of every prompt. Unfortunately, when a human user interacts with a black-box LLM, attaining such a score is often infeasible and unreliable. Instead, it is usually significantly easier and more reliable to obtain preference feedback from a human user, i.e., showing the user the responses generated from a pair of prompts and asking the user which one is preferred. Therefore, in this paper, we study the problem of prompt optimization with human feedback (POHF), in which we aim to optimize the prompt for a black-box LLM using only human preference feedback. Drawing inspiration from dueling bandits, we design a theoretically principled strategy to select a pair of prompts to query for preference feedback in every iteration, and hence introduce our algorithm named automated POHF (APOHF). We apply our APOHF algorithm to various tasks, including optimizing user instructions, prompt optimization for text-to-image generative models, and response optimization with human feedback (i.e., further refining the response using a variant of our APOHF). The results demonstrate that our APOHF can efficiently find a good prompt using a small number of preference feedback instances. Our code can be found at \url{https://github.com/xqlin98/APOHF}. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Preprint, 18 pages

arXiv:2405.16122 [pdf, other]

Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

Authors: Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

Abstract: Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar s… ▽ More Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the instruction, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named EASE, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars while accounting for exemplar ordering. Our EASE can efficiently find an ordered set of exemplars that performs well for all test queries from a given task, thereby eliminating test-time computation. Importantly, EASE can be readily extended to jointly optimize both the exemplars and the instruction. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of EASE over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at https://github.com/ZhaoxuanWu/EASE-Prompt-Optimization. △ Less

Submitted 25 May, 2024; originally announced May 2024.

Comments: 23 pages, 1 figure, 23 tables

arXiv:2404.01676 [pdf, other]

Incentives in Private Collaborative Machine Learning

Authors: Rachael Hwee Ling Sim, Yehong Zhang, Trong Nghia Hoang, Xinyi Xu, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and… ▽ More Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. As our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives like fairness but additionally preserve DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted to NeurIPS 2023

arXiv:2402.08533 [pdf, other]

Grace Period is All You Need: Individual Fairness without Revenue Loss in Revenue Management

Authors: Patrick Jaillet, Chara Podimata, Zijie Zhou

Abstract: Imagine you and a friend purchase identical items at a store, yet only your friend received a discount. Would your friend's discount make you feel unfairly treated by the store? And would you be less willing to purchase from that store again in the future? Based on a large-scale online survey that we ran on Prolific, it turns out that the answers to the above questions are positive. Motivated by t… ▽ More Imagine you and a friend purchase identical items at a store, yet only your friend received a discount. Would your friend's discount make you feel unfairly treated by the store? And would you be less willing to purchase from that store again in the future? Based on a large-scale online survey that we ran on Prolific, it turns out that the answers to the above questions are positive. Motivated by these findings, in this work we propose a notion of individual fairness in online revenue management and an algorithmic module (called ``Grace Period'') that can be embedded in traditional revenue management algorithms and guarantee individual fairness. Specifically, we show how to embed the Grace Period in five common revenue management algorithms including Deterministic Linear Programming with Probabilistic Assignment, Resolving Deterministic Linear Programming with Probabilistic Assignment, Static Bid Price Control, Booking Limit, and Nesting, thus covering both stochastic and adversarial customer arrival settings. Embedding the Grace Period does not incur additional regret for any of these algorithms. This finding indicates that there is no tradeoff between a seller maximizing their revenue and guaranteeing that each customer feels fairly treated. △ Less

Submitted 17 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2312.13130 [pdf, ps, other]

Distribution-Dependent Rates for Multi-Distribution Learning

Authors: Rafael Hanashiro, Patrick Jaillet

Abstract: To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a variety of tasks. The recent multi-distribution learning (MDL) framework tackles this objective in a dynamic interaction with the environment, where the learner has sampling access to each target distribution. Dra… ▽ More To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a variety of tasks. The recent multi-distribution learning (MDL) framework tackles this objective in a dynamic interaction with the environment, where the learner has sampling access to each target distribution. Drawing inspiration from the field of pure-exploration multi-armed bandits, we provide distribution-dependent guarantees in the MDL regime, that scale with suboptimality gaps and result in superior dependence on the sample size when compared to the existing distribution-independent analyses. We investigate two non-adaptive strategies, uniform and non-uniform exploration, and present non-asymptotic regret bounds using novel tools from empirical process theory. Furthermore, we devise an adaptive optimistic algorithm, LCB-DR, that showcases enhanced dependence on the gaps, mirroring the contrast between uniform and optimistic allocation in the multi-armed bandit literature. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2312.04073 [pdf, other]

Information Design for Hybrid Work under Infectious Disease Transmission Risk

Authors: Sohil Shah, Saurabh Amin, Patrick Jaillet

Abstract: We study a planner's provision of information to manage workplace occupancy when strategic workers (agents) face risk of infectious disease transmission. The planner implements an information mechanism to signal information about the underlying risk of infection at the workplace. Agents update their belief over the risk parameter using this information and choose to work in-person or remotely. We… ▽ More We study a planner's provision of information to manage workplace occupancy when strategic workers (agents) face risk of infectious disease transmission. The planner implements an information mechanism to signal information about the underlying risk of infection at the workplace. Agents update their belief over the risk parameter using this information and choose to work in-person or remotely. We address the design of the optimal signaling mechanism that best aligns the workplace occupancy with the planner's preference (i.e., maintaining safe capacity limits and operational efficiency at workplace). For various forms of planner preferences, we show numerical and analytical proof that interval-based information mechanisms are optimal. These mechanisms partition the continuous domain of the risk parameter into disjoint intervals and provision information based on interval-specific probability distributions over a finite set of signals. When the planner seeks to achieve an occupancy that lies in one of finitely many pre-specified ranges independent of the underlying risk, we provide an optimal mechanism that uses at most two intervals. On the other hand, when the preference on the occupancy is risk-dependent, we show that an approximately optimal interval-based mechanism can be computed efficiently. We bound the approximation loss for preferences that are expressed through a Lipschitz continuous function of both occupancy and risk parameter. We provide examples that demonstrate the improvement of proposed signaling mechanisms relative to the common benchmarks in information provision. Our findings suggest that information provision over the risk of disease transmission is an effective intervention for maintaining desirable occupancy levels at the workplace. △ Less

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.06012 [pdf, other]

Doubly Robust Structure Identification from Temporal Data

Authors: Emmanouil Angelis, Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Stefan Bauer

Abstract: Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantia… ▽ More Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications. Moreover, existing algorithms often do not work with cyclic data. To address these challenges, we propose a new doubly robust method for Structure Identification from Temporal Data ( SITD ). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles and they may be confounded. We further perform extensive experiments to showcase the superior performance of our method. △ Less

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2311.01195 [pdf, other]

Batch Bayesian Optimization for Replicable Experimental Design

Authors: Zhongxiang Dai, Quoc Phong Nguyen, Sebastian Shenghong Tay, Daisuke Urano, Richalynn Leong, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Many real-world experimental design problems (a) evaluate multiple experimental conditions in parallel and (b) replicate each condition multiple times due to large and heteroscedastic observation noise. Given a fixed total budget, this naturally induces a trade-off between evaluating more unique conditions while replicating each of them fewer times vs. evaluating fewer unique conditions and replic… ▽ More Many real-world experimental design problems (a) evaluate multiple experimental conditions in parallel and (b) replicate each condition multiple times due to large and heteroscedastic observation noise. Given a fixed total budget, this naturally induces a trade-off between evaluating more unique conditions while replicating each of them fewer times vs. evaluating fewer unique conditions and replicating each more times. Moreover, in these problems, practitioners may be risk-averse and hence prefer an input with both good average performance and small variability. To tackle both challenges, we propose the Batch Thompson Sampling for Replicable Experimental Design (BTS-RED) framework, which encompasses three algorithms. Our BTS-RED-Known and BTS-RED-Unknown algorithms, for, respectively, known and unknown noise variance, choose the number of replications adaptively rather than deterministically such that an input with a larger noise variance is replicated more times. As a result, despite the noise heteroscedasticity, both algorithms enjoy a theoretical guarantee and are asymptotically no-regret. Our Mean-Var-BTS-RED algorithm aims at risk-averse optimization and is also asymptotically no-regret. We also show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML. △ Less

Submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.07884 [pdf, ps, other]

Secretary Problems with Random Number of Candidates: How Prior Distributional Information Helps

Authors: Junhui Zhang, Patrick Jaillet

Abstract: We study variants of the secretary problem, where $N$, the number of candidates, is a random variable, and the decision maker wants to maximize the probability of success -- picking the largest number among the $N$ candidates -- using only the relative ranks of the candidates revealed so far. We consider three forms of prior information about $\mathbf p$, the probability distribution of $N$. In… ▽ More We study variants of the secretary problem, where $N$, the number of candidates, is a random variable, and the decision maker wants to maximize the probability of success -- picking the largest number among the $N$ candidates -- using only the relative ranks of the candidates revealed so far. We consider three forms of prior information about $\mathbf p$, the probability distribution of $N$. In the full information setting, we assume $\mathbf p$ to be fully known. In that case, we show that single-threshold type of strategies can achieve $1/e$-approximation to the maximum probability of success among all possible strategies. In the upper bound setting, we assume that $N\leq \overline{n}$ (or $\mathbb E[N]\leq \barμ$), where $\bar{n}$ (or $\barμ$) is known. In that case, we show that randomization over single-threshold type of strategies can achieve the optimal worst case probability of success of $\frac{1}{\log(\bar{n})}$ (or $\frac{1}{\log(\barμ)}$) asymptotically. Surprisingly, there is a single-threshold strategy (depending on $\overline{n}$) that can succeed with probability $2/e^2$ for all but an exponentially small fraction of distributions supported on $[\bar{n}]$. In the sampling setting, we assume that we have access to $m$ samples $N^{(1)},\ldots,N^{(m)}\sim_{iid} \mathbf p$. In that case, we show that if $N\leq T$ with probability at least $1-O(ε)$ for some $T\in \mathbb N$, $m\gtrsim \frac{1}{ε^2}\max(\log(\frac{1}ε),ε\log(\frac{\log(T)}ε))$ is enough to learn a strategy that is at least $ε$-suboptimal, and we provide a lower bound of $Ω(\frac{1}{ε^2})$, showing that the sampling algorithm is optimal when $ε=O(\frac{1}{\log\log(T)})$. △ Less

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.05373 [pdf, other]

Quantum Bayesian Optimization

Authors: Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets f… ▽ More Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations, as well as an experiment using a real quantum computer, to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice. △ Less

Submitted 8 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2310.02905 [pdf, other]

Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers

Authors: Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

Abstract: Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimi… ▽ More Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that INSTINCT consistently outperforms baselines in different tasks, e.g., various instruction induction tasks and the task of improving zero-shot chain-of-thought instructions. Our code is available at https://github.com/xqlin98/INSTINCT. △ Less

Submitted 23 June, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Accepted to ICML 2024

arXiv:2307.03994 [pdf, other]

Market Design for Dynamic Pricing and Pooling in Capacitated Networks

Authors: Saurabh Amin, Patrick Jaillet, Haripriya Pulyassary, Manxi Wu

Abstract: We study a market mechanism that sets edge prices to incentivize strategic agents to organize trips that efficiently share limited network capacity. This market allows agents to form groups to share trips, make decisions on departure times and route choices, and make payments to cover edge prices and other costs. We develop a new approach to analyze the existence and computation of market equilibr… ▽ More We study a market mechanism that sets edge prices to incentivize strategic agents to organize trips that efficiently share limited network capacity. This market allows agents to form groups to share trips, make decisions on departure times and route choices, and make payments to cover edge prices and other costs. We develop a new approach to analyze the existence and computation of market equilibrium, building on theories of combinatorial auctions and dynamic network flows. Our approach tackles the challenges in market equilibrium characterization arising from: (a) integer and network constraints on the dynamic flow of trips in sharing limited edge capacity; (b) heterogeneous and private preferences of strategic agents. We provide sufficient conditions on the network topology and agents' preferences that ensure the existence and polynomial-time computation of market equilibrium. We identify a particular market equilibrium that achieves maximum utilities for all agents, and is equivalent to the outcome of the classical Vickery Clark Grove mechanism. Finally, we extend our results to general networks with multiple populations and apply them to compute dynamic tolls for efficient carpooling in San Francisco Bay Area. △ Less

Submitted 1 November, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

arXiv:2306.12282 [pdf, other]

Online Resource Allocation with Convex-set Machine-Learned Advice

Authors: Negin Golrezaei, Patrick Jaillet, Zijie Zhou

Abstract: Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice poses challenges due to its potential inaccuracy. To address this issue, we propose a framework that enhances online resource allocation decisions with potentially unreli… ▽ More Decision-makers often have access to a machine-learned prediction about demand, referred to as advice, which can potentially be utilized in online decision-making processes for resource allocation. However, exploiting such advice poses challenges due to its potential inaccuracy. To address this issue, we propose a framework that enhances online resource allocation decisions with potentially unreliable machine-learned (ML) advice. We assume here that this advice is represented by a general convex uncertainty set for the demand vector. We introduce a parameterized class of Pareto optimal online resource allocation algorithms that strike a balance between consistent and robust ratios. The consistent ratio measures the algorithm's performance (compared to the optimal hindsight solution) when the ML advice is accurate, while the robust ratio captures performance under an adversarial demand process when the advice is inaccurate. Specifically, in a C-Pareto optimal setting, we maximize the robust ratio while ensuring that the consistent ratio is at least C. Our proposed C-Pareto optimal algorithm is an adaptive protection level algorithm, which extends the classical fixed protection level algorithm introduced in Littlewood (2005) and Ball and Queyranne (2009). Solving a complex non-convex continuous optimization problem characterizes the adaptive protection level algorithm. To complement our algorithms, we present a simple method for computing the maximum achievable consistent ratio, which serves as an estimate for the maximum value of the ML advice. Additionally, we present numerical studies to evaluate the performance of our algorithm in comparison to benchmark algorithms. The results demonstrate that by adjusting the parameter C, our algorithms effectively strike a balance between worst-case and average performance, outperforming the benchmark algorithms. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: 74 pages, 5 figures

arXiv:2306.10096 [pdf, ps, other]

Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes

Authors: Moïse Blanchard, Junhui Zhang, Patrick Jaillet

Abstract: We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $ε$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $ε$ over the unit ball -- our algorithms use… ▽ More We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $ε$ with a separation oracle in dimension $d$ -- or to minimize $1$-Lipschitz convex functions to accuracy $ε$ over the unit ball -- our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}ε)$ bits of memory, and make $\mathcal O((C\frac{d}{p}\ln \frac{1}ε)^p)$ oracle calls, for some universal constant $C \geq 1$. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}ε\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results) -- we explicit here their dependence with $\ln\frac{1}ε$, showing that these also hold in any sub-polynomial regime -- to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $ε\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $ε\leq d^{-Ω(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.07024 [pdf, other]

DRCFS: Doubly Robust Causal Feature Selection

Authors: Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Cristian R. Rojas, Stefan Bauer

Abstract: Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the caus… ▽ More Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems. △ Less

Submitted 5 July, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2302.07186 [pdf, ps, other]

Adversarial Rewards in Universal Learning for Contextual Bandits

Authors: Moise Blanchard, Steve Hanneke, Patrick Jaillet

Abstract: We study the fundamental limits of learning in contextual bandits, where a learner's rewards depend on their actions and a known context, which extends the canonical multi-armed bandit to the case where side-information is available. We are interested in universally consistent algorithms, which achieve sublinear regret compared to any measurable fixed policy, without any function class restriction… ▽ More We study the fundamental limits of learning in contextual bandits, where a learner's rewards depend on their actions and a known context, which extends the canonical multi-armed bandit to the case where side-information is available. We are interested in universally consistent algorithms, which achieve sublinear regret compared to any measurable fixed policy, without any function class restriction. For stationary contextual bandits, when the underlying reward mechanism is time-invariant, Blanchard et. al (2022) characterized learnable context processes for which universal consistency is achievable; and further gave algorithms ensuring universal consistency whenever this is achievable, a property known as optimistic universal consistency. It is well understood, however, that reward mechanisms can evolve over time, possibly adversarially, and depending on the learner's actions. We show that optimistic universal learning for contextual bandits with adversarial rewards is impossible in general, contrary to all previously studied settings in online learning -- including standard supervised learning. We also give necessary and sufficient conditions for universal learning under various adversarial reward models, and an exact characterization for online rewards. In particular, the set of learnable processes for these reward models is still extremely general -- larger than i.i.d., stationary or ergodic -- but in general strictly smaller than that for supervised learning or stationary contextual bandits, shedding light on new adversarial phenomena. △ Less

Submitted 12 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.06916 [pdf, other]

Effective Dimension in Bandit Problems under Censorship

Authors: Gauthier Guinet, Saurabh Amin, Patrick Jaillet

Abstract: In this paper, we study both multi-armed and contextual bandit problems in censored environments. Our goal is to estimate the performance loss due to censorship in the context of classical algorithms designed for uncensored environments. Our main contributions include the introduction of a broad class of censorship models and their analysis in terms of the effective dimension of the problem -- a n… ▽ More In this paper, we study both multi-armed and contextual bandit problems in censored environments. Our goal is to estimate the performance loss due to censorship in the context of classical algorithms designed for uncensored environments. Our main contributions include the introduction of a broad class of censorship models and their analysis in terms of the effective dimension of the problem -- a natural measure of its underlying statistical complexity and main driver of the regret bound. In particular, the effective dimension allows us to maintain the structure of the original problem at first order, while embedding it in a bigger space, and thus naturally leads to results analogous to uncensored settings. Our analysis involves a continuous generalization of the Elliptical Potential Inequality, which we believe is of independent interest. We also discover an interesting property of decision-making under censorship: a transient phase during which initial misspecification of censorship is self-corrected at an extra cost, followed by a stationary phase that reflects the inherent slowdown of learning governed by the effective dimension. Our results are useful for applications of sequential decision-making models where the feedback received depends on strategic uncertainty (e.g., agents' willingness to follow a recommendation) and/or random uncertainty (e.g., loss or delay in arrival of information). △ Less

Submitted 14 February, 2023; originally announced February 2023.

Comments: 45 pages, 5 figures, NeurIPS 2022

Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2302.04963 [pdf, ps, other]

Quadratic Memory is Necessary for Optimal Query Complexity in Convex Optimization: Center-of-Mass is Pareto-Optimal

Authors: Moïse Blanchard, Junhui Zhang, Patrick Jaillet

Abstract: We give query complexity lower bounds for convex optimization and the related feasibility problem. We show that quadratic memory is necessary to achieve the optimal oracle complexity for first-order convex optimization. In particular, this shows that center-of-mass cutting-planes algorithms in dimension $d$ which use $\tilde O(d^2)$ memory and $\tilde O(d)$ queries are Pareto-optimal for both conv… ▽ More We give query complexity lower bounds for convex optimization and the related feasibility problem. We show that quadratic memory is necessary to achieve the optimal oracle complexity for first-order convex optimization. In particular, this shows that center-of-mass cutting-planes algorithms in dimension $d$ which use $\tilde O(d^2)$ memory and $\tilde O(d)$ queries are Pareto-optimal for both convex optimization and the feasibility problem, up to logarithmic factors. Precisely, we prove that to minimize $1$-Lipschitz convex functions over the unit ball to $1/d^4$ accuracy, any deterministic first-order algorithms using at most $d^{2-δ}$ bits of memory must make $\tildeΩ(d^{1+δ/3})$ queries, for any $δ\in[0,1]$. For the feasibility problem, in which an algorithm only has access to a separation oracle, we show a stronger trade-off: for at most $d^{2-δ}$ memory, the number of queries required is $\tildeΩ(d^{1+δ})$. This resolves a COLT 2019 open problem of Woodworth and Srebro. △ Less

Submitted 18 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2302.01523 [pdf, other]

Multi-channel Autobidding with Budget and ROI Constraints

Authors: Yuan Deng, Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang, Vahab Mirrokni

Abstract: In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice,… ▽ More In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channels and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf. △ Less

Submitted 14 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2301.00241 [pdf, ps, other]

Contextual Bandits and Optimistically Universal Learning

Authors: Moise Blanchard, Steve Hanneke, Patrick Jaillet

Abstract: We consider the contextual bandit problem on general action and context spaces, where the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. We focus on consistency -- vanishing regret co… ▽ More We consider the contextual bandit problem on general action and context spaces, where the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. We focus on consistency -- vanishing regret compared to the optimal policy -- and show that for large classes of non-i.i.d. contexts, consistency can be achieved regardless of the time-invariant reward mechanism, a property known as universal consistency. Precisely, we first give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Second, we show that there always exists an algorithm that guarantees universal consistency whenever this is achievable, called an optimistically universal learning rule. Interestingly, for finite action spaces, learnable processes for universal learning are exactly the same as in the full-feedback setting of supervised learning, previously studied in the literature. In other words, learning can be performed with partial feedback without any generalization cost. The algorithms balance a trade-off between generalization (similar to structural risk minimization) and personalization (tailoring actions to specific contexts). Lastly, we consider the case of added continuity assumptions on rewards and show that these lead to universal consistency for significantly larger classes of data-generating processes. △ Less

Submitted 31 December, 2022; originally announced January 2023.

arXiv:2211.11065 [pdf, ps, other]

Additional Results and Extensions for the paper "Probabilistic bounds on the $k-$Traveling Salesman Problem and the Traveling Repairman Problem''

Authors: Moïse Blanchard, Alexandre Jacquillat, Patrick Jaillet

Abstract: This technical report provides additional results for the main paper ``Probabilistic bounds on the $k-$Traveling Salesman Problem ($k-$TSP) and the Traveling Repairman Problem (TRP)''. For the $k-$TSP, we extend the probabilistic bounds derived in the main paper to the case of distributions with general densities. For the TRP, we propose a utility-based notion of fairness and derive constant-facto… ▽ More This technical report provides additional results for the main paper ``Probabilistic bounds on the $k-$Traveling Salesman Problem ($k-$TSP) and the Traveling Repairman Problem (TRP)''. For the $k-$TSP, we extend the probabilistic bounds derived in the main paper to the case of distributions with general densities. For the TRP, we propose a utility-based notion of fairness and derive constant-factor probabilistic bounds for this objective, thus extending the TRP bounds from the main paper to non-linear utilities. △ Less

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.11063 [pdf, other]

Probabilistic bounds on the $k-$Traveling Salesman Problem and the Traveling Repairman Problem

Authors: Moïse Blanchard, Alexandre Jacquillat, Patrick Jaillet

Abstract: The $k-$traveling salesman problem ($k$-TSP) seeks a tour of minimal length that visits a subset of $k\leq n$ points. The traveling repairman problem (TRP) seeks a complete tour with minimal latency. This paper provides constant-factor probabilistic approximations of both problems. We first show that the optimal length of the $k$-TSP path grows at a rate of… ▽ More The $k-$traveling salesman problem ($k$-TSP) seeks a tour of minimal length that visits a subset of $k\leq n$ points. The traveling repairman problem (TRP) seeks a complete tour with minimal latency. This paper provides constant-factor probabilistic approximations of both problems. We first show that the optimal length of the $k$-TSP path grows at a rate of $Θ\left(k/n^{\frac{1}{2}\left(1+\frac{1}{k-1}\right)}\right)$. The proof provides a constant-factor approximation scheme, which solves a TSP in a high-concentration zone -- leveraging large deviations of local concentrations. Then, we show that the optimal TRP latency grows at a rate of $Θ(n\sqrt n)$. This result extends the classical Beardwood-Halton-Hammersley theorem to the TRP. Again, the proof provides a constant-factor approximation scheme, which visits zones by decreasing order of probability density. We discuss practical implications of this result in the design of transportation and logistics systems. Finally, we propose dedicated notions of fairness -- randomized population-based fairness for the $k$-TSP and geographical fairness for the TRP -- and give algorithms to balance efficiency and fairness. △ Less

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2210.06850 [pdf, other]

Sample-Then-Optimize Batch Neural Thompson Sampling

Authors: Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Bayesian optimization (BO), which uses a Gaussian process (GP) as a surrogate to model its objective function, is popular for black-box optimization. However, due to the limitations of GPs, BO underperforms in some problems such as those with categorical, high-dimensional or image inputs. To this end, recent works have used the highly expressive neural networks (NNs) as the surrogate model and der… ▽ More Bayesian optimization (BO), which uses a Gaussian process (GP) as a surrogate to model its objective function, is popular for black-box optimization. However, due to the limitations of GPs, BO underperforms in some problems such as those with categorical, high-dimensional or image inputs. To this end, recent works have used the highly expressive neural networks (NNs) as the surrogate model and derived theoretical guarantees using the theory of neural tangent kernel (NTK). However, these works suffer from the limitations of the requirement to invert an extremely large parameter matrix and the restriction to the sequential (rather than batch) setting. To overcome these limitations, we introduce two algorithms based on the Thompson sampling (TS) policy named Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear. To choose an input query, we only need to train an NN (resp. a linear model) and then choose the query by maximizing the trained NN (resp. linear model), which is equivalently sampled from the GP posterior with the NTK as the kernel function. As a result, our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy. Next, we derive regret upper bounds for our algorithms with batch evaluations, and use insights from batch BO and NTK to show that they are asymptotically no-regret under certain conditions. Finally, we verify their empirical effectiveness using practical AutoML and reinforcement learning experiments. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Extended version with proofs and additional experimental details and results, 30 pages

arXiv:2209.04748 [pdf, other]

Individual Welfare Guarantees in the Autobidding World with Machine-learned Advice

Authors: Yuan Deng, Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang, Vahab Mirrokni

Abstract: Online advertising channels have commonly focused on maximizing total advertiser value (or welfare) to enhance long-run retention and channel healthiness. Previous literature has studied auction design by incorporating machine learning predictions on advertiser values (also known as machine-learned advice) through various forms to improve total welfare. Yet, such improvements could come at the cos… ▽ More Online advertising channels have commonly focused on maximizing total advertiser value (or welfare) to enhance long-run retention and channel healthiness. Previous literature has studied auction design by incorporating machine learning predictions on advertiser values (also known as machine-learned advice) through various forms to improve total welfare. Yet, such improvements could come at the cost of individual bidders' welfare and do not shed light on how particular advertiser bidding strategies impact welfare. Motivated by this, we present an analysis on an individual bidder's welfare loss in the autobidding world for auctions with and without machine-learned advice, and also uncover how advertiser strategies relate to such losses. In particular, we demonstrate how ad platforms can utilize ML advice to improve welfare guarantee on the aggregate and individual bidder level by setting ML advice as personalized reserve prices when the platform consists of autobidders who maximize value while respecting a return-on-ad spent (ROAS) constraint. Under parallel VCG auctions with such ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for an individual autobidder, and show that the lower-bound guarantee is positively correlated with ML advice quality as well the scale of bids induced by the autobidder's bidding strategies. Further, we prove an impossibility result showing that no truthful, and possibly randomized mechanism with anonymous allocations can achieve universally better individual welfare guarantees than VCG, in presence of personalized reserves based on ML-advice of equal quality. Moreover, we extend our individual welfare guarantee results to generalized first price (GFP) and generalized second price (GSP) auctions. △ Less

Submitted 14 June, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

arXiv:2208.09611 [pdf, other]

Weighted Maximum Entropy Inverse Reinforcement Learning

Authors: The Viet Bui, Tien Mai, Patrick Jaillet

Abstract: We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a weight function to the maximum entropy framework, with the motivation of having the ability to learn and recover the stochasticity (or the bounded rationality) of t… ▽ More We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a weight function to the maximum entropy framework, with the motivation of having the ability to learn and recover the stochasticity (or the bounded rationality) of the expert policy. Our framework and algorithms allow to learn both a reward (or policy) function and the structure of the entropy terms added to the Markov Decision Processes, thus enhancing the learning procedure. Our numerical experiments using human and simulated demonstrations and with discrete and continuous IRL/IM tasks show that our approach outperforms prior algorithms. △ Less

Submitted 20 August, 2022; originally announced August 2022.

arXiv:2206.06872 [pdf, other]

On Provably Robust Meta-Bayesian Optimization

Authors: Zhongxiang Dai, Yizhou Chen, Haibin Yu, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Bayesian optimization (BO) has become popular for sequential optimization of black-box functions. When BO is used to optimize a target function, we often have access to previous evaluations of potentially related functions. This begs the question as to whether we can leverage these previous experiences to accelerate the current BO task through meta-learning (meta-BO), while ensuring robustness aga… ▽ More Bayesian optimization (BO) has become popular for sequential optimization of black-box functions. When BO is used to optimize a target function, we often have access to previous evaluations of potentially related functions. This begs the question as to whether we can leverage these previous experiences to accelerate the current BO task through meta-learning (meta-BO), while ensuring robustness against potentially harmful dissimilar tasks that could sabotage the convergence of BO. This paper introduces two scalable and provably robust meta-BO algorithms: robust meta-Gaussian process-upper confidence bound (RM-GP-UCB) and RM-GP-Thompson sampling (RM-GP-TS). We prove that both algorithms are asymptotically no-regret even when some or all previous tasks are dissimilar to the current task, and show that RM-GP-UCB enjoys a better theoretical robustness than RM-GP-TS. We also exploit the theoretical guarantees to optimize the weights assigned to individual previous tasks through regret minimization via online learning, which diminishes the impact of dissimilar tasks and hence further enhances the robustness. Empirical evaluations show that (a) RM-GP-UCB performs effectively and consistently across various applications, and (b) RM-GP-TS, despite being less robust than RM-GP-UCB both in theory and in practice, performs competitively in some scenarios with less dissimilar tasks and is more computationally efficient. △ Less

Submitted 15 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted to 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022), Extended version with proofs and additional experimental details and results, 31 pages

arXiv:2205.14309 [pdf, other]

Federated Neural Bandits

Authors: Zhongxiang Dai, Yao Shu, Arun Verma, Flint Xiaofeng Fan, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Recent works on neural contextual bandits have achieved compelling performances due to their ability to leverage the strong representation power of neural networks (NNs) for reward prediction. Many applications of contextual bandits involve multiple agents who collaborate without sharing raw observations, thus giving rise to the setting of federated contextual bandits. Existing works on federated… ▽ More Recent works on neural contextual bandits have achieved compelling performances due to their ability to leverage the strong representation power of neural networks (NNs) for reward prediction. Many applications of contextual bandits involve multiple agents who collaborate without sharing raw observations, thus giving rise to the setting of federated contextual bandits. Existing works on federated contextual bandits rely on linear or kernelized bandits, which may fall short when modeling complex real-world reward functions. So, this paper introduces the federated neural-upper confidence bound (FN-UCB) algorithm. To better exploit the federated setting, FN-UCB adopts a weighted combination of two UCBs: $\text{UCB}^{a}$ allows every agent to additionally use the observations from the other agents to accelerate exploration (without sharing raw observations), while $\text{UCB}^{b}$ uses an NN with aggregated parameters for reward prediction in a similar way to federated averaging for supervised learning. Notably, the weight between the two UCBs required by our theoretical analysis is amenable to an interesting interpretation, which emphasizes $\text{UCB}^{a}$ initially for accelerated exploration and relies more on $\text{UCB}^{b}$ later after enough observations have been collected to train the NNs for accurate reward prediction (i.e., reliable exploitation). We prove sub-linear upper bounds on both the cumulative regret and the number of communication rounds of FN-UCB, and empirically demonstrate its competitive performance. △ Less

Submitted 28 February, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: ICLR 2023. Code: https://github.com/daizhongxiang/Federated-Neural-Bandits

arXiv:2205.02732 [pdf, other]

Optimal Information Provision for Strategic Hybrid Workers

Authors: Sohil Shah, Saurabh Amin, Patrick Jaillet

Abstract: We study the problem of information provision by a strategic central planner who can publicly signal about an uncertain infectious risk parameter. Signalling leads to an updated public belief over the parameter, and agents then make equilibrium choices on whether to work remotely or in-person. The planner maintains a set of desirable outcomes for each realization of the uncertain parameter and see… ▽ More We study the problem of information provision by a strategic central planner who can publicly signal about an uncertain infectious risk parameter. Signalling leads to an updated public belief over the parameter, and agents then make equilibrium choices on whether to work remotely or in-person. The planner maintains a set of desirable outcomes for each realization of the uncertain parameter and seeks to maximize the probability that agents choose an acceptable outcome for the true parameter. We distinguish between stateless and stateful objectives. In the former, the set of desirable outcomes does not change as a function of the risk parameter, whereas in the latter it does. For stateless objectives, we reduce the problem to maximizing the probability of inducing mean beliefs that lie in intervals computable from the set of desirable outcomes. We derive the optimal signalling mechanism and show that it partitions the parameter domain into at most two intervals with the signals generated according to an interval-specific distribution. For the stateful case, we consider a practically relevant situation in which the planner can enforce in-person work capacity limits that progressively get more stringent as the risk parameter increases. We show that the optimal signalling mechanism for this case can be obtained by solving a linear program. We numerically verify the improvement in achieving desirable outcomes using our information design relative to no information and full information benchmarks. △ Less

Submitted 5 May, 2022; originally announced May 2022.

arXiv:2203.05067 [pdf, ps, other]

Universal Regression with Adversarial Responses

Authors: Moïse Blanchard, Patrick Jaillet

Abstract: We provide algorithms for regression with adversarial responses under large classes of non-i.i.d. instance sequences, on general separable metric spaces, with provably minimal assumptions. We also give characterizations of learnability in this regression context. We consider universal consistency which asks for strong consistency of a learner without restrictions on the value responses. Our analys… ▽ More We provide algorithms for regression with adversarial responses under large classes of non-i.i.d. instance sequences, on general separable metric spaces, with provably minimal assumptions. We also give characterizations of learnability in this regression context. We consider universal consistency which asks for strong consistency of a learner without restrictions on the value responses. Our analysis shows that such an objective is achievable for a significantly larger class of instance sequences than stationary processes, and unveils a fundamental dichotomy between value spaces: whether finite-horizon mean estimation is achievable or not. We further provide optimistically universal learning rules, i.e., such that if they fail to achieve universal consistency, any other algorithms will fail as well. For unbounded losses, we propose a mild integrability condition under which there exist algorithms for adversarial regression under large classes of non-i.i.d. instance sequences. In addition, our analysis also provides a learning rule for mean estimation in general metric spaces that is consistent under adversarial responses without any moment conditions on the sequence, a result of independent interest. △ Less

Submitted 9 June, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

arXiv:2202.13597 [pdf, other]

Rectified Max-Value Entropy Search for Bayesian Optimization

Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they… ▽ More Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they encourage an accurate measure of the mutual information such as the rectified MES (RMES) acquisition function we develop in this work. Unlike the evaluation of MES, we derive a closed-form probability density for the observation conditioned on the max-value and employ stochastic gradient ascent with reparameterization to efficiently optimize RMES. As a result of a more principled acquisition function, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems. △ Less

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2112.15364 [pdf, other]

Robust Entropy-regularized Markov Decision Processes

Authors: Tien Mai, Patrick Jaillet

Abstract: Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with respect to the state transition probabilities, and the estimation of these probabilities may be inaccurate, we study a robust version of the ER-MDP model, where the s… ▽ More Stochastic and soft optimal policies resulting from entropy-regularized Markov decision processes (ER-MDP) are desirable for exploration and imitation learning applications. Motivated by the fact that such policies are sensitive with respect to the state transition probabilities, and the estimation of these probabilities may be inaccurate, we study a robust version of the ER-MDP model, where the stochastic optimal policies are required to be robust with respect to the ambiguity in the underlying transition probabilities. Our work is at the crossroads of two important schemes in reinforcement learning (RL), namely, robust MDP and entropy regularized MDP. We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings, making the robust ER-MDP problem tractable. We show how our framework and results can be integrated into different algorithmic schemes including value or (modified) policy iteration, which would lead to new robust RL and inverse RL algorithms to handle uncertainties. Analyses on computational complexity and error propagation under conventional uncertainty settings are also provided. △ Less

Submitted 31 December, 2021; originally announced December 2021.

arXiv:2110.14153 [pdf, other]

Differentially Private Federated Bayesian Optimization with Distributed Exploration

Authors: Zhongxiang Dai, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Bayesian optimization (BO) has recently been extended to the federated learning (FL) setting by the federated Thompson sampling (FTS) algorithm, which has promising applications such as federated hyperparameter tuning. However, FTS is not equipped with a rigorous privacy guarantee which is an important consideration in FL. Recent works have incorporated differential privacy (DP) into the training… ▽ More Bayesian optimization (BO) has recently been extended to the federated learning (FL) setting by the federated Thompson sampling (FTS) algorithm, which has promising applications such as federated hyperparameter tuning. However, FTS is not equipped with a rigorous privacy guarantee which is an important consideration in FL. Recent works have incorporated differential privacy (DP) into the training of deep neural networks through a general framework for adding DP to iterative algorithms. Following this general DP framework, our work here integrates DP into FTS to preserve user-level privacy. We also leverage the ability of this general DP framework to handle different parameter vectors, as well as the technique of local modeling for BO, to further improve the utility of our algorithm through distributed exploration (DE). The resulting differentially private FTS with DE (DP-FTS-DE) algorithm is endowed with theoretical guarantees for both the privacy and utility and is amenable to interesting theoretical insights about the privacy-utility trade-off. We also use real-world experiments to show that DP-FTS-DE achieves high utility (competitive performance) with a strong privacy guarantee (small privacy loss) and induces a trade-off between privacy and utility. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Comments: Accepted to 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Extended version with proofs and additional experimental details and results, 29 pages

arXiv:2107.14465 [pdf, other]

Trusted-Maximizers Entropy Search for Efficient Bayesian Optimization

Authors: Quoc Phong Nguyen, Zhaoxuan Wu, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch si… ▽ More Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch size. To alleviate these issues, this paper presents a novel trusted-maximizers entropy search (TES) acquisition function: It measures how much an input query contributes to the information gain on the maximizer over a finite set of trusted maximizers, i.e., inputs optimizing functions that are sampled from the Gaussian process posterior belief of the objective function. Evaluating TES requires either only a stochastic approximation with sampling or a deterministic approximation with expectation propagation, both of which are investigated and empirically evaluated using synthetic benchmark objective functions and real-world optimization problems, e.g., hyperparameter tuning of a convolutional neural network and synthesizing 'physically realizable' faces to fool a black-box face recognition system. Though TES can naturally be generalized to a batch variant with either approximation, the latter is amenable to be scaled to a much larger batch size in our experiments. △ Less

Submitted 30 July, 2021; originally announced July 2021.

Comments: Published as a conference paper at UAI 2021

arXiv:2107.07725 [pdf, other]

Learning to Price against a Budget and ROI Constrained Buyer

Authors: Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang, Vahab Mirrokni

Abstract: Internet advertisers (buyers) repeatedly procure ad impressions from ad platforms (sellers) with the aim to maximize total conversion (i.e. ad value) while respecting both budget and return-on-investment (ROI) constraints for efficient utilization of limited monetary resources. Facing such a constrained buyer who aims to learn her optimal strategy to acquire impressions, we study from a seller's p… ▽ More Internet advertisers (buyers) repeatedly procure ad impressions from ad platforms (sellers) with the aim to maximize total conversion (i.e. ad value) while respecting both budget and return-on-investment (ROI) constraints for efficient utilization of limited monetary resources. Facing such a constrained buyer who aims to learn her optimal strategy to acquire impressions, we study from a seller's perspective how to learn and price ad impressions through repeated posted price mechanisms to maximize revenue. For this two-sided learning setup, we propose a learning algorithm for the seller that utilizes an episodic binary-search procedure to identify a revenue-optimal selling price. We show that such a simple learning algorithm enjoys low seller regret when within each episode, the budget and ROI constrained buyer approximately best responds to the posted price. We present simple yet natural buyer's bidding algorithms under which the buyer approximately best responds while satisfying budget and ROI constraints, leading to a low regret for our proposed seller pricing algorithm. The design of our seller algorithm is motivated by the fact that the seller's revenue function admits a bell-shaped structure when the buyer best responds to prices under budget and ROI constraints, enabling our seller algorithm to identify revenue-optimal selling prices efficiently. △ Less

Submitted 7 February, 2023; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2105.06126 [pdf, other]

Value-at-Risk Optimization with Gaussian Processes

Authors: Quoc Phong Nguyen, Zhongxiang Dai, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Value-at-risk (VaR) is an established measure to assess risks in critical real-world applications with random environmental factors. This paper presents a novel VaR upper confidence bound (V-UCB) algorithm for maximizing the VaR of a black-box objective function with the first no-regret guarantee. To realize this, we first derive a confidence bound of VaR and then prove the existence of values of… ▽ More Value-at-risk (VaR) is an established measure to assess risks in critical real-world applications with random environmental factors. This paper presents a novel VaR upper confidence bound (V-UCB) algorithm for maximizing the VaR of a black-box objective function with the first no-regret guarantee. To realize this, we first derive a confidence bound of VaR and then prove the existence of values of the environmental random variable (to be selected to achieve no regret) such that the confidence bound of VaR lies within that of the objective function evaluated at such values. Our V-UCB algorithm empirically demonstrates state-of-the-art performance in optimizing synthetic benchmark functions, a portfolio optimization problem, and a simulated robot task. △ Less

Submitted 13 May, 2021; originally announced May 2021.

arXiv:2104.08472 [pdf, other]

Convolutional Normalizing Flows for Deep Gaussian Processes

Authors: Haibin Yu, Dapeng Liu, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: Deep Gaussian processes (DGPs), a hierarchical composition of GP models, have successfully boosted the expressive power of their single-layer counterpart. However, it is impossible to perform exact inference in DGPs, which has motivated the recent development of variational inference-based methods. Unfortunately, either these methods yield a biased posterior belief or it is difficult to evaluate t… ▽ More Deep Gaussian processes (DGPs), a hierarchical composition of GP models, have successfully boosted the expressive power of their single-layer counterpart. However, it is impossible to perform exact inference in DGPs, which has motivated the recent development of variational inference-based methods. Unfortunately, either these methods yield a biased posterior belief or it is difficult to evaluate their convergence. This paper introduces a new approach for specifying flexible, arbitrarily complex, and scalable approximate posterior distributions. The posterior distribution is constructed through a normalizing flow (NF) which transforms a simple initial probability into a more complex one through a sequence of invertible transformations. Moreover, a novel convolutional normalizing flow (CNF) is developed to improve the time efficiency and capture dependency between layers. Empirical evaluation shows that CNF DGP outperforms the state-of-the-art approximation methods for DGPs. △ Less

Submitted 26 May, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

Comments: To appear in Proceedings of the International Joint Conference on Neural Networks 2021 (IJCNN'21). arXiv admin note: text overlap with arXiv:1910.11998

arXiv:2102.09132 [pdf, other]

Efficient Carpooling and Toll Pricing for Autonomous Transportation

Authors: Saurabh Amin, Patrick Jaillet, Manxi Wu

Abstract: In this paper, we address the existence and computation of competitive equilibrium in the transportation market for autonomous carpooling first proposed by [Ostrovsky and Schwarz, 2019]. At equilibrium, the market organizes carpooled trips over a transportation network in a socially optimal manner and sets the corresponding payments for individual riders and toll prices on edges. The market outcom… ▽ More In this paper, we address the existence and computation of competitive equilibrium in the transportation market for autonomous carpooling first proposed by [Ostrovsky and Schwarz, 2019]. At equilibrium, the market organizes carpooled trips over a transportation network in a socially optimal manner and sets the corresponding payments for individual riders and toll prices on edges. The market outcome ensures individual rationality, stability of carpooled trips, budget balance, and market clearing properties under heterogeneous rider preferences. We show that the question of market's existence can be resolved by proving the existence of an integer optimal solution of a linear programming problem. We characterize conditions on the network topology and riders' disutility for carpooling under which a market equilibrium can be computed in polynomial time. This characterization relies on ideas from the theory of combinatorial auctions and minimum cost network flow problem. Finally, we characterize a market equilibrium that achieves strategyproofness and maximizes welfare of individual riders. △ Less

Submitted 17 February, 2021; originally announced February 2021.

arXiv:2012.10695 [pdf, other]

An Information-Theoretic Framework for Unifying Active Learning Problems

Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-the-art performance in LSE problems with a continuous input domain. Then, by exploiting the relationship… ▽ More This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-the-art performance in LSE problems with a continuous input domain. Then, by exploiting the relationship between LSE and BO, we design a competitive information-theoretic acquisition function for BO that has interesting connections to upper confidence bound and max-value entropy search (MES). The latter connection reveals a drawback of MES which has important implications on not only MES but also on other MES-based acquisition functions. Finally, our unifying information-theoretic framework can be applied to solve a generalized problem of LSE and BO involving multiple level sets in a data-efficient manner. We empirically evaluate the performance of our proposed algorithms using synthetic benchmark functions, a real-world dataset, and in hyperparameter tuning of machine learning models. △ Less

Submitted 19 December, 2020; originally announced December 2020.

Comments: 35th AAAI Conference on Artificial Intelligence (AAAI 2021), Extended version with derivations, 12 pages

arXiv:2012.10688 [pdf, other]

Top-$k$ Ranking Bayesian Optimization

Authors: Quoc Phong Nguyen, Sebastian Tay, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: This paper presents a novel approach to top-$k$ ranking Bayesian optimization (top-$k$ ranking BO) which is a practical and significant generalization of preferential BO to handle top-$k$ ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equall… ▽ More This paper presents a novel approach to top-$k$ ranking Bayesian optimization (top-$k$ ranking BO) which is a practical and significant generalization of preferential BO to handle top-$k$ ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equally important contribution is the introduction of the first information-theoretic acquisition function in BO with preferential observation called multinomial predictive entropy search (MPES) which is flexible in handling these observations and optimized for all inputs of a query jointly. MPES possesses superior performance compared with existing acquisition functions that select the inputs of a query one at a time greedily. We empirically evaluate the performance of MPES using several synthetic benchmark functions, CIFAR-$10$ dataset, and SUSHI preference dataset. △ Less

Submitted 19 December, 2020; originally announced December 2020.

Comments: 35th AAAI Conference on Artificial Intelligence (AAAI 2021), Extended version with derivations, 13 pages

arXiv:2011.03653 [pdf, other]

No-regret Learning in Price Competitions under Consumer Reference Effects

Authors: Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang

Abstract: We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functio… ▽ More We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers' price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functions and how reference prices are formed but have access to an oracle that provides a measure of consumers' responsiveness to the current posted prices. We show that if the firms run no-regret algorithms, in particular, online mirror descent(OMD), with decreasing step sizes, the market stabilizes in the sense that firms' prices and reference prices converge to a stable Nash Equilibrium (SNE). Interestingly, we also show that there exist constant step sizesunder which the market stabilizes. We further characterize the rate of convergence to the SNE for both decreasing and constant OMD step sizes. △ Less

Submitted 21 March, 2022; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: Includes minor correction to statement of Theorem 5.1 and relevant assumptions in NeurIPS camera ready version

arXiv:2010.12883 [pdf, other]

Variational Bayesian Unlearning

Authors: Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

Abstract: This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters after directly unlearning from erased data vs. the exact posterior belief from retraining with remaining data. Using the variational… ▽ More This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters after directly unlearning from erased data vs. the exact posterior belief from retraining with remaining data. Using the variational inference (VI) framework, we show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief given the full data (i.e., including the remaining data); the latter prevents catastrophic unlearning that can render the model useless. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging. We propose two novel tricks to tackle this challenge. We empirically demonstrate our unlearning methods on Bayesian models such as sparse Gaussian process and logistic regression using synthetic and real-world datasets. △ Less

Submitted 24 October, 2020; originally announced October 2020.

Comments: 34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020), Extended version with proofs, 22 pages

arXiv:2010.10154 [pdf, other]

Federated Bayesian Optimization via Thompson Sampling

Authors: Zhongxiang Dai, Kian Hsiang Low, Patrick Jaillet

Abstract: Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs) via first-order optimization techniques. However, some co… ▽ More Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs) via first-order optimization techniques. However, some common machine learning tasks such as hyperparameter tuning of DNNs lack access to gradients and thus require zeroth-order/black-box optimization. This hints at the possibility of extending BO to the FL setting (FBO) for agents to collaborate in these black-box optimization tasks. This paper presents federated Thompson sampling (FTS) which overcomes a number of key challenges of FBO and FL in a principled way: We (a) use random Fourier features to approximate the Gaussian process surrogate model used in BO, which naturally produces the parameters to be exchanged between agents, (b) design FTS based on Thompson sampling, which significantly reduces the number of parameters to be exchanged, and (c) provide a theoretical convergence guarantee that is robust against heterogeneous agents, which is a major challenge in FL and FBO. We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance. △ Less

Submitted 22 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted to 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Extended version with proofs and additional experimental details and results, 25 pages

arXiv:2009.07925 [pdf, other]

Competitive Ratios for Online Multi-capacity Ridesharing

Authors: Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet

Abstract: In multi-capacity ridesharing, multiple requests (e.g., customers, food items, parcels) with different origin and destination pairs travel in one resource. In recent years, online multi-capacity ridesharing services (i.e., where assignments are made online) like Uber-pool, foodpanda, and on-demand shuttles have become hugely popular in transportation, food delivery, logistics and other domains. Th… ▽ More In multi-capacity ridesharing, multiple requests (e.g., customers, food items, parcels) with different origin and destination pairs travel in one resource. In recent years, online multi-capacity ridesharing services (i.e., where assignments are made online) like Uber-pool, foodpanda, and on-demand shuttles have become hugely popular in transportation, food delivery, logistics and other domains. This is because multi-capacity ridesharing services benefit all parties involved { the customers (due to lower costs), the drivers (due to higher revenues) and the matching platforms (due to higher revenues per vehicle/resource). Most importantly these services can also help reduce carbon emissions (due to fewer vehicles on roads). Online multi-capacity ridesharing is extremely challenging as the underlying matching graph is no longer bipartite (as in the unit-capacity case) but a tripartite graph with resources (e.g., taxis, cars), requests and request groups (combinations of requests that can travel together). The desired matching between resources and request groups is constrained by the edges between requests and request groups in this tripartite graph (i.e., a request can be part of at most one request group in the final assignment). While there have been myopic heuristic approaches employed for solving the online multi-capacity ridesharing problem, they do not provide any guarantees on the solution quality. To that end, this paper presents the first approach with bounds on the competitive ratio for online multi-capacity ridesharing (when resources rejoin the system at their initial location/depot after serving a group of requests). △ Less

Submitted 16 September, 2020; originally announced September 2020.

Comments: 28 pages, 4 Figures

arXiv:2009.06051 [pdf, other]

Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing

Authors: Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet

Abstract: Real-time ridesharing systems such as UberPool, Lyft Line, GrabShare have become hugely popular as they reduce the costs for customers, improve per trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the "right" requests to travel together in the "right" available vehicles in real-time, so that the… ▽ More Real-time ridesharing systems such as UberPool, Lyft Line, GrabShare have become hugely popular as they reduce the costs for customers, improve per trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the "right" requests to travel together in the "right" available vehicles in real-time, so that the objective (e.g., requests served, revenue or delay) is optimized. This challenge has been addressed in existing work by: (i) generating as many relevant feasible (with respect to the available delay for customers) combinations of requests as possible in real-time; and then (ii) optimizing assignment of the feasible request combinations to vehicles. Since the number of request combinations increases exponentially with the increase in vehicle capacity and number of requests, unfortunately, such approaches have to employ ad hoc heuristics to identify a subset of request combinations for assignment. Our key contribution is in developing approaches that employ zone (abstraction of individual locations) paths instead of request combinations. Zone paths allow for generation of significantly more "relevant" combinations (in comparison to ad hoc heuristics) in real-time than competing approaches due to two reasons: (i) Each zone path can typically represent multiple request combinations; (ii) Zone paths are generated using a combination of offline and online methods. Specifically, we contribute both myopic (ridesharing assignment focussed on current requests only) and non-myopic (ridesharing assignment considers impact on expected future requests) approaches that employ zone paths. In our experimental results, we demonstrate that our myopic approach outperforms (with respect to both objective and runtime) the current best myopic approach for ridesharing on both real-world and synthetic datasets. △ Less

Submitted 13 September, 2020; originally announced September 2020.

Comments: 48 pages, 22 figures

arXiv:2008.08048 [pdf, other]

Learning Structure in Nested Logit Models

Authors: Youssef M. Aboutaleb, Moshe Ben-Akiva, Patrick Jaillet

Abstract: This paper introduces a new data-driven methodology for nested logit structure discovery. Nested logit models allow the modeling of positive correlations between the error terms of the utility specifications of the different alternatives in a discrete choice scenario through the specification of a nesting structure. Current nested logit model estimation practices require an a priori specification… ▽ More This paper introduces a new data-driven methodology for nested logit structure discovery. Nested logit models allow the modeling of positive correlations between the error terms of the utility specifications of the different alternatives in a discrete choice scenario through the specification of a nesting structure. Current nested logit model estimation practices require an a priori specification of a nesting structure by the modeler. In this we work we optimize over all possible specifications of the nested logit model that are consistent with rational utility maximization. We formulate the problem of learning an optimal nesting structure from the data as a mixed integer nonlinear programming (MINLP) optimization problem and solve it using a variant of the linear outer approximation algorithm. We exploit the tree structure of the problem and utilize the latest advances in integer optimization to bring practical tractability to the optimization problem we introduce. We demonstrate the ability of our algorithm to correctly recover the true nesting structure from synthetic data in a Monte Carlo experiment. In an empirical illustration using a stated preference survey on modes of transportation in the U.S. state of Massachusetts, we use our algorithm to obtain an optimal nesting tree representing the correlations between the unobserved effects of the different travel mode choices. We provide our implementation as a customizable and open-source code base written in the Julia programming language. △ Less

Submitted 18 August, 2020; originally announced August 2020.

arXiv:2008.07820 [pdf, other]

A Relation Analysis of Markov Decision Process Frameworks

Authors: Tien Mai, Patrick Jaillet

Abstract: We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literatures, including the standard MDP, the entropy and general regularized MDP, and stochastic MDP, where the latter is based on the assumption that the reward function is stochastic and follows a given distribution. We show that the entropy-regularized MDP is equivalent to a… ▽ More We study the relation between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literatures, including the standard MDP, the entropy and general regularized MDP, and stochastic MDP, where the latter is based on the assumption that the reward function is stochastic and follows a given distribution. We show that the entropy-regularized MDP is equivalent to a stochastic MDP model, and is strictly subsumed by the general regularized MDP. Moreover, we propose a distributional stochastic MDP framework by assuming that the distribution of the reward function is ambiguous. We further show that the distributional stochastic MDP is equivalent to the regularized MDP, in the sense that they always yield the same optimal policies. We also provide a connection between stochastic/regularized MDP and constrained MDP. Our work gives a unified view on several important MDP frameworks, which would lead new ways to interpret the (entropy/general) regularized MDP frameworks through the lens of stochastic rewards and vice-versa. Given the recent popularity of regularized MDP in (deep) reinforcement learning, our work brings new understandings of how such algorithmic schemes work and suggest ideas to develop new ones. △ Less

Submitted 18 August, 2020; originally announced August 2020.

arXiv:2006.16679 [pdf, other]

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

Authors: Zhongxiang Dai, Yizhou Chen, Kian Hsiang Low, Patrick Jaillet, Teck-Hua Ho

Abstract: This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship amon… ▽ More This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning. △ Less

Submitted 30 June, 2020; originally announced June 2020.

Comments: Accepted to 37th International Conference on Machine Learning (ICML 2020), Extended version with proofs and additional experimental details and results, 27 pages

Showing 1–50 of 88 results for author: Jaillet, P