-
Mitigating Cognitive Biases in Multi-Criteria Crowd Assessment
Authors:
Shun Ito,
Hisashi Kashima
Abstract:
Crowdsourcing is an easy, cheap, and fast way to perform large-scale quality assessment; however, human judgments are often influenced by cognitive biases, which lowers their credibility. In this study, we focus on cognitive biases associated with multi-criteria assessment in crowdsourcing; crowdworkers who rate targets with multiple different criteria simultaneously may provide biased responses due to the prominence of some criteria or global impressions of the evaluation targets. To identify and mitigate such biases, we first create evaluation datasets using crowdsourcing and investigate the effect of inter-criteria cognitive biases on crowdworker responses. Then, we propose two specific model structures for Bayesian opinion aggregation models that consider inter-criteria relations. Our experiments show that incorporating our proposed structures into the aggregation model is effective in reducing cognitive biases and helps obtain more accurate aggregation results.
Submitted 10 July, 2024;
originally announced July 2024.
-
Speed-accuracy trade-off for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport
Authors:
Kotaro Ikeda,
Tomoya Uda,
Daisuke Okanohara,
Sosuke Ito
Abstract:
We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Based on the techniques of stochastic thermodynamics, we derive the speed-accuracy trade-off for the diffusion models, which is a trade-off relationship between the speed and accuracy of data generation in diffusion models. Our result implies that the entropy production rate in the forward process affects the errors in data generation. From a stochastic thermodynamic perspective, our results provide quantitative insight into how best to generate data in diffusion models. The optimal learning protocol is introduced by the conservative force in stochastic thermodynamics and the geodesic of space by the 2-Wasserstein distance in optimal transport theory. We numerically illustrate the validity of the speed-accuracy trade-off for the diffusion models with different noise schedules such as the cosine schedule, the conditional optimal transport, and the optimal transport.
Submitted 22 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of $Θ(T^{2/3})$ and its Application to Best-of-Both-Worlds
Authors:
Taira Tsuchiya,
Shinji Ito
Abstract:
Follow-the-Regularized-Leader (FTRL) is a powerful framework for various online learning problems. By designing its regularizer and learning rate to be adaptive to past observations, FTRL is known to work adaptively to various properties of an underlying environment. However, most existing adaptive learning rates are for online learning problems with a minimax regret of $Θ(\sqrt{T})$ for the number of rounds $T$, and there are only a few studies on adaptive learning rates for problems with a minimax regret of $Θ(T^{2/3})$, which include several important problems dealing with indirect feedback. To address this limitation, we establish a new adaptive learning rate framework for problems with a minimax regret of $Θ(T^{2/3})$. Our learning rate is designed by matching the stability, penalty, and bias terms that naturally appear in regret upper bounds for problems with a minimax regret of $Θ(T^{2/3})$. As applications of this framework, we consider two major problems dealing with indirect feedback: partial monitoring and graph bandits. We show that FTRL with our learning rate and the Tsallis entropy regularizer improves existing Best-of-Both-Worlds (BOBW) regret upper bounds, which achieve simultaneous optimality in the stochastic and adversarial regimes. The resulting learning rate is surprisingly simple compared to the existing learning rates for BOBW algorithms for problems with a minimax regret of $Θ(T^{2/3})$.
Submitted 30 May, 2024;
originally announced May 2024.
-
Learning with Posterior Sampling for Revenue Management under Time-varying Demand
Authors:
Kazuma Shimizu,
Junya Honda,
Shinji Ito,
Shinji Nakadai
Abstract:
This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, the time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managing the remaining inventory and estimating the demand. To tackle this challenge, we first introduce an episodic generalization of the RM problem motivated by typical application scenarios. We then propose a computationally efficient algorithm based on posterior sampling, which effectively optimizes prices by solving linear programming. We derive a Bayesian regret upper bound of this algorithm for general models where demand parameters can be correlated between time periods, while also deriving a regret lower bound for generic algorithms. Our empirical study shows that the proposed algorithm performs better than other benchmark algorithms and comparably to the optimal policy in hindsight. We also propose a heuristic modification of the proposed algorithm, which further efficiently learns the pricing policy in the experiments.
Submitted 8 May, 2024;
originally announced May 2024.
-
Online $\mathrm{L}^{\natural}$-Convex Minimization
Authors:
Ken Yokoyama,
Shinji Ito,
Tatsuya Matsuoka,
Kei Kimura,
Makoto Yokoo
Abstract:
An online decision-making problem is a learning problem in which a player repeatedly makes decisions in order to minimize the long-term loss. Such problems arising in applications often have nonlinear combinatorial objective functions, and developing algorithms for them has attracted considerable attention. An existing general framework for dealing with such objective functions is online submodular minimization. However, practical problems often fall outside the scope of this framework, since the domain of a submodular function is limited to a subset of the unit hypercube. To address this limitation of the existing framework, in this paper we introduce online $\mathrm{L}^{\natural}$-convex minimization, where an $\mathrm{L}^{\natural}$-convex function generalizes a submodular function so that the domain is a subset of the integer lattice. We propose computationally efficient algorithms for online $\mathrm{L}^{\natural}$-convex function minimization in two major settings: the full information and the bandit settings. We analyze the regrets of these algorithms and show in particular that our algorithm for the full information setting obtains a tight regret bound up to a constant factor. We also present several motivating examples that illustrate the usefulness of online $\mathrm{L}^{\natural}$-convex minimization.
Submitted 26 April, 2024;
originally announced April 2024.
-
Online Multi-Agent Pickup and Delivery with Task Deadlines
Authors:
Hiroya Makino,
Seigo Ito
Abstract:
Managing delivery deadlines in automated warehouses and factories is crucial for maintaining customer satisfaction and ensuring seamless production. This study introduces the problem of online multi-agent pickup and delivery with task deadlines (MAPD-D), an advanced variant of the online MAPD problem incorporating delivery deadlines. In the MAPD problem, agents must manage a continuous stream of delivery tasks online. Tasks are added at any time. Agents must complete their tasks while avoiding collisions with each other. MAPD-D introduces a dynamic, deadline-driven approach that incorporates task deadlines, challenging the conventional MAPD frameworks. To tackle MAPD-D, we propose a novel algorithm named deadline-aware token passing (D-TP). The D-TP algorithm calculates pickup deadlines and assigns tasks while balancing execution cost and deadline proximity. Additionally, we introduce the D-TP with task swaps (D-TPTS) method to further reduce task tardiness, enhancing flexibility and efficiency through task-swapping strategies. Numerical experiments were conducted in simulated warehouse environments to showcase the effectiveness of the proposed methods. Both D-TP and D-TPTS demonstrated significant reductions in task tardiness compared to existing methods. Our methods contribute to efficient operations in automated warehouses and factories with delivery deadlines.
Submitted 27 August, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
MARPF: Multi-Agent and Multi-Rack Path Finding
Authors:
Hiroya Makino,
Yoshihiro Ohama,
Seigo Ito
Abstract:
In environments where many automated guided vehicles (AGVs) operate, planning efficient, collision-free paths is essential. Related research has mainly focused on environments with pre-defined passages, resulting in space inefficiency. We attempt to relax this assumption. In this study, we define multi-agent and multi-rack path finding (MARPF) as the problem of planning paths for AGVs to convey target racks to their designated locations in environments without passages. In such environments, an AGV without a rack can pass under racks, whereas one with a rack cannot pass under racks to avoid collisions. MARPF entails conveying the target racks without collisions, while the obstacle racks are relocated to prevent any interference with the target racks. We formulated MARPF as an integer linear programming problem in a network flow. To distinguish situations in which an AGV is or is not loading a rack, the proposed method introduces two virtual layers into the network. We optimized the AGVs' movements to move obstacle racks and convey the target racks. The formulation and applicability of the algorithm were validated through numerical experiments. The results indicated that the proposed algorithm addressed issues in environments with dense racks.
Submitted 27 August, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Follow-the-Perturbed-Leader with Fréchet-type Tail Distributions: Optimality in Adversarial Bandits and Best-of-Both-Worlds
Authors:
Jongyeong Lee,
Junya Honda,
Shinji Ito,
Min-hwan Oh
Abstract:
This paper studies the optimality of the Follow-the-Perturbed-Leader (FTPL) policy in both adversarial and stochastic $K$-armed bandits. Despite the widespread use of the Follow-the-Regularized-Leader (FTRL) framework with various choices of regularization, the FTPL framework, which relies on random perturbations, has not received much attention, despite its inherent simplicity. In adversarial bandits, it has been conjectured that FTPL could potentially achieve $\mathcal{O}(\sqrt{KT})$ regrets if perturbations follow a distribution with a Fréchet-type tail. Recent work by Honda et al. (2023) showed that FTPL with the Fréchet distribution with shape $α=2$ indeed attains this bound and, notably, logarithmic regret in stochastic bandits, which establishes the Best-of-Both-Worlds (BOBW) capability of FTPL. However, this result only partly resolves the above conjecture because their analysis heavily relies on the specific form of the Fréchet distribution with this shape. In this paper, we establish a sufficient condition for perturbations to achieve $\mathcal{O}(\sqrt{KT})$ regrets in the adversarial setting, which covers, e.g., Fréchet, Pareto, and Student-$t$ distributions. We also demonstrate the BOBW achievability of FTPL with certain Fréchet-type tail distributions. Our results contribute not only to resolving existing conjectures through the lens of extreme value theory but also potentially offer insights into the effect of the regularization functions in FTRL through the mapping from FTPL to FTRL.
Submitted 8 March, 2024;
originally announced March 2024.
-
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Authors:
Masahiro Kato,
Shinji Ito
Abstract:
This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets satisfy $O(\log^2(T))$ for the number of rounds $T$ in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying $O(\sqrt{T})$ in an adversarial regime. However, the dependency on $T$ has room for improvement, and the suboptimality-gap assumption can be relaxed. For this issue, this study proposes an algorithm whose regret satisfies $O(\log(T))$ in the setting when the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap. That condition characterizes the problem difficulty linked to the suboptimality gap using a parameter $β\in (0, \infty]$. We then show that the algorithm's regret satisfies $O\left(\left\{\log(T)\right\}^{\frac{1+β}{2+β}}T^{\frac{1}{2+β}}\right)$. Here, $β= \infty$ corresponds to the case in the existing studies where a lower bound exists in the suboptimality gap, and our regret satisfies $O(\log(T))$ in that case. Our proposed algorithm is based on the Follow-The-Regularized-Leader with the Tsallis entropy and referred to as the $α$-Linear-Contextual (LC)-Tsallis-INF.
Submitted 3 April, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds
Authors:
Shinji Ito,
Taira Tsuchiya,
Junya Honda
Abstract:
Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where an appropriate choice of the learning rate is crucial for smaller regret. To this end, we formulate the problem of adjusting FTRL's learning rate as a sequential decision-making problem and introduce the framework of competitive analysis. We establish a lower bound for the competitive ratio and propose update rules for the learning rate that achieve an upper bound within a constant factor of this lower bound. Specifically, we illustrate that the optimal competitive ratio is characterized by the (approximate) monotonicity of components of the penalty term, showing that a constant competitive ratio is achievable if the components of the penalty term form a monotonically non-increasing sequence, and derive a tight competitive ratio when penalty terms are $ξ$-approximately monotone non-increasing. Our proposed update rule, referred to as \textit{stability-penalty matching}, also facilitates constructing Best-Of-Both-Worlds (BOBW) algorithms for stochastic and adversarial environments. In these environments, our result contributes to achieving tighter regret bounds and broadening the applicability of algorithms for various settings such as multi-armed bandits, graph bandits, linear bandits, and contextual bandits.
Submitted 10 March, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Fast Rates in Online Convex Optimization by Exploiting the Curvature of Feasible Sets
Authors:
Taira Tsuchiya,
Shinji Ito
Abstract:
In this paper, we explore online convex optimization (OCO) and introduce a new analysis that provides fast rates by exploiting the curvature of feasible sets. In online linear optimization, it is known that if the average gradient of loss functions is larger than a certain value, the curvature of feasible sets can be exploited by the follow-the-leader (FTL) algorithm to achieve a logarithmic regret. This paper reveals that algorithms adaptive to the curvature of loss functions can also leverage the curvature of feasible sets. We first prove that if an optimal decision is on the boundary of a feasible set and the gradient of an underlying loss function is non-zero, then the algorithm achieves a regret upper bound of $O(ρ\log T)$ in stochastic environments. Here, $ρ> 0$ is the radius of the smallest sphere that includes the optimal decision and encloses the feasible set. Our approach, unlike existing ones, can work directly with convex loss functions, exploiting the curvature of loss functions simultaneously, and can achieve the logarithmic regret only with a local property of feasible sets. Additionally, it achieves an $O(\sqrt{T})$ regret even in adversarial environments where FTL suffers an $Ω(T)$ regret, and attains an $O(ρ\log T + \sqrt{C ρ\log T})$ regret bound in corrupted stochastic environments with corruption level $C$. Furthermore, by extending our analysis, we establish a regret upper bound of $O\Big(T^{\frac{q-2}{2(q-1)}} (\log T)^{\frac{q}{2(q-1)}}\Big)$ for $q$-uniformly convex feasible sets, where uniformly convex sets include strongly convex sets and $\ell_p$-balls for $p \in [1,\infty)$. This bound bridges the gap between the $O(\log T)$ regret bound for strongly convex sets ($q=2$) and the $O(\sqrt{T})$ regret bound for non-curved sets ($q\to\infty$).
Submitted 20 February, 2024;
originally announced February 2024.
-
Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring
Authors:
Taira Tsuchiya,
Shinji Ito,
Junya Honda
Abstract:
Partial monitoring is a generic framework of online decision-making problems with limited observations. To make decisions from such limited observations, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, exploration by optimization (ExO), was proposed, which achieves the optimal bounds in adversarial environments with follow-the-regularized-leader for a wide range of online decision-making problems. However, a naive application of ExO in stochastic environments significantly degrades regret bounds. To resolve this problem in locally observable games, we first establish a novel framework and analysis for ExO with a hybrid regularizer. This development allows us to significantly improve the existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieve nearly optimal bounds both in stochastic and adversarial environments. In particular, we derive a stochastic regret bound of $O(\sum_{a \neq a^*} k^2 m^2 \log T / Δ_a)$, where $k$, $m$, and $T$ are the numbers of actions, observations, and rounds, $a^*$ is an optimal action, and $Δ_a$ is the suboptimality gap for action $a$. This bound is roughly $Θ(k^2 \log T)$ times smaller than existing BOBW bounds. In addition, for globally observable games, we provide a new BOBW algorithm with the first $O(\log T)$ stochastic bound.
Submitted 13 February, 2024;
originally announced February 2024.
-
Replicability is Asymptotically Free in Multi-armed Bandits
Authors:
Junpei Komiyama,
Shinji Ito,
Yuichi Yoshida,
Souta Koshino
Abstract:
This work is motivated by the growing demand for reproducible machine learning. We study the stochastic multi-armed bandit problem. In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset. We observe that existing algorithms require $O(1/ρ^2)$ times more regret than nonreplicable algorithms, where $ρ$ is the level of nonreplication. However, we demonstrate that this additional cost is unnecessary when the time horizon $T$ is sufficiently large for a given $ρ$, provided that the magnitude of the confidence bounds is chosen carefully. We introduce an explore-then-commit algorithm that draws arms uniformly before committing to a single arm. Additionally, we examine a successive elimination algorithm that eliminates suboptimal arms at the end of each phase. To ensure the replicability of these algorithms, we incorporate randomness into their decision-making processes. We extend the use of successive elimination to the linear bandit problem as well. For the analysis of these algorithms, we propose a principled approach to limiting the probability of nonreplication. This approach elucidates the steps that existing research has implicitly followed. Furthermore, we derive the first lower bound for the two-armed replicable bandit problem, which implies the optimality of the proposed algorithms up to a $\log\log T$ factor for the two-armed case.
Submitted 11 February, 2024;
originally announced February 2024.
-
Best-of-Both-Worlds Linear Contextual Bandits
Authors:
Masahiro Kato,
Shinji Ito
Abstract:
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption. At each round, a decision-maker observes an independent and identically distributed context and then selects an arm based on the context and past observations. After selecting an arm, the decision-maker incurs a loss corresponding to the selected arm. The decision-maker aims to minimize the cumulative loss over the trial. The goal of this study is to develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees. We first formulate the problem by introducing a novel setting of bandits with adversarial corruption, referred to as the contextual adversarial regime with a self-bounding constraint. We assume linear models for the relationship between the loss and the context. Then, we propose a strategy that extends the RealLinExp3 by Neu & Olkhovskaya (2020) and the Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{Δ_{*}} + \sqrt{\frac{C(\log(T))^3}{Δ_{*}}},\ \ \sqrt{T}(\log(T))^2\right\}\right)$, where $T \in\mathbb{N}$ is the number of rounds, $Δ_{*} > 0$ is the constant minimum gap between the best and suboptimal arms for any context, and $C\in[0, T] $ is an adversarial corruption parameter. This regret upper bound implies $O\left(\frac{(\log(T))^3}{Δ_{*}}\right)$ in a stochastic environment and $O\left( \sqrt{T}(\log(T))^2\right)$ in an adversarial environment. We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both stochastic and adversarial regimes.
Submitted 27 December, 2023;
originally announced December 2023.
-
New Classes of the Greedy-Applicable Arm Feature Distributions in the Sparse Linear Bandit Problem
Authors:
Koji Ichikawa,
Shinji Ito,
Daisuke Hatano,
Hanna Sumita,
Takuro Fukunaga,
Naonori Kakimura,
Ken-ichi Kawarabayashi
Abstract:
We consider the sparse contextual bandit problem where the arm feature affects the reward through its inner product with sparse parameters. Recent studies have developed sparsity-agnostic algorithms based on the greedy arm selection policy. However, the analysis of these algorithms requires strong assumptions on the arm feature distribution to ensure that the greedily selected samples are sufficiently diverse. One of the most common assumptions, relaxed symmetry, imposes approximate origin-symmetry on the distribution, which excludes distributions that have origin-asymmetric support. In this paper, we show that the greedy algorithm is applicable to a wider range of arm feature distributions from two aspects. First, we show that a mixture distribution that has a greedy-applicable component is also greedy-applicable. Second, we propose new distribution classes, related to Gaussian mixture, discrete, and radial distributions, for which the sample diversity is guaranteed. The proposed classes can describe distributions with origin-asymmetric support and, in conjunction with the first claim, provide theoretical guarantees of the greedy policy for a very wide range of arm feature distributions.
Submitted 28 March, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer
Authors:
Kazuma Kobayashi,
Yasuyuki Takamizawa,
Mototaka Miyake,
Sono Ito,
Lin Gu,
Tatsuya Nakatsuka,
Yu Akagi,
Tatsuya Harada,
Yukihide Kanemitsu,
Ryuji Hamamoto
Abstract:
Explainability is key to enhancing artificial intelligence's trustworthiness in medicine. However, several issues remain concerning the actual benefit of explainable models for clinical decision-making. Firstly, there is a lack of consensus on an evaluation framework for quantitatively assessing the practical benefits that effective explainability should provide to practitioners. Secondly, physician-centered evaluations of explainability are limited. Thirdly, the utility of built-in attention mechanisms in transformer-based models as an explainability technique is unclear. We hypothesize that superior attention maps should align with the information that physicians focus on, potentially reducing prediction uncertainty and increasing model reliability. We employed a multimodal transformer to predict lymph node metastasis in rectal cancer using clinical data and magnetic resonance imaging, exploring how well attention maps, visualized through a state-of-the-art technique, can achieve agreement with physician understanding. We estimated the model's uncertainty using meta-level information like prediction probability variance and quantified agreement. Our assessment of whether this agreement reduces uncertainty found no significant effect. In conclusion, this case study did not confirm the anticipated benefit of attention maps in enhancing model reliability. Superficial explanations could do more harm than good by misleading physicians into relying on uncertain predictions, suggesting that the current state of attention mechanisms in explainability should not be overestimated. Identifying explainability mechanisms truly beneficial for clinical decision-making remains essential.
Submitted 14 December, 2023;
originally announced December 2023.
-
OpenLB User Guide: Associated with Release 1.6 of the Code
Authors:
Adrian Kummerländer,
Samuel J. Avis,
Halim Kusumaatmaja,
Fedor Bukreev,
Michael Crocoll,
Davide Dapelo,
Simon Großmann,
Nicolas Hafen,
Shota Ito,
Julius Jeßberger,
Eliane Kummer,
Jan E. Marquardt,
Johanna Mödl,
Tim Pertzel,
František Prinz,
Florian Raichle,
Martin Sadric,
Maximilian Schecher,
Dennis Teutscher,
Stephan Simonis,
Mathias J. Krause
Abstract:
OpenLB is an object-oriented implementation of LBM. It is the first implementation of a generic platform for LBM programming, which is shared with the open source community (GPLv2). Since the first release in 2007, the code has been continuously improved and extended, which is documented by thirteen releases as well as the corresponding release notes, which are available on the OpenLB website (https://www.openlb.net). The OpenLB code is written in C++ and is used by application programmers as well as developers, with the ability to implement custom models. OpenLB supports complex data structures that allow simulations in complex geometries and parallel execution using MPI, OpenMP and CUDA on high-performance computers. The source code uses the concepts of interfaces and templates, so that efficient, direct and intuitive implementations of the LBM become possible. The efficiency and scalability have been checked and proved by code reviews. This user manual and a source code documentation by DoxyGen are available on the OpenLB project website.
Submitted 7 August, 2024; v1 submitted 17 May, 2023;
originally announced July 2023.
-
Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds
Authors:
Taira Tsuchiya,
Shinji Ito,
Junya Honda
Abstract:
Adaptivity to the difficulties of a problem is a key property in sequential decision-making problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems. Aiming to further generalize this adaptivity, we develop a generic adaptive learning rate, called stability-penalty-adaptive (SPA) learning rate for FTRL. This learning rate yields a regret bound jointly depending on stability and penalty of the algorithm, into which the regret of FTRL is typically decomposed. With this result, we establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW). Despite the fact that sparsity appears frequently in real problems, existing sparse multi-armed bandit algorithms with $k$-arms assume that the sparsity level $s \leq k$ is known in advance, which is often not the case in real-world scenarios. To address this issue, we first establish $s$-agnostic algorithms with regret bounds of $\tilde{O}(\sqrt{sT})$ in the adversarial regime for $T$ rounds, which matches the existing lower bound up to a logarithmic factor. Meanwhile, BOBW algorithms aim to achieve a near-optimal regret in both the stochastic and adversarial regimes. Leveraging the SPA learning rate and the technique for $s$-agnostic algorithms combined with a new analysis to bound the variation in FTRL output in response to changes in a regularizer, we establish the first BOBW algorithm with a sparsity-dependent bound. Additionally, we explore partial monitoring and demonstrate that the proposed SPA learning rate framework allows us to achieve a game-dependent bound and the BOBW simultaneously.
Submitted 13 February, 2024; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Best-of-Three-Worlds Linear Bandit Algorithm with Variance-Adaptive Regret Bounds
Authors:
Shinji Ito,
Kei Takemura
Abstract:
This paper proposes a linear bandit algorithm that is adaptive to environments at two different levels of hierarchy. At the higher level, the proposed algorithm adapts to a variety of types of environments. More precisely, it achieves best-of-three-worlds regret bounds, i.e., of ${O}(\sqrt{T \log T})$ for adversarial environments and of $O(\frac{\log T}{Δ_{\min}} + \sqrt{\frac{C \log T}{Δ_{\min}}})$ for stochastic environments with adversarial corruptions, where $T$, $Δ_{\min}$, and $C$ denote, respectively, the time horizon, the minimum sub-optimality gap, and the total amount of the corruption. Note that polynomial factors in the dimensionality are omitted here. At the lower level, in each of the adversarial and stochastic regimes, the proposed algorithm adapts to certain environmental characteristics, thereby performing better. The proposed algorithm has data-dependent regret bounds that depend on all of the cumulative loss for the optimal action, the total quadratic variation, and the path-length of the loss vector sequence. In addition, for stochastic environments, the proposed algorithm has a variance-adaptive regret bound of $O(\frac{σ^2 \log T}{Δ_{\min}})$ as well, where $σ^2$ denotes the maximum variance of the feedback loss. The proposed algorithm is based on the SCRiBLe algorithm. By incorporating into this a new technique we call scaled-up sampling, we obtain high-level adaptability, and by incorporating the technique of optimistic online learning, we obtain low-level adaptability.
Submitted 23 February, 2023;
originally announced February 2023.
-
Best-of-Both-Worlds Algorithms for Partial Monitoring
Authors:
Taira Tsuchiya,
Shinji Ito,
Junya Honda
Abstract:
This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_Π T) / Δ_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_Π})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $Δ_{\min}$ is the minimum suboptimality gap, and $k_Π$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_Π T) / Δ_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_Π T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
Submitted 9 October, 2022; v1 submitted 29 July, 2022;
originally announced July 2022.
-
Information geometry of excess and housekeeping entropy production
Authors:
Artemy Kolchinsky,
Andreas Dechant,
Kohei Yoshimura,
Sosuke Ito
Abstract:
A nonequilibrium system is characterized by a set of thermodynamic forces and fluxes which give rise to entropy production (EP). We show that these forces and fluxes have an information-geometric structure, which allows us to decompose EP into contributions from different types of forces in general (linear and nonlinear) discrete systems. We focus on the excess and housekeeping decomposition, which separates contributions from conservative and nonconservative forces. Unlike the Hatano-Sasa decomposition, our housekeeping/excess terms are always well-defined, including in systems with odd variables and nonlinear systems without steady states. Our decomposition leads to far-from-equilibrium thermodynamic uncertainty relations and speed limits. As an illustration, we derive a thermodynamic bound on the time necessary for one cycle in a chemical oscillator.
Submitted 15 December, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
Authors:
Shinji Ito,
Taira Tsuchiya,
Junya Honda
Abstract:
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: Δ_i>0} \frac{\log T}{Δ_i})$ for suboptimality gap $Δ_i$ of arm $i$ and time horizon $T$. As Audibert et al. [2007] have shown, however, the performance can be improved in stochastic environments with low-variance arms. In fact, they have provided a stochastic MAB algorithm with gap-variance-dependent regret bounds of $O(\sum_{i: Δ_i>0} (\frac{σ_i^2}{Δ_i} + 1) \log T )$ for loss variance $σ_i^2$ of arm $i$. In this paper, we propose the first BOBW algorithm with gap-variance-dependent bounds, showing that the variance information can be used even in the possibly adversarial environment. Further, the leading constant factor in our gap-variance-dependent bound is only (almost) twice the value for the lower bound. Additionally, the proposed algorithm enjoys multiple data-dependent regret bounds in adversarial settings and works well in stochastic settings with adversarial corruptions. The proposed algorithm is based on the follow-the-regularized-leader method and employs adaptive learning rates that depend on the empirical prediction error of the loss, which leads to gap-variance-dependent regret bounds reflecting the variance of the arms.
Submitted 14 June, 2022;
originally announced June 2022.
-
Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs
Authors:
Shinji Ito,
Taira Tsuchiya,
Junya Honda
Abstract:
This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of $\tilde{Θ}(α^{1/2} T^{1/2})$, while weakly observable graphs induce minimax regret of $\tilde{Θ}(δ^{1/3} T^{2/3})$, where $α$ and $δ$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph. Our proposed algorithm for strongly observable graphs has a regret bound of $\tilde{O}(α^{1/2} T^{1/2})$ for adversarial environments, as well as of $O(\frac{α(\ln T)^3}{Δ_{\min}})$ for stochastic environments, where $Δ_{\min}$ expresses the minimum suboptimality gap. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of $\tilde{O}(δ^{1/3} T^{2/3})$ for adversarial environments and poly-logarithmic regret for stochastic environments. The proposed algorithms are based on the follow-the-regularized-leader approach combined with newly designed update rules for learning rates.
Submitted 26 December, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Hierarchical Path-planning from Speech Instructions with Spatial Concept-based Topometric Semantic Mapping
Authors:
Akira Taniguchi,
Shuya Ito,
Tadahiro Taniguchi
Abstract:
Assisting individuals in their daily activities through autonomous mobile robots, especially for users without specialized knowledge, is crucial. Specifically, the capability of robots to navigate to destinations based on human speech instructions is essential. While robots can take different paths to the same goal, the shortest path is not always the best. A preferred approach is to accommodate waypoint specifications flexibly, planning an improved alternative path, even with detours. Additionally, robots require real-time inference capabilities. This study aimed to realize a hierarchical spatial representation using a topometric semantic map and path planning with speech instructions, including waypoints. This paper presents Spatial Concept-based Topometric Semantic Mapping for Hierarchical Path Planning (SpCoTMHP), integrating place connectivity. This approach offers a novel integrated probabilistic generative model and fast approximate inference across hierarchy levels. A formulation based on control as probabilistic inference theoretically supports the proposed path planning algorithm. We conducted experiments in home environments using the Toyota Human Support Robot on the SIGVerse simulator and in a lab-office environment with the real robot, Albert. Users issued speech commands specifying the waypoint and goal, such as "Go to the bedroom via the corridor." Navigation experiments using speech instructions with a waypoint demonstrated a performance improvement of SpCoTMHP over the baseline hierarchical path planning method with heuristic path costs (HPP-I), in terms of the weighted success rate at which the robot reaches the closest target and passes the correct waypoints, by 0.590. The computation time was significantly accelerated by 7.14 seconds with SpCoTMHP compared to baseline HPP-I in advanced tasks.
Submitted 20 June, 2024; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Online Task Assignment Problems with Reusable Resources
Authors:
Hanna Sumita,
Shinji Ito,
Kei Takemura,
Daisuke Hatano,
Takuro Fukunaga,
Naonori Kakimura,
Ken-ichi Kawarabayashi
Abstract:
We study the online task assignment problem with reusable resources, motivated by practical applications such as ridesharing, crowdsourcing and job hiring. In the problem, we are given a set of offline vertices (agents), and, at each time, an online vertex (task) arrives randomly according to a known time-dependent distribution. Upon arrival, we assign the task to agents immediately and irrevocably. The goal of the problem is to maximize the expected total profit produced by completed tasks. The key features of our problem are (1) an agent is reusable, i.e., an agent comes back to the market after completing the assigned task, (2) an agent may reject the assigned task to stay in the market, and (3) a task may accommodate multiple agents. The setting generalizes that of existing work in which an online task is assigned to one agent under (1).
In this paper, we propose an online algorithm that is $1/2$-competitive for the above setting, which is tight. Moreover, when each agent can reject assigned tasks at most $Δ$ times, the algorithm is shown to have the competitive ratio $Δ/(3Δ-1)\geq 1/3$. We also evaluate our proposed algorithm with numerical experiments.
Submitted 14 March, 2022;
originally announced March 2022.
-
Machine-learning-enhanced quantum sensors for accurate magnetic field imaging
Authors:
Moeta Tsukamoto,
Shuji Ito,
Kensuke Ogawa,
Yuto Ashida,
Kento Sasaki,
Kensuke Kobayashi
Abstract:
Local detection of magnetic fields is crucial for characterizing nano- and micro-materials and has been implemented using various scanning techniques or even diamond quantum sensors. Diamond nanoparticles (nanodiamonds) offer an attractive opportunity to achieve high spatial resolution because they can easily be placed within a few tens of nm of the target simply by attaching them to its surface. A physical model for such a randomly oriented nanodiamond ensemble (NDE) is available, but the complexity of actual experimental conditions still limits the accuracy of deducing magnetic fields. Here, we demonstrate magnetic field imaging with high accuracy of 1.8 $μ$T combining NDE and machine learning without any physical models. We also discover the field direction dependence of the NDE signal, suggesting the potential application for vector magnetometry and improvement of the existing model. Our method further enriches the performance of NDE to achieve the accuracy to visualize mesoscopic current and magnetism in atomic-layer materials and to expand the applicability in arbitrarily shaped materials, including living organisms. This achievement will bridge machine learning and quantum sensing for accurate measurements.
Submitted 1 February, 2022;
originally announced February 2022.
-
Toward a Minecraft Mod for Early Detection of Alzheimer's Disease in Young Adults
Authors:
Satoko Ito,
Ruck Thawonmas,
Pujana Paliyawan
Abstract:
This paper proposes a Minecraft-based system for early detection of Alzheimer's disease in young adults. Early detection, where spatial navigation is a crucial key, is regarded as an important way to prevent the disease. The proposed system is compared with a recent existing and thoroughly studied system using a game called Sea Hero Quest (SHQ), by analyzing spatial navigational patterns of players. Our preliminary results show that spatial navigational patterns in both systems are highly correlated, indicating that the proposed system is likely as effective as the SHQ system for the detection task.
Submitted 25 January, 2022;
originally announced January 2022.
-
On Optimal Robustness to Adversarial Corruption in Online Decision Problems
Authors:
Shinji Ito
Abstract:
This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruptions. The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption. More precisely, we show that two classes of algorithms, anytime Hedge with decreasing learning rate and algorithms with second-order regret bounds, achieve $O( \frac{\log N}Δ + \sqrt{ \frac{C \log N }Δ } )$-regret, where $N, Δ$, and $C$ represent the number of experts, the gap parameter, and the corruption level, respectively. We further provide a matching lower bound, which means that this regret bound is tight up to a constant factor. For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
Submitted 22 September, 2021;
originally announced September 2021.
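Illustration (not from the paper): the abstract above names "anytime Hedge with decreasing learning rate" for prediction with expert advice. The following is a minimal sketch of that kind of algorithm, assuming the common schedule eta_t = sqrt(log(N) / t); the exact schedule and constants analyzed in the paper may differ, and the function name `decreasing_hedge` is hypothetical.

```python
import math


def decreasing_hedge(loss_rounds, n_experts):
    """Hedge (exponential weights) with a decreasing learning rate.

    loss_rounds: iterable of per-round loss lists, one loss in [0, 1] per expert.
    Returns the final probability vector and the regret against the best expert.
    The schedule eta_t = sqrt(log(N) / t) is an assumption made for illustration.
    """
    cum_loss = [0.0] * n_experts
    probs = [1.0 / n_experts] * n_experts
    learner_loss = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        eta = math.sqrt(math.log(n_experts) / t)
        # Subtract the smallest cumulative loss before exponentiating
        # so the weights cannot underflow to all zeros.
        base = min(cum_loss)
        weights = [math.exp(-eta * (c - base)) for c in cum_loss]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the learner under the current distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Full-information feedback: every expert's loss is observed.
        for i, l in enumerate(losses):
            cum_loss[i] += l
    regret = learner_loss - min(cum_loss)
    return probs, regret
```

For example, `decreasing_hedge([[0.0, 1.0]] * 200, 2)` feeds 200 rounds in which the first expert never incurs loss; the returned distribution concentrates on that expert. The sketch covers only the uncorrupted full-information setting and does not model the adversarial corruption studied in the paper.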
-
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions
Authors:
Kei Takemura,
Shinji Ito,
Daisuke Hatano,
Hanna Sumita,
Takuro Fukunaga,
Naonori Kakimura,
Ken-ichi Kawarabayashi
Abstract:
The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which a learner chooses a set of arms with the feature vectors in each round under given constraints so as to maximize the sum of rewards of arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds $T$. However, there is a gap of $\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of the chosen arms in a round, and $\tilde{O}(\cdot)$ ignores the logarithmic factors. The dependence on $k$ and $d$ is of practical importance because $k$ may be larger than $T$ in real-world applications such as recommender systems. In this paper, we fill the gap by improving the upper and lower bounds. More precisely, we show that the C${}^2$UCB algorithm proposed by Qin, Chen, and Zhu (2014) has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C${}^2$UCB algorithm and demonstrate that it enjoys the optimal regret bound for a more general problem that can take into account other objectives simultaneously. We also show that our technique would be applicable to related problems. Numerical experiments support our theoretical results and considerations.
Submitted 27 February, 2021; v1 submitted 19 January, 2021;
originally announced January 2021.
-
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits
Authors:
Kei Takemura,
Shinji Ito
Abstract:
Combinatorial linear semi-bandits (CLS) are widely applicable frameworks of sequential decision-making, in which a learner chooses a subset of arms from a given set of arms associated with feature vectors. Existing algorithms work poorly for the clustered case, in which the feature vectors form several large clusters. This shortcoming is critical in practice because it can be found in many applications, including recommender systems. In this paper, we clarify why such a shortcoming occurs, and we introduce a key technique of arm-wise randomization to overcome it. We propose two algorithms with this technique: the perturbed C${}^2$UCB (PC${}^2$UCB) and the Thompson sampling (TS). Our empirical evaluation with artificial and real-world datasets demonstrates that the proposed algorithms with the arm-wise randomization technique outperform the existing algorithms without this technique, especially for the clustered case. Our contributions also include theoretical analyses that provide high probability asymptotic regret bounds for our algorithms.
Submitted 10 September, 2019; v1 submitted 5 September, 2019;
originally announced September 2019.
-
Causal Bandits with Propagating Inference
Authors:
Akihiro Yabe,
Daisuke Hatano,
Hanna Sumita,
Shinji Ito,
Naonori Kakimura,
Takuro Fukunaga,
Ken-ichi Kawarabayashi
Abstract:
Bandit is a framework for designing sequential experiments. In each experiment, a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$. This makes bandit incapable of handling an exponentially large number of arms, hence the bandit problem with side-information is often considered to overcome this lower bound. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information. A causal graph is a fundamental model that is frequently used with a variety of real problems. In this setting, the arms are identified with interventions on a given causal graph, and the effect of an intervention propagates throughout the causal graph. The task is to find the best intervention that maximizes the expected value on a target node. Existing algorithms for causal bandit overcame the $Ω(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors).
We propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ γ^*\log(|\mathcal{A}|T) / T})$ regret bound, where $γ^*$ is determined by using a causal graph structure. In particular, if the in-degree of the causal graph is bounded, then $γ^* = O(N^2)$, where $N$ is the number of nodes.
Submitted 6 June, 2018;
originally announced June 2018.
-
Semantical Equivalence of the Control Flow Graph and the Program Dependence Graph
Authors:
Sohei Ito
Abstract:
The program dependence graph (PDG) represents data and control dependence between statements in a program. This paper presents an operational semantics of program dependence graphs. Since PDGs exclude the artificial order of statements that resides in sequential programs, executions of PDGs are not unique. However, we identified a class of PDGs that have unique final states of executions, called deterministic PDGs. We prove that the operational semantics of control flow graphs is equivalent to that of deterministic PDGs. The class of deterministic PDGs properly includes PDGs obtained from well-structured programs. Thus, our operational semantics of PDGs is more general than that of PDGs for well-structured programs, which is already established in the literature.
Submitted 8 March, 2018;
originally announced March 2018.
-
BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks
Authors:
Viet-Quoc Pham,
Satoshi Ito,
Tatsuo Kozakaya
Abstract:
We present a simple and effective framework for simultaneous semantic segmentation and instance segmentation with Fully Convolutional Networks (FCNs). The method, called BiSeg, predicts instance segmentation as a posterior in Bayesian inference, where semantic segmentation is used as a prior. We extend the idea of position-sensitive score maps used in recent methods to a fusion of multiple score maps at different scales and partition modes, and adopt it as a robust likelihood for instance segmentation inference. As both Bayesian inference and map fusion are performed per pixel, BiSeg is a fully convolutional end-to-end solution that inherits all the advantages of FCNs. We demonstrate state-of-the-art instance segmentation accuracy on PASCAL VOC.
Submitted 17 July, 2017; v1 submitted 7 June, 2017;
originally announced June 2017.
-
Optimization Beyond Prediction: Prescriptive Price Optimization
Authors:
Shinji Ito,
Ryohei Fujimaki
Abstract:
This paper addresses a novel data science problem, prescriptive price optimization, which derives the optimal price strategy to maximize future profit/revenue on the basis of massive predictive formulas produced by machine learning. The prescriptive price optimization first builds sales forecast formulas of multiple products, on the basis of historical data, which reveal complex relationships between sales and prices, such as price elasticity of demand and cannibalization. Then, it constructs a mathematical optimization problem on the basis of those predictive formulas. We show that the optimization problem can be formulated as an instance of binary quadratic programming (BQP). Although BQP problems are NP-hard in general and computationally intractable, we propose a fast approximation algorithm using a semi-definite programming (SDP) relaxation, which is closely related to the Goemans-Williamson Max-Cut approximation. Our experiments on simulation and real retail datasets show that our prescriptive price optimization simultaneously derives the optimal prices of tens to hundreds of products within practical computational time, potentially improving the gross profit of those products by 8.2%.
Submitted 24 May, 2016; v1 submitted 17 May, 2016;
originally announced May 2016.