A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula
500 W 120th St, New York, NY 10027; University Hall, 601 S Morgan St, Chicago, IL 60607; 1100 Enterprise Way, Sunnyvale, CA 94089
Abstract

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes the products, and the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision maker’s problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon $T$. Though this problem has recently attracted considerable attention, many existing methods involve solving an intractable non-convex optimization problem, and their theoretical performance guarantees depend on a problem-dependent parameter which could be prohibitively large. In particular, current algorithms for this problem have regret bounded by $O(\sqrt{\kappa dT})$, where $\kappa$ is a problem-dependent constant that may have an exponential dependency on the number of attributes, $d$. In this paper, we propose an optimistic algorithm and show that the regret is bounded by $O(\sqrt{dT}+\kappa)$, significantly improving the performance over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favorable regret guarantee. We also demonstrate through numerical experiments that our algorithm has robust performance for varying $\kappa$ values.

keywords: Revenue management, OR in marketing, Multi-armed bandit, Multinomial Logit model, Sequential decision-making
journal: European Journal of Operational Research

1 Introduction

Assortment optimization problems arise in many industries, and prominent examples include retailing and online advertising (see Alfandari et al. [2021], Timonina-Farkas et al. [2020], Wang et al. [2020], and Kök & Fisher [2007] for a detailed review). The problem faced by a decision-maker is that of selecting a subset (assortment) of items to offer from a universe of substitutable items such that the expected revenue is maximized. (An item is termed substitutable if all consumers have identical preferences towards the same characteristics of that item.) In many e-commerce applications, the data on consumer choices tends to be either limited or non-existent (similar to the cold start problem in recommendation systems). Consumer preferences must be learned by experimenting with various assortments and observing consumer choices, but this experimentation must be balanced against the goal of maximizing cumulative revenue. Furthermore, in many settings, the retailer has to consider a very large number of products that are similar (examples range from apparel to consumer electronics). The commonality in their features can be expressed with the aid of auxiliary variables which summarize product attributes. This enables a significant reduction in dimensionality but introduces additional challenges in designing policies that have to dynamically balance demand learning (exploration) while simultaneously maximizing cumulative revenues (exploitation).

Motivated by these issues, we consider the dynamic assortment optimization problem. In every round, the retailer offers a subset (assortment) of products to a consumer and observes the consumer response. Consumers purchase at most one product from each assortment, choosing the product that maximizes their utility, and the retailer enjoys revenue from a successful purchase. We assume that the products are described by a set of attributes and the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the retailer’s problem of dynamically learning the model parameters while optimizing cumulative revenues over the selling horizon $T$. Specifically, we have a universe of $N$ substitutable items, and each item $i$ is associated with an attribute vector $x_i \in \mathbb{R}^d$, which is known a priori. The mean utility for the consumer for product $i$ is given by the inner product $\theta \cdot x_i$, where $\theta \in \mathbb{R}^d$ is some fixed but initially unknown parameter vector. The $d$ coordinates of $x_i$ represent a variety of characteristics of product $i$, such as cost, popularity, brand, etc. Given the substitutable good assumption, the preferences of all consumers towards these characteristics are identical and denoted by the same parameter $\theta \in \mathbb{R}^d$. (This assumption may appear quite restrictive at first. However, as described in the following paragraph and Section 2.2, the model is rich enough to capture non-identical consumer behavior as well.) Further, any two products $i$ and $j$ could vary in terms of these characteristics and hence are associated with different vectors $x_i$ and $x_j \in \mathbb{R}^d$, respectively. Our goal is to offer assortments $\mathcal{Q}_1, \cdots, \mathcal{Q}_T$ at times $1, \cdots, T$ from a feasible collection of assortments such that the cumulative expected revenue of the retailer over the said horizon is maximized. In general, the feasible set of assortments can reflect the constraints of retailers and online platforms (such as cardinality, inventory availability and other related constraints).

For an intuitive understanding of the choice model, consider an example of an online furniture retailer that offers $N$ distinct products where the $i^{th}$ product has an attribute vector $x_i$ (in general, this attribute can vary over time, representing varying consumers’ choices, and is more appropriately represented by $x_{t,i}$). Suppose consumers query for a specific product category, say tables. In this example, the $\theta$ parameter will be a distinct vector corresponding to the product category: table. As discussed before, the true $\theta_*$ that determines consumer choice behavior is unknown. With each interaction with the consumer, the online retailer learns which of the $N$ products offers the most utility to the consumer (captured by $\theta \cdot x_i$ for each product $i$) by observing the past purchase decisions of the consumers. The online furniture retailer is constrained to offer at most $K$ of the $N$ products in each interaction with the consumer. Such a constraint may be encountered in practical situations: the online consumer interface may be unable to display a large number of products, or consumers may prefer to examine only a subset of products at a time. Out of $N$ furniture items, some particular table $j$ could have high utility, $\theta \cdot x_j$, whereas, for some other product $k$ (say, a table with an unpopular color, bad design, or inferior material), $\theta \cdot x_k$ could be low. The consumer may purchase one or none of the $K$ presented products. Later in Section 2.2, we demonstrate that when the consumer’s propensity to purchase a specific product is driven by its utility, the retailer’s expected revenue at each round is given by a softmax function.

The rest of this section is organized as follows: We first describe the related literature and qualitative significance of the parameter $\kappa$. Then, we highlight our contributions and end the section by contrasting them with recent notable research works.

1.1 Related literature

The MNL model is a widely used choice model for capturing consumer purchase behavior in assortment selection models (see Flores et al. [2019] and Avadhanula [2019]). Recently, large-scale field experiments at Alibaba [Feldman et al., 2018] have demonstrated the efficacy of the MNL model in boosting revenues. Rusmevichientong et al. [2010] and Sauré & Zeevi [2013] were among the early works that studied explore-then-commit strategies for the dynamic assortment selection problem under the MNL model when there are no contexts/product features. The works of Agrawal et al. [2019] and Agrawal et al. [2017] revisited this problem and presented adaptive online learning algorithms based on the Upper Confidence Bound (UCB) and Thompson Sampling (TS) ideas. These approaches, unlike earlier ideas, did not require prior information about the problem parameters and had near-optimal regret bounds. Following these developments, the contextual variant of the problem has received considerable attention. Cheung & Simchi-Levi [2017] and Oh & Iyengar [2019] propose TS-based approaches and establish Bayesian regret bounds on their performance. (Our results give worst-case regret bounds, which are strictly stronger than Bayesian regret bounds: worst-case regret bounds directly imply Bayesian regret bounds with the same order dependence.) Chen et al. [2020] present a UCB-based algorithm and establish min-max regret bounds. However, these contextual MNL algorithms and their performance bounds depend on a problem parameter $\kappa$ that can be prohibitively large, even for simple real-life examples. See Figure 1 for an illustration and Section 1.2 for a detailed discussion.

Figure 1: Illustration of the impact of the $\kappa$ parameter (logistic case; the multinomial logit case closely follows): a representative plot of the derivative of the reward function. The x-axis represents the linear function $x^\top\theta$ and the y-axis is proportional to $1/\kappa$. Parameter $\kappa$ is small only in the narrow region around $0$ and grows arbitrarily large depending on the problem instance (captured by $x^\top\theta$ values).

We note that Ou et al. [2018] also consider a similar problem of developing an online algorithm for the MNL model with linear utility parameters. Though they establish a regret bound that does not depend on the aforementioned parameter $\kappa$, they work with an inaccurate version of the MNL model. More specifically, in the MNL model, the probability of a consumer preferring an item is proportional to the exponential of the utility parameter and is not linear in the utility parameter as assumed in Ou et al. [2018].

The multi-armed bandit problem, which underlies these dynamic decision making settings, has been well studied in the literature (see Xu et al. [2021], Grant & Szechtman [2021]). Our problem is closely related to the parametric bandit problem, where a common unknown parameter connects the rewards of each arm. In particular, for linear bandits, each arm $a \in A$ (where $A$ is the set of all arms) is associated with a $d$-dimensional vector $x_a \in \mathbb{R}^d$ known a priori. The expected reward upon selecting arm $a \in A$ is given by the inner product $\theta \cdot x_a$, for some unknown parameter vector $\theta$ (see Dani et al. [2008], Rusmevichientong & Tsitsiklis [2010], Abbasi-Yadkori et al. [2011]). The key difference is that the rewards (i.e., the revenue of the retailer) corresponding to an assortment under the MNL model cannot be modeled in the framework of linear payoffs. Closer to our formulation is the literature on generalized linear bandits (see Filippi et al. [2010] and Faury et al. [2020]), where the expected payoff upon selecting arm $a$ is given by $f(\theta \cdot x_a)$, where $f$ is a real-valued, non-linear function. However, unlike our setting, where an arm could be a collection of $K$ products (thus involving $K$ vectors of dimension $d$), $f(\cdot)$ is a single-variable function in these prior works.

1.2 On the parameter $\kappa$

As discussed earlier, the retailer’s revenue (reward function) is the softmax function. Intuitively, the curvature of the reward function influences how easy (or difficult) it is to learn the true choice parameter $\theta_*$. In Section 2, we explicitly define $\kappa$ as inversely proportional to the lower bound on the derivative of the reward function over the entire decision region. Existence of a global lower bound on the curvature of the reward function is a necessary assumption for the maximum likelihood estimation of $\theta_*$.

In previous works on generalized linear bandits and variants [Filippi et al., 2010, Li et al., 2017, Oh & Iyengar, 2019], the quantity $\kappa$ features in regret guarantees as a multiplicative factor of the primary term (i.e., as $\tilde{\mathrm{O}}(\kappa\sqrt{T})$). This is because they ignore the local effect of the curvature and use global properties (via $\kappa$), leading to loose worst-case bounds. For a cleaner exposition of this issue, let us take $K=1$, i.e., the rewards are given by a sigmoid function of $\theta \cdot x$. The derivative of the sigmoid is “bell”-shaped (see Figure 1). When $\theta \cdot x$ is very high (i.e., the assortment contains products with high utilities) or when $\theta \cdot x$ is very low (i.e., the assortment contains products with low utility), the value of $\kappa$ will be large. From Assumption 2, for $K=1$, $\kappa$ is equivalent to $\max \frac{1}{a(1-a)}$, for some $a \in (0,1)$. Thus, when $a$ is close to $1$ or $0$, the value of $\kappa$ will be large. The exponential dependence for the $K=1$ case follows when we replace $a$ with a sigmoid function. In the context of our problem, this translates to an exponential dependence of the per-round regret on the magnitude of utilities (i.e., $\theta \cdot x$).
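The growth of $\kappa$ in the $K=1$ case can be checked numerically. The sketch below is illustrative only: it assumes that utilities $\theta \cdot x$ are confined to a hypothetical interval $[-B, B]$ and takes $\kappa$ to be the largest value of $1/(a(1-a))$ with $a$ the sigmoid of the utility; the function names and the bound `B` are not from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def inverse_curvature(z: float) -> float:
    """Return 1 / (a * (1 - a)) with a = sigmoid(z).

    Algebraically this equals 2 + exp(z) + exp(-z), which makes the
    exponential growth in |z| explicit.
    """
    a = sigmoid(z)
    return 1.0 / (a * (1.0 - a))


def kappa_logistic(max_abs_utility: float) -> float:
    """Worst case of inverse_curvature over utilities in [-B, B].

    B = max_abs_utility is a hypothetical bound used for illustration.
    The derivative a * (1 - a) is symmetric and decreasing in |z|, so the
    maximum of its inverse is attained at the boundary |z| = B.
    """
    return inverse_curvature(max_abs_utility)


for bound in (0.0, 1.0, 5.0, 10.0):
    print(f"B = {bound:4.1f}  kappa = {kappa_logistic(bound):.1f}")
```

Under these assumptions $\kappa$ equals $4$ when all utilities are zero and grows roughly like $e^{B}$, which is why a multiplicative $\kappa$ in the leading regret term can be so costly.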

1.3 Contributions

In this paper, we build on recent developments for generalized linear bandits (Faury et al. [2020]) to propose a new optimistic algorithm, CB-MNL, for the problem of contextual multinomial logit bandits. CB-MNL follows the standard template of optimistic parameter search strategies (also known as optimism in the face of uncertainty approaches) [Abbasi-Yadkori et al., 2011, Abeille et al., 2021]. We use Bernstein-style concentration for self-normalized martingales, which was previously proposed in the context of scalar logistic bandits in Faury et al. [2020], to define our confidence set over the true parameter, taking into account the effects of the local curvature of the reward function. We show that the performance of CB-MNL (as measured by regret) is bounded as $\tilde{\mathrm{O}}(d\sqrt{T}+\kappa)$, significantly improving the theoretical performance over existing algorithms where $\kappa$ appears as a multiplicative factor in the leading term. We also leverage a self-concordance-like relation [Bach, 2010] for the multinomial logit reward function [Zhang & Lin, 2015], which helps us limit the effect of $\kappa$ on the final regret upper bound to only the higher-order terms. Finally, we propose a different convex confidence set for the optimization problem in the decision set of CB-MNL, which reduces the optimization problem to a constrained convex problem.

In summary, our work establishes strong worst-case regret guarantees by carefully accounting for local gradient information and using second-order function approximation for the estimation error.

1.4 Comparison with notable prior works

Comparison with Filippi et al. [2010]: Our setting is different from the standard generalized linear bandit of Filippi et al. [2010]. In our setting, the reward due to an action (assortment) can be dependent on up to $K$ variables ($\theta_* \cdot x_{t,i},\, i \in \mathcal{Q}_t$) instead of a single variable. Further, we focus on removing the multiplicative dependence on $\kappa$ from the regret bounds. This leads to a more involved technical treatment in our work.

Comparison with Oh & Iyengar [2019]: The Thompson Sampling based approach is inherently different from our Optimism in the face of uncertainty (OFU) style Algorithm CB-MNL. However, the main result in Oh & Iyengar [2019] also relies on a confidence set based analysis along the lines of Filippi et al. [2010] but has a multiplicative $\kappa$ factor in the bound.

Comparison with Faury et al. [2020]: Faury et al. [2020] use a bonus term for optimization in each round, and their algorithm performs non-trivial projections on the admissible log-odds. While we do reuse the Bernstein-style concentration inequality as proposed by them, their results do not seem to extend directly to the MNL setting without requiring significantly more work. Further, our algorithm CB-MNL performs an optimistic parameter search for making decisions instead of using a bonus term, which allows for a cleaner and shorter analysis.

Comparison with Oh & Iyengar [2021]: While the authors in Oh & Iyengar [2021] provide sharper bounds by a factor of $\tilde{\mathrm{O}}(\sqrt{d})$, they still retain the $\kappa$ multiplicative factor in their regret bounds. Their focus is on improving the dependence on the dimension parameter $d$ for the dynamic assortment optimization problem.

Comparison with Abeille et al. [2021]: Abeille et al. [2021] recently proposed the idea of convex relaxation of the confidence set for the more straightforward logistic bandit setting. Our work can be viewed as an extension of their construction to the MNL setting.

Comparison with Amani & Thrampoulidis [2021]: While the authors in Amani & Thrampoulidis [2021] also extend the algorithms of Faury et al. [2020] to a multinomial problem, their setting is materially different from ours. They model various click-types for the same advertisement (action) via the multinomial distribution. Further, they consider actions played at each round to be non-combinatorial, i.e., a single action as opposed to a bundle of actions, which differs from the assortment optimization setting in this work. Therefore, their approach and technical analysis are different from ours.

2 Preliminaries

2.1 Notations

For a vector $x \in \mathbb{R}^d$, $x^\top$ denotes the transpose. Given a positive definite matrix $\mathbf{M} \in \mathbb{R}^{d\times d}$, the induced norm is given by $\|x\|_{\mathbf{M}} = \sqrt{x^\top \mathbf{M} x}$. For two symmetric matrices $\mathbf{M_1}$ and $\mathbf{M_2}$, $\mathbf{M_1} \succeq \mathbf{M_2}$ means that $\mathbf{M_1} - \mathbf{M_2}$ is positive semi-definite. For any positive integer $n$, $[n] \coloneqq \{1, 2, 3, \cdots, n\}$. $\mathbf{I}_d$ denotes an identity matrix of dimension $d \times d$. The platform (i.e., the learner) is referred to using the pronouns she/her/hers.
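As a small illustration of this notation, the following sketch evaluates the induced norm $\sqrt{x^\top \mathbf{M} x}$ and checks the ordering $\mathbf{M_1} \succeq \mathbf{M_2}$. To stay dependency-free, the ordering check is restricted to symmetric $2 \times 2$ matrices; the helper names are hypothetical and chosen only for this example.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def induced_norm(x: Sequence[float], m: Matrix) -> float:
    """Return ||x||_M = sqrt(x^T M x) for a positive definite matrix M."""
    n = len(x)
    mx = [sum(m[r][c] * x[c] for c in range(n)) for r in range(n)]
    return math.sqrt(sum(x[r] * mx[r] for r in range(n)))


def loewner_geq_2x2(m1: Matrix, m2: Matrix) -> bool:
    """Check M1 >= M2 (M1 - M2 positive semi-definite), 2x2 symmetric case only.

    A symmetric 2x2 matrix is positive semi-definite exactly when both
    diagonal entries and the determinant are non-negative.
    """
    a = m1[0][0] - m2[0][0]
    b = m1[0][1] - m2[0][1]
    d = m1[1][1] - m2[1][1]
    return a >= 0 and d >= 0 and a * d - b * b >= 0


identity = [[1.0, 0.0], [0.0, 1.0]]
m = [[2.0, 0.5], [0.5, 2.0]]
x = [3.0, 4.0]

print(induced_norm(x, identity))   # reduces to the Euclidean norm
print(induced_norm(x, m))
print(loewner_geq_2x2(m, identity))
```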

2.2 Model setting

Rewards Model:

At every round $t$, the platform (learner) is presented with a set $\mathcal{N}$ of distinct items, indexed by $i \in [N]$, and their attribute vectors (contexts) $\{x_{t,i}\}_{i=1}^{N}$ such that $\forall\, i \in [N],\ x_{t,i} \in \mathbb{R}^d$, where $N = |\mathcal{N}|$ is the cardinality of the set $\mathcal{N}$. The platform then selects an assortment $\mathcal{Q}_t \subset \mathcal{N}$ and the interacting consumer (environment) offers the reward $r_t$ to the platform. The assortments have a cardinality of at most $K$, i.e., $|\mathcal{Q}_t| \le K$. The platform’s decision is based on the entire history of interaction. The history is represented by the filtration set $\mathcal{F}_t \coloneqq \{\mathcal{F}_0, \sigma(\{\{x_{s,i}\}_{i=1}^{N}, \mathcal{Q}_s\}_{s=1}^{t-1})\}$, where $\sigma(\{\cdot\})$ denotes the $\sigma$-algebra over the sequence $\{\cdot\}$ and $\mathcal{F}_0$ is any prior information available to the platform. The interaction lasts for $t = 1, 2, \cdots, T$ rounds. Conditioned on $\mathcal{F}_t$, the reward $r_t$ is a binary vector such that $r_t \in \{0,1\}^N$ and the vector $\{r_{t,i}\}_{i \in \mathcal{Q}_t}$ follows a multinomial distribution. We have $r_{t,i} = 0$ for all $i \notin \mathcal{Q}_t$. Specifically, for each $i \in \mathcal{Q}_t$, the probability that $r_{t,i} = 1$ is given by the softmax function:

$$\mathbb{P}(r_{t,i} = 1 \mid \mathcal{Q}_t, \mathcal{F}_t) = \mu_i(\mathcal{Q}_t, \theta_*) \coloneqq \frac{\exp(x_{t,i}^\top \theta_*)}{1 + \sum_{j \in \mathcal{Q}_t} \exp(x_{t,j}^\top \theta_*)}, \tag{1}$$

where $\theta_*$ is an unknown time-invariant parameter. The numeral $1$ in the denominator accounts for the case when the consumer purchases none of the items in the assortment. By definition, $\sum_{i \in \mathcal{Q}_t} r_{t,i} \le 1$, i.e., $r_t$ is multinomial with a single trial. Each item $i$ is also associated with a price (or revenue) parameter $p_{t,i}$ for round $t$. We assume $p_{t,i} = 1$ for all items and rounds for an uncluttered exposition of results; if $p_{t,i}$ is not $1$, then it features as a fixed factor in the definition of $\mu_i(\cdot)$ and the analysis follows exactly as presented here for $p_{t,i} = 1$. The expected revenue due to the assortment $\mathcal{Q}_t$ is then given by:

$$\mu(\mathcal{Q}_t, \theta_*) \coloneqq \sum_{i \in \mathcal{Q}_t} \mu_i(\mathcal{Q}_t, \theta_*). \tag{2}$$

Also, $\{x_{t,i}\}$ may vary adversarially in each round in our model, unlike in Li et al. [2017], where the attribute vectors are assumed to be drawn from an unknown i.i.d. distribution. When $K=1$, the above model reduces to the case of the logistic bandit.
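The rewards model can be made concrete with a short sketch that evaluates the softmax choice probabilities and the expected revenue of an assortment under unit prices, and confirms the reduction to the logistic case for a single offered item. The function names, the context vectors, and the parameter value are illustrative assumptions, not part of the paper.

```python
import math
from typing import Dict, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mnl_choice_probs(
    contexts: Dict[int, Sequence[float]],
    assortment: Sequence[int],
    theta: Sequence[float],
) -> Dict[int, float]:
    """Softmax purchase probability mu_i(Q, theta) for each offered item i."""
    weights = {i: math.exp(dot(contexts[i], theta)) for i in assortment}
    # The constant 1 in the denominator is the no-purchase option.
    denom = 1.0 + sum(weights.values())
    return {i: w / denom for i, w in weights.items()}


def expected_revenue(
    contexts: Dict[int, Sequence[float]],
    assortment: Sequence[int],
    theta: Sequence[float],
) -> float:
    """Expected revenue mu(Q, theta) with unit prices: sum of mu_i over Q."""
    return sum(mnl_choice_probs(contexts, assortment, theta).values())


# Hypothetical contexts for three items (d = 2) and a hypothetical parameter.
contexts = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
theta = [0.5, -0.5]

print(mnl_choice_probs(contexts, [0, 2], theta))
print(expected_revenue(contexts, [0, 2], theta))

# With a single offered item (K = 1) the model is the logistic function.
logistic = 1.0 / (1.0 + math.exp(-dot(contexts[0], theta)))
print(abs(expected_revenue(contexts, [0], theta) - logistic) < 1e-12)
```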

Choice Modeling Perspective:

Eq. (1) can be considered from a discrete choice modeling viewpoint, where the platform presents an assortment of items to a user, and the user selects at most one item from this assortment. In this interpretation, the probability of choosing an item $i$ is given by $\mu_i(\mathcal{Q}_t, \theta_*)$. Likewise, the probability of the user not selecting any item is given by $1/(1 + \sum_{j \in \mathcal{Q}_t} \exp(x_{t,j}^\top \theta_*))$. The platform is motivated to offer such an assortment that the user’s propensity to make a successful selection is high.
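The choice-modeling view suggests a simple simulation of one round: draw a single outcome from the offered items plus the no-purchase option and record it as a binary reward vector. The sketch below assumes the utilities $x_{t,i}^\top \theta_*$ of the offered items are already computed; the function names and example values are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional


def choice_distribution(utilities: Dict[int, float]) -> Dict[Optional[int], float]:
    """Map each offered item, and None for no purchase, to its MNL probability."""
    weights = {item: math.exp(u) for item, u in utilities.items()}
    denom = 1.0 + sum(weights.values())
    probs: Dict[Optional[int], float] = {i: w / denom for i, w in weights.items()}
    probs[None] = 1.0 / denom  # outside option: the user selects nothing
    return probs


def sample_reward(
    utilities: Dict[int, float], num_items: int, rng: random.Random
) -> List[int]:
    """Draw one binary reward vector r_t of length N (multinomial, single trial)."""
    probs = choice_distribution(utilities)
    outcomes = list(probs.keys())
    chosen = rng.choices(outcomes, weights=[probs[o] for o in outcomes], k=1)[0]
    reward = [0] * num_items
    if chosen is not None:
        reward[chosen] = 1  # items outside the assortment always stay at 0
    return reward


# Hypothetical round: items 0 and 2 are offered out of N = 4 items.
offered_utilities = {0: 0.5, 2: 0.0}
rng = random.Random(0)
print(choice_distribution(offered_utilities))
print(sample_reward(offered_utilities, num_items=4, rng=rng))
```

Each sampled vector has at most one nonzero entry, matching the constraint that the user selects at most one item.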

Regret:

The platform does not know the value of $\theta_*$. Our learning algorithm CB-MNL (see Algorithm 1) sequentially makes the assortment selection decisions $\mathcal{Q}_1, \mathcal{Q}_2, \cdots, \mathcal{Q}_T$ so that the cumulative expected revenue $\sum_{t=1}^{T} \mu(\mathcal{Q}_t, \theta_*)$ is high. Its performance is quantified by pseudo-regret, which is the gap between the expected revenue generated by the algorithm and that of the optimal assortments in hindsight. The learning goal is to minimize the cumulative pseudo-regret up to time $T$, defined as:

𝐑Tt=1T[μ(𝒬t,θ)μ(𝒬t,θ)],subscript𝐑𝑇superscriptsubscript𝑡1𝑇delimited-[]𝜇superscriptsubscript𝒬𝑡subscript𝜃𝜇subscript𝒬𝑡subscript𝜃\mathbf{R}_{T}\coloneqq\sum_{t=1}^{T}[\mu(\mathcal{Q}_{t}^{*},\theta_{*})-\mu(% \mathcal{Q}_{t},\theta_{*})],bold_R start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ≔ ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT [ italic_μ ( caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT , italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) - italic_μ ( caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) ] , (3)

where 𝒬tsuperscriptsubscript𝒬𝑡\mathcal{Q}_{t}^{*}caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT is the offline optimal assortment at round t𝑡titalic_t under full information of θsubscript𝜃\theta_{*}italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT, defined as: 𝒬targmax𝒬𝒩μ(𝒬,θ).superscriptsubscript𝒬𝑡subscriptargmax𝒬𝒩𝜇𝒬subscript𝜃\mathcal{Q}_{t}^{*}\coloneqq\operatorname*{argmax}_{\mathcal{Q}\subset\mathcal% {N}}\mu(\mathcal{Q},\theta_{*}).caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ≔ roman_argmax start_POSTSUBSCRIPT caligraphic_Q ⊂ caligraphic_N end_POSTSUBSCRIPT italic_μ ( caligraphic_Q , italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) .
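As a rough illustration of Eq (3), the sketch below computes the cumulative pseudo-regret on a toy instance by brute force. It assumes unit revenues, so that $\mu(\mathcal{Q},\theta)$ is the probability that some offered item is chosen, and it caps assortments at a size $K$ as in Section 3. All names and numbers are hypothetical, and exhaustive enumeration is only feasible for very small $N$.

```python
import itertools
import numpy as np

def expected_revenue(X, theta):
    """mu(Q, theta): probability that some offered item is chosen (unit revenues assumed)."""
    w = np.exp(X.T @ theta).sum()
    return w / (1.0 + w)

def best_assortment(contexts, theta, K):
    """Brute-force search over all non-empty assortments of size at most K."""
    n_items = contexts.shape[1]
    best_Q, best_val = None, -np.inf
    for k in range(1, K + 1):
        for Q in itertools.combinations(range(n_items), k):
            val = expected_revenue(contexts[:, list(Q)], theta)
            if val > best_val:
                best_Q, best_val = Q, val
    return best_Q, best_val

def pseudo_regret(contexts_per_round, played, theta_star, K):
    """Cumulative pseudo-regret of the played assortments, following Eq (3)."""
    total = 0.0
    for contexts, Q in zip(contexts_per_round, played):
        _, opt_val = best_assortment(contexts, theta_star, K)
        total += opt_val - expected_revenue(contexts[:, list(Q)], theta_star)
    return total

rng = np.random.default_rng(0)
theta_star = np.array([0.5, -0.3])
contexts_per_round = [rng.normal(size=(2, 4)) for _ in range(3)]
optimal_play = [best_assortment(c, theta_star, K=2)[0] for c in contexts_per_round]
arbitrary_play = [(0,), (1,), (2,)]
regret_opt = pseudo_regret(contexts_per_round, optimal_play, theta_star, K=2)
regret_arb = pseudo_regret(contexts_per_round, arbitrary_play, theta_star, K=2)
```

Playing the hindsight-optimal assortments yields zero regret, while any other sequence accumulates a non-negative gap in every round.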

As in the case of contextual linear bandits Abbasi-Yadkori et al. [2011], Chu et al. [2011], the emphasis here is to make good sequential decisions while tracking the true parameter $\theta_*$ with a close estimate $\hat{\theta}_t$ (see Section 2.4). Our algorithm (like others) does not necessarily improve the estimate at each round. However, it ensures that $\theta_*$ always lies within a confidence set around the estimate $\hat{\theta}_t$ (with high probability), and the later analysis demonstrates that the aggregate prediction error over all $T$ rounds is bounded.

Our model is fairly general, as the contextual information $x_{t,i}$ may be used to model combined information of the item $i$ in the set $\mathcal{N}$ and the user at round $t$. Suppose the user at round $t$ is represented by a vector $v_t$ and the item $i$ has attribute vector $w_{t,i}$; then $x_{t,i}=\text{vec}(v_t w_{t,i}^\top)$ (the vectorized outer product of $v_t$ and $w_{t,i}$). We assume that the platform knows the interaction horizon $T$.
Additional notations: $\mathbf{X}_{\mathcal{Q}_t}$ denotes a design matrix whose columns are the attribute vectors ($x_{t,i}$) of the items in the assortment $\mathcal{Q}_t$. Also, we now denote $\mu(\mathcal{Q}_t,\theta_*)$ as $\mu(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)$ to signify that $\mu(\mathcal{Q}_t,\theta_*):\mathbb{R}^{|\mathcal{Q}_t|}\to\mathbb{R}$.
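The joint user-item context $\text{vec}(v_t w_{t,i}^\top)$ can be built in one line. The sketch below uses hypothetical feature values and assumes the column-stacking convention for $\text{vec}(\cdot)$; a row-stacking convention would only permute the coordinates.

```python
import numpy as np

v = np.array([0.6, 0.8])               # hypothetical user features (unit norm)
w = np.array([1.0, 2.0, 2.0]) / 3.0    # hypothetical item attributes (unit norm)

# vec(v w^T): stack the columns of the outer product (column-major order).
x = np.outer(v, w).flatten(order="F")
```

Since $\|\text{vec}(v w^\top)\|_2 = \|v\|_2\,\|w\|_2$, normalizing the user and item vectors separately is enough to keep the combined context within the unit ball, and the resulting dimension is the product of the two input dimensions.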

2.3 Assumptions

Following Filippi et al. [2010], Li et al. [2017], Oh & Iyengar [2019], Faury et al. [2020], we introduce the following assumptions on the problem structure.

Assumption 1 (Bounded parameters).

$\theta_*\in\Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^d$. $S\coloneqq\max_{\theta\in\Theta}\|\theta\|_2$ is known to the learner. Further, $\|x_{t,i}\|_2\leq 1$ for all values of $t$ and $i$.

This assumption simplifies analysis and removes scaling constants from the equations.

Assumption 2.

There exists $\kappa>0$ such that for every item $i\in\mathcal{Q}_t$, for any $\mathcal{Q}_t\subset\mathcal{N}$, and for all rounds $t$:

$$\inf_{\mathcal{Q}_t\subset\mathcal{N},\,\theta\in\mathbb{R}^d}\mu_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta)\left(1-\mu_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta)\right)\geq\frac{1}{\kappa}.$$

Note that $\mu_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta)(1-\mu_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta))$ denotes the derivative of the softmax function along the $i$-th direction. This assumption is necessary from the standpoint of likelihood theory Lehmann & Casella [2006], as it ensures that the Fisher information matrix for the estimation of $\theta_*$ is invertible for all possible input instances. We refer to Oh & Iyengar [2019] for a detailed discussion in this regard. We denote by $L$ and $M$ the upper bounds on the first and second derivatives of the softmax function along any component, respectively. We have $L,M\leq 1$ [Gao & Pavel, 2017] for all problem instances.
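To get a feel for how large $\kappa$ can be, the sketch below evaluates the smallest admissible $\kappa$ in a deliberately simplified case: assortments with a single item, with $\theta$ restricted to the ball of radius $S$ from Assumption 1 so that $|x^\top\theta|\leq S$. Both restrictions are assumptions made for this illustration only. In that case $1/(\mu(1-\mu)) = 2+e^{u}+e^{-u}$ at utility $u$, which is largest at $|u|=S$.

```python
import numpy as np

def kappa_single_item(S):
    """Smallest kappa for one-item assortments with |x^T theta| <= S (illustrative case)."""
    mu = 1.0 / (1.0 + np.exp(S))       # choice probability at utility -S
    return 1.0 / (mu * (1.0 - mu))     # equals 2 + 2*cosh(S)

kappas = {S: kappa_single_item(S) for S in (0.0, 1.0, 3.0, 5.0)}
```

Even in this simple case $\kappa$ grows exponentially with $S$, which is why a regret bound that multiplies the leading term by $\kappa$ can be much weaker than one where $\kappa$ appears only additively.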

2.4 Maximum likelihood estimate

CB-MNL, described in Algorithm 1, uses a regularized maximum likelihood estimator to compute an estimate $\hat{\theta}_t$ of $\theta_*$. Since $\{r_{t,i}\}_{i\in\mathcal{Q}_t}$ follows a multinomial distribution, the regularized log-likelihood (negative cross-entropy loss) function up to the $(t-1)$-th round under parameter $\theta$ can be written as:

$$\mathcal{L}_t^{\lambda_t}(\theta)=\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}r_{s,i}\log\left(\mu_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta)\right)-\frac{\lambda_t}{2}\|\theta\|_2^2, \tag{4}$$

$\mathcal{L}_t^{\lambda_t}(\theta)$ is concave in $\theta$ for $\lambda_t>0$, and the maximum likelihood estimator is given by calculating the critical point of $\mathcal{L}_t^{\lambda_t}(\theta)$. Setting $\nabla_\theta\mathcal{L}_t^{\lambda_t}(\theta)=0$, we get $\hat{\theta}_t$ as the solution of:

$$\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\left[\mu_i(\mathbf{X}_{\mathcal{Q}_s}^\top\hat{\theta}_t)-r_{s,i}\right]x_{s,i}+\lambda_t\hat{\theta}_t=0. \tag{5}$$

For later analysis, we also define

$$g_t(\theta)\coloneqq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\mu_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta)\,x_{s,i}+\lambda_t\theta,\quad g_t(\hat{\theta}_t)\coloneqq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}r_{s,i}\,x_{s,i}. \tag{6}$$

At the start of the interaction, when no contexts have been observed, $\hat{\theta}_t$ is well-defined by Eq (5) when $\lambda_t>0$. Therefore, the regularization parameter $\lambda_t$ frees CB-MNL from a burn-in period, in contrast to some previous works, e.g. Filippi et al. [2010].
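One way to solve the estimating equation of Eq (5) numerically is a damped Newton iteration on its left-hand side. The sketch below is an illustration under stated assumptions, not the authors' implementation: the solver choice, the helper names (`score`, `jacobian`, `fit_regularized_mle`), the standard MNL form for $\mu_i$, and the synthetic data are all hypothetical.

```python
import numpy as np

def score(theta, history, lam):
    """Left-hand side of Eq (5): sum_s sum_i [mu_i - r_{s,i}] x_{s,i} + lam * theta."""
    total = lam * theta
    for X, r in history:
        w = np.exp(X.T @ theta)
        total = total + X @ (w / (1.0 + w.sum()) - r)
    return total

def jacobian(theta, history, lam):
    """Jacobian of the score: sum_s X_s (diag(p_s) - p_s p_s^T) X_s^T + lam * I."""
    J = lam * np.eye(theta.shape[0])
    for X, _ in history:
        w = np.exp(X.T @ theta)
        p = w / (1.0 + w.sum())
        J += X @ (np.diag(p) - np.outer(p, p)) @ X.T
    return J

def fit_regularized_mle(history, lam, d, iters=50, tol=1e-10):
    """Damped Newton iteration for the root of the score; returns 0 for an empty history."""
    theta = np.zeros(d)
    for _ in range(iters):
        F = score(theta, history, lam)
        norm_F = np.linalg.norm(F)
        if norm_F < tol:
            break
        step = np.linalg.solve(jacobian(theta, history, lam), F)
        t = 1.0
        while t > 1e-8 and np.linalg.norm(score(theta - t * step, history, lam)) >= norm_F:
            t *= 0.5   # backtrack until the residual shrinks
        theta = theta - t * step
    return theta

# Synthetic interaction history: (design matrix, one-hot reward vector) per round.
rng = np.random.default_rng(0)
d, k, lam = 3, 2, 1.0
theta_true = np.array([0.5, -0.5, 0.2])
history = []
for _ in range(200):
    X = rng.normal(size=(d, k)) / np.sqrt(d)
    w = np.exp(X.T @ theta_true)
    probs = np.append(w, 1.0) / (1.0 + w.sum())   # last entry: no purchase
    choice = rng.choice(k + 1, p=probs)
    r = np.zeros(k)
    if choice < k:
        r[choice] = 1.0
    history.append((X, r))
theta_hat = fit_regularized_mle(history, lam, d)
```

With an empty history the equation reduces to $\lambda_t\theta=0$, so the estimate is the zero vector, which illustrates why no burn-in period is needed when $\lambda_t>0$.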

2.5 Confidence sets

Algorithm 1 follows the template of optimism in the face of uncertainty (OFU) strategies [Auer et al., 2002, Filippi et al., 2010, Faury et al., 2020]. Technical analysis of OFU algorithms relies on two key factors: the design of the confidence set and the ease of choosing an action using the confidence set.

In Section 4, we derive $E_t(\delta)$ (defined below) as the confidence set on $\theta_*$ such that $\theta_*\in E_t(\delta),\ \forall t$ with probability at least $1-\delta$ (randomness is over user choices). $E_t(\delta)$ is used for making decisions at each round (see Eq (12)) by CB-MNL in Algorithm 1:

$$E_t(\delta)\coloneqq\left\{\theta\in\Theta,\ \mathcal{L}_t^{\lambda_t}(\theta)-\mathcal{L}_t^{\lambda_t}(\hat{\theta}_t)\leq\beta_t^2(\delta)\right\}, \tag{7}$$

where $\beta_t(\delta)\coloneqq\gamma_t(\delta)+\frac{\gamma_t^2(\delta)}{\lambda_t}$, and

$$\gamma_t(\delta)\coloneqq\frac{\sqrt{\lambda_t}}{2}+\frac{2}{\sqrt{\lambda_t}}\log\left(\frac{(\lambda_t+LKt/d)^{d/2}\lambda_t^{-d/2}}{\delta}\right)+\frac{2d}{\sqrt{\lambda_t}}\log(2). \tag{8}$$
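The radii $\gamma_t(\delta)$ and $\beta_t(\delta)$ are simple closed-form quantities. The sketch below evaluates them, expanding the logarithm in Eq (8) as $\frac{d}{2}\log\big(1+\frac{LKt}{d\lambda_t}\big)+\log\frac{1}{\delta}$ to avoid forming large powers. The function names and the example values are hypothetical, and $L$ defaults to its upper bound of 1.

```python
import math

def gamma_t(t, d, K, lam, delta, L=1.0):
    """gamma_t(delta) from Eq (8), with the log of the ratio expanded for stability."""
    log_ratio = 0.5 * d * math.log(1.0 + L * K * t / (d * lam)) + math.log(1.0 / delta)
    return (math.sqrt(lam) / 2.0
            + 2.0 / math.sqrt(lam) * log_ratio
            + 2.0 * d / math.sqrt(lam) * math.log(2.0))

def beta_t(t, d, K, lam, delta, L=1.0):
    """beta_t(delta) = gamma_t(delta) + gamma_t(delta)^2 / lambda_t."""
    g = gamma_t(t, d, K, lam, delta, L)
    return g + g * g / lam

# Illustrative values only.
g_example = gamma_t(t=100, d=5, K=3, lam=2.0, delta=0.05)
b_example = beta_t(t=100, d=5, K=3, lam=2.0, delta=0.05)
```

Both radii grow only logarithmically in $t$ for a fixed $\lambda_t$, and neither depends on $\kappa$.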

A confidence set similar to $E_t(\delta)$ in Eq (7) was recently proposed in Abeille et al. [2021] for the simpler logistic bandit setting. Here, we extend its construction to the MNL setting. The set $E_t(\delta)$ is convex since the log-loss function is convex. This makes the decision step in Eq (12) a constrained convex optimization problem. However, it is difficult to prove bounds directly with $E_t(\delta)$. Therefore, we leverage a result in Faury et al. [2020], where the authors proposed a new Bernstein-like tail inequality for self-normalized vectorial martingales (see Appendix A.1), to derive another confidence set on $\theta_*$:

$$C_t(\delta)\coloneqq\left\{\theta\in\Theta,\ \left\|g_t(\theta)-g_t(\hat{\theta}_t)\right\|_{\mathbf{H}_t^{-1}(\theta)}\leq\gamma_t(\delta)\right\}. \tag{9}$$

where

$$\mathbf{H}_t(\theta_1)\coloneqq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1)\,x_{s,i}x_{s,i}^\top+\lambda_t\mathbf{I}_d. \tag{10}$$

$\dot{\mu}_i(\cdot)$ is the partial derivative of $\mu_i$ in the direction of the $i$-th component of the assortment, and $\gamma_t(\delta)$ is defined in Eq (8). The value of $\gamma_t(\delta)$ is an outcome of the concentration result of Faury et al. [2020]. As a consequence of this concentration, we have $\theta_*\in C_t(\delta)$ with probability at least $1-\delta$ (randomness is over user choices). The Bernstein-like concentration inequality used here is similar to Theorem 1 of Abbasi-Yadkori et al. [2011], with the difference that we take into account local variance information (hence local curvature information of the reward function) in defining $\mathbf{H}_t$. The above discussion is formalized in Appendix A.1.

The set $C_t(\delta)$ is non-convex, which follows from the non-linearity of $\mathbf{H}_t^{-1}(\theta)$. We use $C_t(\delta)$ directly to prove regret guarantees. In Section 4.3, we discuss how the convex set $E_t(\delta)$ is related to $C_t(\delta)$ and shares many of its useful properties. Until then, to maintain ease of technical flow and to compare with the previous work Faury et al. [2020], we assume that the algorithm uses $C_t(\delta)$ as the confidence set. We highlight that for the confidence sets $C_t(\delta)$ and $E_t(\delta)$, Algorithm CB-MNL is identical except for the calculation in Eq (12). For later sections, we also define the following norm-inducing design matrix based on all the contexts observed until time $t-1$:

$$\mathbf{V}_t\coloneqq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}x_{s,i}x_{s,i}^\top+\lambda_t\mathbf{I}_d. \tag{11}$$
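The matrices $\mathbf{H}_t(\theta)$ and $\mathbf{V}_t$ and the membership test for $C_t(\delta)$ can be sketched as follows. This is an illustration under assumptions: it uses the standard MNL form for $\mu_i$, takes $\dot{\mu}_i=\mu_i(1-\mu_i)$ as in Assumption 2, and evaluates $g_t(\hat{\theta}_t)$ through the first formula of Eq (6) for whatever $\hat{\theta}_t$ is supplied. All function names and data are hypothetical.

```python
import numpy as np

def choice_probs(X, theta):
    w = np.exp(X.T @ theta)
    return w / (1.0 + w.sum())

def g_fn(theta, history, lam):
    """g_t(theta): sum_s sum_i mu_i x_{s,i} + lam * theta."""
    out = lam * theta
    for X in history:
        out = out + X @ choice_probs(X, theta)
    return out

def H_matrix(theta, history, lam):
    """H_t(theta) from Eq (10), with mu_dot_i = mu_i (1 - mu_i)."""
    H = lam * np.eye(theta.shape[0])
    for X in history:
        p = choice_probs(X, theta)
        H += (X * (p * (1.0 - p))) @ X.T   # scales column i by mu_dot_i
    return H

def V_matrix(history, lam, d):
    """V_t from Eq (11)."""
    V = lam * np.eye(d)
    for X in history:
        V += X @ X.T
    return V

def in_C(theta, theta_hat, history, lam, gamma):
    """Check ||g_t(theta) - g_t(theta_hat)||_{H_t^{-1}(theta)} <= gamma."""
    diff = g_fn(theta, history, lam) - g_fn(theta_hat, history, lam)
    H = H_matrix(theta, history, lam)
    return float(np.sqrt(diff @ np.linalg.solve(H, diff))) <= gamma

# Illustrative history of played design matrices (columns have norm at most 1).
rng = np.random.default_rng(1)
d, lam = 3, 1.0
history = []
for _ in range(20):
    X = rng.normal(size=(d, 2))
    history.append(X / np.maximum(1.0, np.linalg.norm(X, axis=0)))
theta_hat = np.array([0.1, -0.2, 0.3])
H = H_matrix(theta_hat, history, lam)
V = V_matrix(history, lam, d)
```

Because $\dot{\mu}_i\leq 1$, the matrix $\mathbf{V}_t-\mathbf{H}_t(\theta)$ is positive semi-definite for every $\theta$; $\mathbf{H}_t$ down-weights directions in which the choice probabilities are nearly saturated, which is the local curvature information mentioned above.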

3 Algorithm

At each round $t$, the attribute parameters (contexts) $\{x_{t,1},x_{t,2},\cdots,x_{t,N}\}$ are made available to the algorithm (online platform) CB-MNL. The algorithm calculates an estimate of the true parameter $\theta_*$ according to Eq (5). The algorithm keeps track of the confidence set $C_t(\delta)$ ($E_t(\delta)$) as defined in Eq (9) (Eq (7)). Let the set $\mathcal{A}$ contain all feasible assortments of $\mathcal{N}$ with cardinality up to $K$. The algorithm makes the following decision:

$$(\mathcal{Q}_t,\theta_t)=\operatorname*{argmax}_{A_t\in\mathcal{A},\,\theta\in C_t(\delta)}\mu(\mathbf{X}_{A_t}^\top\theta). \tag{12}$$

In each round $t$, the reward of the online platform is denoted by the vector $r_t$. Also, the prediction error of $\theta$ at $\mathbf{X}_{\mathcal{Q}_t}$ is defined as:

$$\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t},\theta)\coloneqq\left|\mu(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)-\mu(\mathbf{X}_{\mathcal{Q}_t}^\top\theta)\right|. \tag{13}$$

$\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t},\theta)$ represents the difference in perceived rewards due to the inaccuracy in the estimation of the parameter $\theta_*$.

Remark 1 (Optimistic parameter search).

CB-MNL enforces optimism via an optimistic parameter search (e.g. in Abbasi-Yadkori et al. [2011]), which is in contrast to the use of an exploration bonus as seen in Faury et al. [2020], Filippi et al. [2010]. Optimistic parameter search provides a cleaner description of the learning strategy. In non-linear reward models, the two approaches may not follow similar trajectories but may have overlapping analysis styles (see Filippi et al. [2010] for a short discussion).

Remark 2 (Tractable decision-making).

In Section 4.3, we show that the decision problem of Eq (12) can be relaxed to a convex optimization problem by using the convex set $E_t(\delta)$ instead of $C_t(\delta)$, while keeping the regret performance of Algorithm 1 intact up to constant factors.

Input: regularization parameters $\lambda_t$, $\forall\, t \in [T]$; $N$ distinct items $\mathcal{N}$; $K$
for $t \geq 1$ do
       Given: Set $\{x_{t,1}, x_{t,2}, \cdots, x_{t,N}\}$ of $d$-dimensional parameters.
       Estimate $\hat{\theta}_t$ according to Eq (5).
       Construct $C_t(\delta)$ as defined in Eq (9).
       Construct the set $\mathcal{A}$ of all feasible assortments of $\mathcal{N}$ with cardinality up to $K$.
       Play $(\mathcal{Q}_t, \theta_t) = \operatorname*{argmax}_{A_t \in \mathcal{A},\, \theta \in C_t(\delta)} \mu(\mathbf{X}_{A_t}^\top \theta)$.
       Observe rewards $\mathbf{r}_t$
end for
Algorithm 1 CB-MNL
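The joint maximization in the "Play" step of Algorithm 1 can be illustrated with a small brute-force sketch. This is only an illustration under stated assumptions, not the authors' implementation: the payoff $\mu$ is taken to be the total purchase probability under unit revenues, the outside option has utility zero, and the continuous confidence set $C_t(\delta)$ is replaced by a finite list of candidate parameters. The names `mnl_probs`, `expected_payoff`, and `optimistic_step` are hypothetical.

```python
import itertools
import numpy as np

def mnl_probs(X_S, theta):
    """MNL choice probabilities for the offered items (outside option has utility 0)."""
    u = np.exp(X_S @ theta)
    return u / (1.0 + u.sum())

def expected_payoff(X_S, theta):
    # Assumption: unit revenue per item, so the payoff is the purchase probability.
    return mnl_probs(X_S, theta).sum()

def optimistic_step(X, K, candidates):
    """Brute-force argmax over assortments of size <= K and candidate parameters.

    `candidates` is a finite, hypothetical stand-in for the confidence set C_t(delta).
    Returns (best value, best assortment as a tuple of item indices, best parameter).
    """
    best_val, best_S, best_theta = -np.inf, None, None
    n_items = X.shape[0]
    for k in range(1, K + 1):
        for S in itertools.combinations(range(n_items), k):
            X_S = X[list(S)]
            for theta in candidates:
                val = expected_payoff(X_S, theta)
                if val > best_val:
                    best_val, best_S, best_theta = val, S, theta
    return best_val, best_S, best_theta
```

If the true parameter is among the candidates, the optimistic value returned by `optimistic_step` is at least the best payoff achievable under the true parameter, which is the property the regret analysis relies on. The enumeration is exponential in $K$ and is meant only to make the step concrete.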

4 Main results

We present a regret upper bound for the CB-MNL algorithm in Theorem 1.

Theorem 1.

With probability at least $1-\delta$ over the randomness of user choices:

$$\mathbf{R}_T \leq C_1 \gamma_T(\delta) \sqrt{2d \log\Big(1 + \frac{LKT}{d\lambda_T}\Big) T} + C_2 \kappa \gamma_T(\delta)^2 d \log\Big(1 + \frac{KT}{d\lambda_T}\Big),$$

where the constants are given as $C_1 = (4+8S)$, $C_2 = 4(4+8S)^{3/2} M$, and $\gamma_T(\delta)$ is given by Eq (8).

The formal proof is deferred to the technical Appendix. In this section, we discuss the key technical ideas leading to this result. The order dependence on the model parameters is made explicit by the following corollary.

Corollary 2.

Setting the regularization parameter $\lambda_T = \mathrm{O}(d\log(KT))$, where $K$ is the maximum cardinality of the assortments to be selected, makes $\gamma_T(\delta) = \mathrm{O}(d^{1/2}\log^{1/2}(KT))$. The regret upper bound is given by $\mathbf{R}_T = \mathrm{O}(d\sqrt{T}\log(KT) + \kappa d^2 \log^2(KT))$.
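As a rough check of the orders stated in Corollary 2, one can substitute $\gamma_T(\delta) = \mathrm{O}(\sqrt{d\log(KT)})$ into the two terms of Theorem 1. The sketch below treats $S$, $M$, and $L$ as constants and assumes $\lambda_T \geq 1$, so that both logarithmic factors are $\mathrm{O}(\log(KT))$; these simplifications are assumptions made here for illustration.

```latex
\begin{align*}
C_1\,\gamma_T(\delta)\sqrt{2d\log\Big(1+\tfrac{LKT}{d\lambda_T}\Big)\,T}
  &= \mathrm{O}\Big(\sqrt{d\log(KT)}\cdot\sqrt{d\,T\log(KT)}\Big)
   = \mathrm{O}\big(d\sqrt{T}\log(KT)\big),\\
C_2\,\kappa\,\gamma_T(\delta)^2\,d\log\Big(1+\tfrac{KT}{d\lambda_T}\Big)
  &= \mathrm{O}\big(\kappa\cdot d\log(KT)\cdot d\log(KT)\big)
   = \mathrm{O}\big(\kappa\,d^2\log^2(KT)\big).
\end{align*}
```

Only the second term carries $\kappa$, and it grows polylogarithmically in $T$, which is why $\kappa$ does not multiply the leading $\sqrt{T}$ term.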

Recall the expression for cumulative regret

$$\begin{aligned}
\mathbf{R}_T &= \sum_{t=1}^{T} \big[\mu(\mathbf{X}_{\mathcal{Q}^*_t}^\top \theta_*) - \mu(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*)\big] \\
&= \sum_{t=1}^{T} \underbrace{\big[\mu(\mathbf{X}_{\mathcal{Q}^*_t}^\top \theta_*) - \mu(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_t)\big]}_{\text{pessimism}} + \sum_{t=1}^{T} \underbrace{\big[\mu(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_t) - \mu(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*)\big]}_{\text{prediction error}},
\end{aligned}$$

where pessimism is the additive inverse of the optimism (the difference between the payoffs under the true parameters and those estimated by CB-MNL). Due to optimistic decision-making and the fact that $\theta_* \in C_t(\delta)$ (see Eq (12)), pessimism is non-positive for all rounds. Thus, the regret is upper bounded by the sum of the prediction errors over $T$ rounds. In Section 4.1, we derive an upper bound on the prediction error for a single round $t$. We also contrast our analysis with the previous works Filippi et al. [2010], Li et al. [2017], Oh & Iyengar [2021] and point out the specific technical differences that allow us to use a Bernstein-like tail concentration inequality and therefore achieve stronger regret guarantees. In Section 4.2, we describe the additional steps leading to the statement of Theorem 1. The style of the arguments is simpler and shorter than that in Faury et al. [2020]. Finally, in Section 4.3, we discuss the relationship between the two confidence sets $C_t(\delta)$ and $E_t(\delta)$ and show that even when using $E_t(\delta)$ in place of $C_t(\delta)$, we get regret upper bounds with the same parameter dependence as in Corollary 2. Lemma 3 gives the expression for an upper bound on the prediction error.

4.1 Bounds on prediction error

Lemma 3.

For $\theta_t \in C_t(\delta)$ (see Eq (12)), with probability at least $1-\delta$:

$$\begin{aligned}
\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta_t) \leq{}& (2+4S)\gamma_t(\delta) \sum_{i\in\mathcal{Q}_t} \dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*) \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \\
&+ 4\kappa(1+2S)^2 M \gamma_t(\delta)^2 \sum_{i\in\mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}},
\end{aligned} \tag{14}$$

where $\mathbf{V}_t^{-1}$ is given by Eq (11).

The detailed proof is provided in A.4. Here we develop the main ideas leading to this result and set up an analytical flow that will be re-used when working with the convex confidence set $E_t(\delta)$ in Section 4.3. In the previous works Filippi et al. [2010], Li et al. [2017], Oh & Iyengar [2021], global upper and lower bounds on the derivative of the link function (here softmax) are employed early in the analysis, leading to a loss of the local information carried by the MLE estimate $\theta_t$. In those works, the first step was to upper bound the prediction error using the Lipschitz constant (a global property) of the softmax function (or the sigmoid function in the logistic bandit case), as:

$$|\mu(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*) - \mu(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_t)| \leq L\,|\mathbf{X}_{\mathcal{Q}_t}^\top (\theta_* - \theta_t)|. \tag{15}$$

To build intuition, assume that $\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*$ lies in a "flatter" region of $\mu(\cdot)$; then Eq (15) is a loose upper bound.
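The looseness of a global Lipschitz bound in a flat region can be seen numerically in the scalar logistic case. The sketch below is an illustration only: it uses the sigmoid (whose global Lipschitz constant is $1/4$) in place of the softmax, and the two utility values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

L = 0.25  # global Lipschitz constant of the sigmoid (its slope at z = 0)

# Hypothetical utilities under theta_* and theta_t, both in a flat region.
z_star, z_hat = 4.0, 4.5

actual_gap = abs(sigmoid(z_star) - sigmoid(z_hat))
lipschitz_bound = L * abs(z_star - z_hat)                  # global bound, as in Eq (15)
local_bound = sigmoid_grad(z_star) * abs(z_star - z_hat)   # uses the local slope at z_star

print(actual_gap, local_bound, lipschitz_bound)
```

Here the local slope at `z_star` still upper bounds the gap because the sigmoid's derivative is decreasing for positive arguments, yet it is more than an order of magnitude tighter than the global Lipschitz bound.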

Next, we show how using a global lower bound in the form of $\kappa$ (see Assumption 2) early in the analysis, as in the works Filippi et al. [2010], Li et al. [2017], Oh & Iyengar [2021], leads to a loose upper bound on the prediction error. For this, we first introduce new notation:

$$\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_t, \theta_*)\, x_{t,i}^\top (\theta_* - \theta_t) \coloneqq \mu_i(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*) - \mu_i(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_t). \tag{16}$$

We also define $\mathbf{G}_t(\theta_t, \theta_*) \coloneqq \sum_{s=1}^{t-1} \sum_{i\in\mathcal{Q}_s} \alpha_i(\mathbf{X}_{\mathcal{Q}_s}, \theta_t, \theta_*)\, x_{s,i} x_{s,i}^\top + \lambda \mathbf{I}_d$. From Eq (6), we obtain (see A.2 for details of this derivation):

$$g(\theta_*) - g(\theta_t) = \mathbf{G}_t(\theta_t, \theta_*)(\theta_* - \theta_t). \tag{17}$$
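The step from Eq (16) to Eq (17) can be sketched as follows. The form of $g$ used below is an assumption made for illustration (a regularized sum of predicted choice probabilities weighted by the features, in the style of Faury et al. [2020]); the exact definition is the one in Eq (6) and the full derivation is in A.2.

```latex
\begin{align*}
% Assumed form of g (see Eq (6) for the actual definition):
g(\theta) &= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}
             \mu_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta)\,x_{s,i} + \lambda\theta,\\
g(\theta_*) - g(\theta_t)
  &= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}
     \big[\mu_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_*)
        - \mu_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_t)\big]x_{s,i}
     + \lambda(\theta_*-\theta_t)\\
  &= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}
     \alpha_i(\mathbf{X}_{\mathcal{Q}_s},\theta_t,\theta_*)\,
     x_{s,i}x_{s,i}^\top(\theta_*-\theta_t)
     + \lambda(\theta_*-\theta_t)
     && \text{(by Eq (16))}\\
  &= \mathbf{G}_t(\theta_t,\theta_*)\,(\theta_*-\theta_t).
\end{align*}
```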

From Assumption 2, $\mathbf{G}_t(\theta_t, \theta_*)$ is a positive definite matrix for $\lambda > 0$ and can therefore be used to define a norm. Using the Cauchy-Schwarz inequality with Eq (17) simplifies the prediction error as:

$$\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta_t) \leq \Big|\sum_{i\in\mathcal{Q}_t} \alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_t, \theta_*)\, \|x_{t,i}\|_{\mathbf{G}_t^{-1}(\theta_t,\theta_*)}\, \|\theta_* - \theta_t\|_{\mathbf{G}_t(\theta_t,\theta_*)}\Big| \tag{18}$$
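The inequality used for each summand in Eq (18) is the Cauchy-Schwarz inequality in a pair of dual weighted norms, $|x^\top v| \leq \|x\|_{\mathbf{G}^{-1}} \|v\|_{\mathbf{G}}$ for any positive definite $\mathbf{G}$. A minimal numerical sketch follows; the matrix `G` and the vectors are random, hypothetical stand-ins for $\mathbf{G}_t(\theta_t,\theta_*)$, $x_{t,i}$, and $\theta_* - \theta_t$.

```python
import numpy as np

def weighted_norm(v, A):
    """Norm ||v||_A = sqrt(v^T A v) for a symmetric positive definite matrix A."""
    return float(np.sqrt(v @ A @ v))

rng = np.random.default_rng(0)
d = 4
B = rng.normal(size=(d, d))
G = B @ B.T + np.eye(d)      # hypothetical positive definite matrix
x = rng.normal(size=d)       # stands in for a feature vector x_{t,i}
diff = rng.normal(size=d)    # stands in for theta_* - theta_t

lhs = abs(x @ diff)
rhs = weighted_norm(x, np.linalg.inv(G)) * weighted_norm(diff, G)
print(lhs <= rhs)
```

The inequality holds for every positive definite choice of `G`, which is why the analysis is free to switch from the norm induced by $\mathbf{G}_t$ to another one when that yields a tighter bound.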

The previous literature Filippi et al. [2010], Oh & Iyengar [2021] has utilized $\mathbf{G}_t(\theta_t, \theta_*) \succeq \kappa^{-1} \mathbf{V}_t$ and upper bounded $\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_t, \theta_*)$ by the Lipschitz constant directly at this stage, thereby incurring loose regret bounds. Instead, here we work with the norm induced by $\mathbf{H}_t(\theta_*)$ and retain the location information in $\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_t, \theta_*)$:

$$\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta_t) \leq \Big|\sum_{i\in\mathcal{Q}_t} \alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_t, \theta_*)\, \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}\, \|\theta_* - \theta_t\|_{\mathbf{H}_t(\theta_*)}\Big| \tag{19}$$

It is not straightforward to bound $\|\theta_* - \theta_t\|_{\mathbf{H}_t(\theta_*)}$. We extend the self-concordance style relations from Faury et al. [2020] to the multinomial logit function, which allows us to relate $\mathbf{G}_t^{-1}(\theta_t, \theta_*)$ and $\mathbf{H}_t^{-1}(\theta_t)$ (or $\mathbf{H}_t^{-1}(\theta_*)$) and thereby develop a bound on $\|\theta_* - \theta\|_{\mathbf{H}_t(\theta_*)}$.

Lemma 4.

For all $\theta_1, \theta_2 \in \Theta$, the following inequalities hold:

$$\mathbf{G}_t(\theta_1, \theta_2) \succeq (1+2S)^{-1} \mathbf{H}_t(\theta_1), \quad \mathbf{G}_t(\theta_1, \theta_2) \succeq (1+2S)^{-1} \mathbf{H}_t(\theta_2)$$
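The flavor of Lemma 4 can be checked numerically in the scalar logistic case, where the analogue of $\alpha$ is the secant slope of the sigmoid and the analogue of the $\mathbf{H}_t$ weight is its derivative. The sketch below is a simplification assumed for illustration (sigmoid instead of softmax, utilities restricted to $[-S, S]$ so that any two differ by at most $2S$); it is not the proof in A.3.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def alpha(z1, z2):
    """Secant slope (mu(z1) - mu(z2)) / (z1 - z2); the derivative when z1 == z2."""
    if np.isclose(z1, z2):
        return sigmoid_grad(z1)
    return (sigmoid(z1) - sigmoid(z2)) / (z1 - z2)

S = 2.0                              # hypothetical bound on the utilities
grid = np.linspace(-S, S, 41)

# Scalar analogue of Lemma 4: alpha(z1, z2) >= mu'(z1) / (1 + 2S) on the grid.
ok = all(
    alpha(z1, z2) >= sigmoid_grad(z1) / (1.0 + 2.0 * S) - 1e-12
    for z1 in grid
    for z2 in grid
)
print(ok)
```

Because the secant slope is symmetric in its two arguments, the same check covers the second inequality of the lemma with the roles of the two points exchanged.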
Lemma 5.

For $\theta_t \in C_t(\delta)$, we have the following relation with probability at least $1-\delta$: $\|\theta_t - \theta_*\|_{\mathbf{H}_t(\theta_*)} \leq 2(1+2S)\gamma_t(\delta)$.

Proofs of Lemmas 4 and 5 are deferred to A.3. Notice that Lemma 5 is a key result that characterizes the worthiness of the confidence set $C_t(\delta)$. Recall that $\gamma_T(\delta) = \mathrm{O}(\sqrt{d\log(KT)})$ (with a tuned $\lambda$ as in Corollary 2). Therefore, any $\theta \in C_t(\delta)$ is not too far from the optimal $\theta_*$ under the norm $\|\cdot\|_{\mathbf{H}_t(\theta_*)}$. Now, we use Lemma 5 in Eq (19) to get:

$$\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta_t) \leq 2(1+2S)\gamma_t(\delta) \sum_{i\in\mathcal{Q}_t} \Big|\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta_t)\, \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}\Big|. \tag{20}$$

The quantity $\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_t, \theta_*)$ as described in Eq (16) is upper bounded in the following result.

Lemma 6.

For the assortment $\mathcal{Q}_t$ chosen by the algorithm CB-MNL, as given by Eq (12), and any $\theta \in C_t(\delta)$, the following holds with probability at least $1-\delta$: $\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta) \leq \dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^\top \theta_*) + 2(1+2S) M \gamma_t(\delta) \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}$.

We use the result of Lemma 6 in Eq (20), followed by an application of Lemma 4 and the relation $\sum_{i\in\mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{H}_t^{-1}(\theta_t)} \leq \kappa \sum_{i\in\mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}}$ from Assumption 2, to arrive at the statement of Lemma 3.
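The constants in Lemma 3 can be traced through this substitution. The sketch below assumes that $\alpha_i$ is non-negative (so the absolute value in Eq (20) can be dropped) and that the $\kappa$ relation from Assumption 2 also applies with $\mathbf{H}_t^{-1}(\theta_*)$; the full argument is in A.4.

```latex
\begin{align*}
\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t},\theta_t)
 &\le 2(1+2S)\gamma_t(\delta)\sum_{i\in\mathcal{Q}_t}
   \Big[\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)
      + 2(1+2S)M\gamma_t(\delta)\,\|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}\Big]
   \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}\\
 &= (2+4S)\gamma_t(\delta)\sum_{i\in\mathcal{Q}_t}
      \dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)\,
      \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}
    + 4(1+2S)^2M\gamma_t(\delta)^2\sum_{i\in\mathcal{Q}_t}
      \|x_{t,i}\|^2_{\mathbf{H}_t^{-1}(\theta_*)}\\
 &\le (2+4S)\gamma_t(\delta)\sum_{i\in\mathcal{Q}_t}
      \dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)\,
      \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}
    + 4\kappa(1+2S)^2M\gamma_t(\delta)^2\sum_{i\in\mathcal{Q}_t}
      \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}}.
\end{align*}
```

The global constant $\kappa$ enters only through the second-order term, which is what confines it to the additive, non-leading part of the regret.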

4.2 Regret calculation

The complete technical work is provided in A.4 and A.5. The key step in obtaining the upper bounds of Theorem 1 is to sum the prediction error, as given in Eq (3), over $T$ rounds. Compared to the previous literature (Filippi et al. [2010], Li et al. [2017], Oh & Iyengar [2021]), the term $\sum_{i\in\mathcal{Q}_t}\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^{\top}\theta_*)\,\|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}$ is new here. Further, our treatment of this term is much simpler and more straightforward than that in Faury et al. [2020].

4.3 Convex relaxation of the optimization step

Sections 4.1 and 4.2 provide an analytical framework for calculating the regret bounds given: (1) the confidence set (Eq (9)) with the guarantee that $\theta_* \in C_t(\delta)$ with probability at least $1-\delta$; (2) the assurance that the confidence set is small (Lemma 5). In order to re-use previously developed techniques, we show: (1) $E_t(\delta) \supseteq C_t(\delta)$ (see Eq (7)) and therefore $\theta_* \in E_t(\delta)$ with probability at least $1-\delta$ (see Lemma 7); (2) an analog of Lemma 5 using $E_t(\delta)$ (see Lemma 8). The proof of Theorem 1 is therefore repeated while using Lemma 8, following the steps sketched in Sections 4.1 and 4.2. The order dependence of the regret upper bound is retained (see Corollary 2).

Lemma 7.

$E_t(\delta) \supseteq C_t(\delta)$; therefore, for any $\theta \in C_t(\delta)$, we also have $\theta \in E_t(\delta)$ (see Eq (7)).

The complete proof is provided in A.6. We highlight the usefulness of Lemma 7: since the whole set $C_t(\delta)$ lies within $E_t(\delta)$, the concentration inequality also implies $\mathbb{P}(\forall t \geq 1,\ \theta_* \in E_t(\delta)) \geq 1-\delta$.
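The implication can be written out as a one-line event inclusion. This is a sketch that assumes the uniform-in-time guarantee $\mathbb{P}(\forall t \geq 1,\ \theta_* \in C_t(\delta)) \geq 1-\delta$ for the original confidence set, which is how we read the concentration result referred to above:

```latex
\{\forall t \geq 1,\ \theta_* \in C_t(\delta)\} \subseteq \{\forall t \geq 1,\ \theta_* \in E_t(\delta)\}
\quad\Longrightarrow\quad
\mathbb{P}\big(\forall t \geq 1,\ \theta_* \in E_t(\delta)\big)
\;\geq\; \mathbb{P}\big(\forall t \geq 1,\ \theta_* \in C_t(\delta)\big)
\;\geq\; 1-\delta .
```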

Lemma 8.

Under the event $\theta_* \in C_t(\delta)$, the following holds $\forall\, \theta \in E_t(\delta)$:

$$\|\theta - \theta_*\|_{\mathbf{H}_t(\theta_*)} \leq (2+2S)\gamma_t(\delta) + 2\sqrt{1+S}\,\beta_t(\delta).$$

When $\lambda_t = \mathrm{O}(d\log(Kt))$, then $\gamma_t(\delta) = \tilde{\mathrm{O}}(\sqrt{d\log(t)})$, $\beta_t(\delta) = \tilde{\mathrm{O}}(\sqrt{d\log(t)})$, and $\|\theta - \theta_*\|_{\mathbf{H}_t(\theta_*)} = \tilde{\mathrm{O}}(\sqrt{d\log(t)})$.

The complete proof can be found in A.6.

5 Numerical experiments

In this section, we compare the empirical performance of our proposed algorithm CB-MNL with the previous state of the art in the MNL contextual bandit literature, UCB-MNL [Oh & Iyengar, 2021] and TS-MNL [Oh & Iyengar, 2019], on artificial data. We focus on performance comparison for varying values of the parameter $\kappa$, and show in Figure 2 that our algorithm performs consistently better across different $\kappa$ values. This highlights the primary contribution of our theoretical analysis. Refer to A.8 for additional empirical analysis.

Figure 2: Comparison of cumulative regret as a function of time for varying $\kappa$ (left to right: $\kappa \gg \sqrt{T}$, $\kappa < \sqrt{T}$, and $\kappa \ll \sqrt{T}$).

For each experimental configuration, we consider a problem instance with inventory size $N=15$, instance dimension $d=5$, maximum assortment size $K=4$, and time horizon $T=100$, averaged over $25$ Monte Carlo simulation runs. $\theta_* \in \mathbb{R}^d$ is a $d$-dimensional random vector with each coordinate independently and uniformly distributed in $[0,1]$. The contexts follow a multivariate Gaussian distribution. The $\lambda$ parameter is manually tuned. Algorithm CB-MNL only knows the values of $N, T, K, d$. In contrast, algorithms TS-MNL and UCB-MNL also need to know the value of $\kappa$ for their implementation. We observe that our algorithm CB-MNL has robust performance for varying values of $\kappa$.
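The setup described above can be sketched as follows. This is a minimal illustration rather than the authors' experimental code: the standard-normal context coordinates (one special case of a multivariate Gaussian), the seeds, the randomly drawn assortment, and all function names are assumptions made for the example.

```python
import math
import random


def mnl_probs(assortment_contexts, theta):
    """MNL purchase probability of each offered item (no-purchase option excluded)."""
    utilities = [sum(x_k * th_k for x_k, th_k in zip(x, theta))
                 for x in assortment_contexts]
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)  # the 1 accounts for the no-purchase option
    return [w / denom for w in weights]


def make_instance(n_items=15, dim=5, seed=0):
    """Draw theta_* uniformly from [0, 1]^dim and Gaussian item contexts."""
    rng = random.Random(seed)
    theta_star = [rng.uniform(0.0, 1.0) for _ in range(dim)]
    # Assumption: independent standard-normal coordinates for every context.
    contexts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_items)]
    return theta_star, contexts


def sample_choice(probs, rng):
    """Return the position of the purchased item, or None for no purchase."""
    u = rng.random()
    cumulative = 0.0
    for idx, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return idx
    return None


theta_star, contexts = make_instance()       # N = 15, d = 5
rng = random.Random(1)
assortment = rng.sample(range(15), 4)        # K = 4 items, picked at random here
probs = mnl_probs([contexts[i] for i in assortment], theta_star)
choice = sample_choice(probs, rng)           # one simulated consumer response
```

A bandit algorithm would replace the random assortment with its own selection rule and repeat the last three lines for $T=100$ rounds.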

6 Conclusion and discussion

In this work, we proposed an optimistic algorithm for learning under the MNL contextual bandit framework. Using techniques from Faury et al. [2020], we developed an improved technical analysis to deal with the non-linear nature of the MNL reward function. As a result, the leading term in our regret bound does not suffer from the problem-dependent parameter $\kappa$. This contribution is significant as $\kappa$ can be very large (refer to Section 1.2). For example, for $\kappa = \mathrm{O}(\sqrt{T})$, the results of Oh & Iyengar [2021, 2019] suffer $\tilde{\mathrm{O}}(T)$ regret, while our algorithm continues to enjoy $\tilde{\mathrm{O}}(\sqrt{T})$. Further, we also presented a tractable version of the decision-making step of the algorithm by constructing a convex relaxation of the confidence set.
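A small arithmetic sketch of this comparison follows. It assumes the earlier bounds scale as $\kappa\sqrt{T}$ (the form implied by the $\tilde{\mathrm{O}}(T)$ figure quoted above) and drops $d$ and logarithmic factors; the function names and the value $T = 10^4$ are illustrative choices, not taken from the paper.

```python
import math


def multiplicative_bound(kappa, horizon):
    # Assumed shape kappa * sqrt(T); d and log factors are dropped.
    return kappa * math.sqrt(horizon)


def additive_bound(kappa, horizon):
    # Shape sqrt(T) + kappa; d and log factors are dropped.
    return math.sqrt(horizon) + kappa


T = 10_000
kappa = math.sqrt(T)                       # kappa = sqrt(T) = 100
mult = multiplicative_bound(kappa, T)      # 100 * 100, i.e. linear in T
add = additive_bound(kappa, T)             # 100 + 100, still of order sqrt(T)
```

Under these assumptions the multiplicative form grows like $T$, whereas the additive form stays within a constant factor of $\sqrt{T}$.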

Our result is still $\mathrm{O}(\sqrt{d})$ away from the minimax lower bound of Chu et al. [2011] known for the linear contextual bandit. In the case of logistic bandits, Li et al. [2017] make an i.i.d. assumption on the contexts to bridge the gap (however, they still retain the $\kappa$ factor). Improving the worst-case regret bound by $\mathrm{O}(\sqrt{d})$ while keeping $\kappa$ as an additive term is an open problem. It may be possible to improve the dependence on $\kappa$ by using a higher-order approximation for the estimation error. Finding a lower bound on the dependence on $\kappa$ is an interesting open problem and may require newer techniques than those presented in this work.

Oh & Iyengar [2019] gave a Thompson sampling (TS) based learning strategy for the MNL contextual bandit. Thompson sampling approaches may not have to search the entire action space to make decisions, as optimistic algorithms (such as ours) do. TS-based strategies are likely to have better empirical performance. Oh & Iyengar [2019] use a confidence-set-based analysis to bound the estimation error term. However, their results suffer from the prohibitive scaling of the problem-dependent parameter $\kappa$ that we have overcome here. Modifying our analysis for a TS-based learning strategy could bring together the best of both worlds.

References

  • Abbasi-Yadkori et al. [2011] Abbasi-Yadkori, Y., Pál, D., & Szepesvári, C. (2011). Improved algorithms for linear stochastic bandits. Advances in neural information processing systems, 24, 2312–2320.
  • Abeille et al. [2021] Abeille, M., Faury, L., & Calauzènes, C. (2021). Instance-wise minimax-optimal algorithms for logistic bandits. In International Conference on Artificial Intelligence and Statistics (pp. 3691–3699). PMLR.
  • Agrawal et al. [2017] Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2017). Thompson sampling for the MNL-bandit. In Conference on Learning Theory (pp. 76–78). PMLR.
  • Agrawal et al. [2019] Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2019). MNL-bandit: A dynamic learning approach to assortment selection. Operations Research, 67, 1453–1485. doi:10.1287/opre.2018.1832.
  • Alfandari et al. [2021] Alfandari, L., Hassanzadeh, A., & Ljubić, I. (2021). An exact method for assortment optimization under the nested logit model. European Journal of Operational Research, 291, 830–845. doi:10.1016/j.ejor.2020.12.007.
  • Amani & Thrampoulidis [2021] Amani, S., & Thrampoulidis, C. (2021). UCB-based algorithms for multinomial logistic regression bandits. Advances in Neural Information Processing Systems, 34, 2913–2924.
  • Auer et al. [2002] Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine learning, 47, 235–256.
  • Avadhanula [2019] Avadhanula, V. (2019). The MNL-Bandit Problem: Theory and Applications. Ph.D. thesis Columbia University.
  • Bach [2010] Bach, F. (2010). Self-concordant analysis for logistic regression. Electronic Journal of Statistics, 4, 384–414. doi:10.1214/09-EJS521.
  • Chen et al. [2020] Chen, X., Wang, Y., & Zhou, Y. (2020). Dynamic assortment optimization with changing contextual information. Journal of Machine Learning Research, 21, 1–44.
  • Cheung & Simchi-Levi [2017] Cheung, W. C., & Simchi-Levi, D. (2017). Thompson sampling for online personalized assortment optimization problems with multinomial logit choice models. Available at SSRN 3075658.
  • Chu et al. [2011] Chu, W., Li, L., Reyzin, L., & Schapire, R. (2011). Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 208–214).
  • Dani et al. [2008] Dani, V., Hayes, T. P., & Kakade, S. M. (2008). Stochastic linear optimization under bandit feedback. In Conference on Learning Theory.
  • Faury et al. [2020] Faury, L., Abeille, M., Calauzènes, C., & Fercoq, O. (2020). Improved optimistic algorithms for logistic bandits. In International Conference on Machine Learning (pp. 3052–3060). PMLR.
  • Feldman et al. [2018] Feldman, J., Zhang, D., Liu, X., & Zhang, N. (2018). Taking assortment optimization from theory to practice: Evidence from large field experiments on Alibaba. Available at SSRN.
  • Filippi et al. [2010] Filippi, S., Cappe, O., Garivier, A., & Szepesvári, C. (2010). Parametric bandits: The generalized linear case. In Advances in Neural Information Processing Systems (pp. 586–594).
  • Flores et al. [2019] Flores, A., Berbeglia, G., & Van Hentenryck, P. (2019). Assortment optimization under the sequential multinomial logit model. European Journal of Operational Research, 273, 1052–1064. doi:10.1016/j.ejor.2018.08.047.
  • Gao & Pavel [2017] Gao, B., & Pavel, L. (2017). On the properties of the softmax function with application in game theory and reinforcement learning. arXiv preprint arXiv:1704.00805.
  • Grant & Szechtman [2021] Grant, J. A., & Szechtman, R. (2021). Filtered Poisson process bandit on a continuum. European Journal of Operational Research, 295, 575–586. doi:10.1016/j.ejor.2021.03.033.
  • Kök & Fisher [2007] Kök, A. G., & Fisher, M. L. (2007). Demand estimation and assortment optimization under substitution: Methodology and application. Operations Research, 55, 1001–1021. doi:10.1287/opre.1070.0409.
  • Lehmann & Casella [2006] Lehmann, E. L., & Casella, G. (2006). Theory of point estimation. Springer Science & Business Media.
  • Li et al. [2017] Li, L., Lu, Y., & Zhou, D. (2017). Provably optimal algorithms for generalized linear contextual bandits. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 2071–2080).
  • Oh & Iyengar [2019] Oh, M.-h., & Iyengar, G. (2019). Thompson sampling for multinomial logit contextual bandits. In Advances in Neural Information Processing Systems (pp. 3151–3161).
  • Oh & Iyengar [2021] Oh, M.-h., & Iyengar, G. (2021). Multinomial logit contextual bandits: Provable optimality and practicality. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35 (pp. 9205–9213).
  • Ou et al. [2018] Ou, M., Li, N., Zhu, S., & Jin, R. (2018). Multinomial logit bandit with linear utility functions. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (pp. 2602–2608).
  • Perivier & Goyal [2022] Perivier, N., & Goyal, V. (2022). Dynamic pricing and assortment under a contextual mnl demand. Advances in Neural Information Processing Systems, 35, 3461–3474.
  • Rusmevichientong et al. [2010] Rusmevichientong, P., Shen, Z.-J. M., & Shmoys, D. B. (2010). Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations Research, 58, 1666–1680. doi:10.1287/opre.1100.0866.
  • Rusmevichientong & Tsitsiklis [2010] Rusmevichientong, P., & Tsitsiklis, J. N. (2010). Linearly parameterized bandits. Mathematics of Operations Research, 35, 395–411. doi:10.1287/moor.1100.0446.
  • Sauré & Zeevi [2013] Sauré, D., & Zeevi, A. (2013). Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management, 15, 387–404. doi:10.1287/msom.2013.0429.
  • Timonina-Farkas et al. [2020] Timonina-Farkas, A., Katsifou, A., & Seifert, R. W. (2020). Product assortment and space allocation strategies to attract loyal and non-loyal customers. European Journal of Operational Research, 285, 1058–1076. doi:10.1016/j.ejor.2020.02.019.
  • Wang et al. [2020] Wang, X., Zhao, X., & Liu, B. (2020). Design and pricing of extended warranty menus based on the multinomial logit choice model. European Journal of Operational Research, 287, 237–250. doi:10.1016/j.ejor.2020.05.012.
  • Xu et al. [2021] Xu, J., Chen, L., & Tang, O. (2021). An online algorithm for the risk-aware restless bandit. European Journal of Operational Research, 290, 622–639. doi:10.1016/j.ejor.2020.08.028.
  • Zhang & Lin [2015] Zhang, Y., & Lin, X. (2015). Disco: Distributed optimization for self-concordant empirical loss. In International conference on machine learning (pp. 362–370).

Appendix A Appendix

A.1 Confidence set

In this section, we justify the design of the confidence set defined in Eq (9). This particular choice is based on the following concentration inequality for self-normalized vectorial martingales.

Theorem 9.

(Appears as Theorem 4 in Abeille et al. [2021].) Let $\{\mathcal{F}_t\}_{t=1}^{\infty}$ be a filtration. Let $\{x_t\}_{t=1}^{\infty}$ be a stochastic process in $\mathcal{B}_2(d)$ such that $x_t$ is $\mathcal{F}_t$-measurable. Let $\{\varepsilon_t\}_{t=2}^{\infty}$ be a martingale difference sequence such that $\varepsilon_{t+1}$ is $\mathcal{F}_{t+1}$-measurable. Furthermore, assume that conditionally on $\mathcal{F}_t$ we have $|\varepsilon_{t+1}| \leq 1$ almost surely, and denote $\sigma_t^2 \coloneqq \mathbb{E}\left[\varepsilon_{t+1}^2 \mid \mathcal{F}_t\right]$. Let $\{\lambda_t\}_{t=1}^{\infty}$ be a predictable sequence of non-negative scalars. Define:

$$\mathbf{H}_t \coloneqq \sum_{s=1}^{t-1}\sigma_s^2 x_s x_s^{\top} + \lambda_t\mathbf{I}_d, \qquad S_t \coloneqq \sum_{s=1}^{t-1}\varepsilon_{s+1}x_s.$$

Then for any $\delta \in (0,1]$:

$$\mathbb{P}\Bigg(\exists t \geq 1,\ \left\lVert S_t\right\rVert_{\mathbf{H}_t^{-1}} \geq \frac{\sqrt{\lambda_t}}{2} + \frac{2}{\sqrt{\lambda_t}}\log\left(\frac{\det\left(\mathbf{H}_t\right)^{\frac{1}{2}}\lambda_t^{-\frac{d}{2}}}{\delta}\right) + \frac{2}{\sqrt{\lambda_t}}\,d\log(2)\Bigg) \leq \delta.$$
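To make the size of the deviation threshold in Theorem 9 concrete, the right-hand side inside the probability can be evaluated numerically. The sketch below is an illustration only: the function name and the numeric values of $d$, $\lambda_t$, and $\delta$ are assumptions, and $\log\det(\mathbf{H}_t)$ is passed in directly so that no matrix routine is needed.

```python
import math


def theorem9_threshold(lam, logdet_H, d, delta):
    """Evaluate sqrt(lam)/2 + (2/sqrt(lam)) * log(det(H)^(1/2) * lam^(-d/2) / delta)
    + (2/sqrt(lam)) * d * log(2), with log det(H) supplied as logdet_H."""
    root = math.sqrt(lam)
    # log( det(H)^{1/2} * lam^{-d/2} / delta ), expanded term by term
    log_term = 0.5 * logdet_H - 0.5 * d * math.log(lam) - math.log(delta)
    return root / 2.0 + (2.0 / root) * log_term + (2.0 / root) * d * math.log(2.0)


d, lam, delta = 5, 4.0, 0.05   # hypothetical values for illustration
# At t = 1 the sum defining H_t is empty, so H_1 = lam * I_d and
# log det(H_1) = d * log(lam); the determinant ratio then equals 1.
threshold_t1 = theorem9_threshold(lam, d * math.log(lam), d, delta)
```

With $\mathbf{H}_1 = \lambda_1\mathbf{I}_d$ the logarithmic term reduces to $\log(1/\delta)$, so the threshold starts at $\sqrt{\lambda_1}/2 + (2/\sqrt{\lambda_1})(\log(1/\delta) + d\log 2)$ and grows only through $\log\det(\mathbf{H}_t)$.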

Theorem 9 cannot be directly used in our setting because, in the MNL model, the actual rewards $\{r_{s,i}\}_{i\in\mathcal{Q}_s}$ (for any time step $s$) are correlated. Hence, an almost identical concentration result (differing only in minor constants), appearing as Theorem C.6 in Perivier & Goyal [2022], is used instead.

Lemma 10 (confidence bounds for multinomial logistic rewards).

With $\hat{\theta}_t$ as the regularized maximum log-likelihood estimate defined in Eq (5), the following holds with probability at least $1-\delta$:

$$\forall t \geq 1, \quad \left\lVert g_t(\hat{\theta}_t) - g_t(\theta_*)\right\rVert_{\mathbf{H}_t^{-1}} \leq \gamma_t(\delta),$$

where $\mathbf{H}_t(\theta_1) = \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_1)\,x_{s,i}x_{s,i}^{\top} + \lambda\mathbf{I}_d$ and $g_t(\cdot)$ is defined in Eq (6).

Proof.

$\hat{\theta}_t$ is the maximizer of the regularized log-likelihood:

$$\mathcal{L}_t^{\lambda_t}(\theta) = \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s} r_{s,i}\log\left(\mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta)\right) - \frac{\lambda_t}{2}\left\lVert\theta\right\rVert_2^2,$$

where $\mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta)$ is given by Eq (1) as $\frac{e^{x_{s,i}^{\top}\theta}}{1+\sum_{j\in\mathcal{Q}_s}e^{x_{s,j}^{\top}\theta}}$. Solving for $\nabla_{\theta}\mathcal{L}_t^{\lambda_t} = 0$, we obtain:

$$\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\hat{\theta}_t)\,x_{s,i} + \lambda_t\hat{\theta}_t = \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s} r_{s,i}\,x_{s,i}.$$

This result, combined with the definition of gt(θ)=s=1t1i𝒬sμi(𝐗𝒬sθ)xs,i+λtθsubscript𝑔𝑡subscript𝜃superscriptsubscript𝑠1𝑡1subscript𝑖subscript𝒬𝑠subscript𝜇𝑖superscriptsubscript𝐗subscript𝒬𝑠topsubscript𝜃subscript𝑥𝑠𝑖subscript𝜆𝑡subscript𝜃g_{t}(\theta_{*})=\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\mu_{i}(\mathbf{X}% _{\mathcal{Q}_{s}}^{\top}\theta_{*})x_{s,i}\\ +\lambda_{t}\theta_{*}italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) = ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_μ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT + italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT yields:

$$
\begin{aligned}
g_t(\hat{\theta}_t) - g_t(\theta_*) &= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\varepsilon_{s,i}\,x_{s,i} - \lambda_t\theta_* \\
&= S_{t,K} - \lambda_t\theta_*
\end{aligned}
$$

where we denote $\varepsilon_{s,i} \coloneqq r_{s,i} - \mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)$ for all $s \geq 1$ and $i \in [K]$, and $S_{t,K} \coloneqq \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\varepsilon_{s,i}\,x_{s,i}$ for all $t \geq 1$. For any $\lambda_t \geq 1$, from the definition of $\mathbf{H}_t(\theta_*)$ it follows that $\mathbf{H}_t^{-1}(\theta_*) \preceq \mathbf{I}_d$.
Hence, $\lVert\lambda_t\theta_*\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \lVert\lambda_t\theta_*\rVert_2$. More precisely, since $\mathbf{H}_t(\theta_*) \succeq \lambda_t\mathbf{I}_d$, we have $\lVert\lambda_t\theta_*\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \sqrt{\lambda_t}\,\lVert\theta_*\rVert_2$, which is the source of the $\sqrt{\lambda_t}$ factor in the bound below. Later in the proof of Theorem 1, we present our choice of $\lambda_t$, which always ensures $\lambda_t \geq 1$.

$$
\lVert g_t(\hat{\theta}_t) - g_t(\theta_*)\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \lVert S_{t,K}\rVert_{\mathbf{H}_t^{-1}(\theta_*)} + \sqrt{\lambda_t}\,S \tag{21}
$$

Conditioned on the filtration set $\mathcal{F}_{t,i}$ (see Section 2.2 to review the definition of the filtration set), $\varepsilon_{s,i}$ is a martingale difference that is bounded by $1$, as we assume the maximum reward accrued at any round is upper bounded by $1$. We calculate for all $s \geq 1$:

$$
\begin{aligned}
\mathbb{E}\left[\varepsilon_{s,i}^2 \,\middle|\, \mathcal{F}_t\right] &= \mathbb{E}\left[\left(r_{s,i} - \mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\right)^2 \,\middle|\, \mathcal{F}_t\right] \\
&= \mathbb{V}\left[r_{s,i} \,\middle|\, \mathcal{F}_t\right] = \mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\left(1 - \mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\right).
\end{aligned}
\tag{22}
$$

Also, from Remark 3, we have:

$$
\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*) = \mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\left(1 - \mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\right).
$$
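As a quick numerical sanity check of Eq (22) and the identity from Remark 3, the following Python sketch compares three quantities: the closed form $\mu_i(1-\mu_i)$, a finite-difference estimate of the derivative of $\mu_i$ with respect to its own utility, and a Monte Carlo estimate of the reward variance. It is only an illustration under stated assumptions: the utilities `u` are randomly generated stand-ins for $x_{s,i}^{\top}\theta_*$, and $r_{s,i}$ is assumed to be the indicator that item $i$ is chosen (with an outside option absorbing the remaining probability). The helper names are hypothetical.

```python
import numpy as np

def mnl_probs(utilities):
    """mu_i = exp(u_i) / (1 + sum_j exp(u_j)) for the offered items."""
    expu = np.exp(utilities)
    return expu / (1.0 + expu.sum())

def own_derivative_fd(utilities, i, eps=1e-6):
    """Central finite-difference estimate of d mu_i / d u_i."""
    up = utilities.copy()
    up[i] += eps
    down = utilities.copy()
    down[i] -= eps
    return (mnl_probs(up)[i] - mnl_probs(down)[i]) / (2.0 * eps)

rng = np.random.default_rng(0)
u = rng.normal(size=4)            # assumed utilities for K = 4 offered items
mu = mnl_probs(u)
local_var = mu * (1.0 - mu)       # closed form on the right-hand side of Eq (22)

# Derivative of mu_i with respect to its own utility (Remark 3 identity).
fd = np.array([own_derivative_fd(u, i) for i in range(len(u))])

# Monte Carlo variance of r_{s,i}, assumed to be a choice indicator.
probs_with_outside = np.append(mu, 1.0 - mu.sum())
choices = rng.choice(len(u) + 1, size=200_000, p=probs_with_outside)
r = (choices[:, None] == np.arange(len(u))[None, :]).astype(float)
emp_var = r.var(axis=0)

print("closed form :", local_var)
print("finite diff :", fd)
print("Monte Carlo :", emp_var)
```

The three printed vectors should agree up to discretization and sampling error, which is what allows the variance in Eq (22) to be replaced by $\dot{\mu}_i$ in the next step.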

Therefore, setting $H_t$ as $\mathbf{H}_t(\theta_*) = \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\,x_{s,i}x_{s,i}^{\top} + \lambda_t\mathbf{I}_d$ and $U_t$ as $S_{t,K}$, we invoke an instance of Theorem C.6 in Perivier & Goyal [2022] to obtain:

$$
\begin{aligned}
1 - \delta \leq \mathbb{P}\Bigg(\forall t \geq 1,\ \lVert S_t\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq{}& \frac{\sqrt{\lambda_t}}{2} + \frac{2}{\sqrt{\lambda_t}}\log\left(\frac{2^{d}\det(\mathbf{H}_t(\theta_*))^{1/2}\lambda_t^{-d/2}}{\delta}\right) \\
&+ \frac{2d}{\sqrt{\lambda_t}}\log(2)\Bigg)
\end{aligned}
$$

We simplify $\det(\mathbf{H}_t(\theta_*))$, using the fact that the multinomial logistic function is $L$-Lipschitz (see Assumption 2):

$$
\begin{aligned}
\det(\mathbf{H}_t(\theta_*)) &= \det\left(\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_*)\,x_{s,i}x_{s,i}^{\top} + \lambda_t\mathbf{I}_d\right) \\
&\leq L^{d}\det\left(\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}x_{s,i}x_{s,i}^{\top} + \frac{\lambda_t}{L}\mathbf{I}_d\right).
\end{aligned}
$$

Further, using Lemma 18 and using $\lVert x_{s,i}\rVert_2 \leq 1$, we write:

$$
L^{d}\det\left(\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}x_{s,i}x_{s,i}^{\top} + \frac{\lambda_t}{L}\mathbf{I}_d\right) \leq \left(\lambda_t + \frac{LKt}{d}\right)^{d}.
$$
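The two determinant inequalities above can be checked numerically on synthetic data. The Python sketch below is an illustration under assumptions that are not taken from the paper: contexts are random vectors rescaled to have norm at most one, the parameter `theta` is random, and $L = 1/4$ is used as the Lipschitz constant because $\mu_i(1-\mu_i) \leq 1/4$ always holds. The values of `d`, `K`, `t`, and `lam` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, t = 5, 3, 50        # hypothetical dimension, assortment size, round index
lam, L = 2.0, 0.25        # hypothetical lambda_t >= 1; L = 1/4 is an assumed constant

theta = rng.normal(size=d)
V = np.zeros((d, d))      # unweighted sum of x x^T
H = lam * np.eye(d)       # variance-weighted matrix H_t(theta)

for s in range(t - 1):
    X = rng.normal(size=(K, d))
    # Rescale rows so that every context satisfies ||x||_2 <= 1.
    X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
    u = X @ theta
    mu = np.exp(u) / (1.0 + np.exp(u).sum())
    w = mu * (1.0 - mu)   # local variance weights, each at most 1/4
    V += X.T @ X
    H += (X * w[:, None]).T @ X

det_H = np.linalg.det(H)
middle = L**d * np.linalg.det(V + (lam / L) * np.eye(d))
upper = (lam + L * K * t / d) ** d

print(f"det(H_t) = {det_H:.4e} <= {middle:.4e} <= {upper:.4e}")
```

On such data the gap between `det_H` and `upper` is typically large, which is harmless here because the determinant only enters the confidence radius through a logarithm.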

Thus, we simplify Eq (10) as:

$$
\begin{aligned}
1 - \delta &\leq \mathbb{P}\Bigg(\forall t \geq 1,\ \lVert S_t\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \frac{\sqrt{\lambda_t}}{2} + \frac{2}{\sqrt{\lambda_t}}\log\left(\frac{\left(\lambda_t + LKt/d\right)^{d/2}\lambda_t^{-d/2}}{\delta}\right) + \frac{2d}{\sqrt{\lambda_t}}\log(2)\Bigg) \\
&\leq \mathbb{P}\Bigg(\forall t \geq 1,\ \lVert S_t\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \frac{\sqrt{\lambda_t}}{2} + \frac{2}{\sqrt{\lambda_t}}\log\left(\frac{\left(1 + \frac{LKt}{\lambda_t d}\right)^{d/2}}{\delta}\right) + \frac{2d}{\sqrt{\lambda_t}}\log(2)\Bigg) \\
&= \mathbb{P}\left(\forall t \geq 1,\ \lVert S_t\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \gamma_t(\delta) - \sqrt{\lambda_t}\,S\right)
\end{aligned}
\tag{24}
$$

Combining Eq (21) and Eq (24) yields:

$$
\begin{aligned}
\mathbb{P}\left(\forall t \geq 1,\ \lVert g_t(\hat{\theta}_t) - g_t(\theta_*)\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \gamma_t(\delta)\right) &\geq \mathbb{P}\left(\forall t \geq 1,\ \lVert S_t\rVert_{\mathbf{H}_t^{-1}(\theta_*)} + \sqrt{\lambda_t}\,S \leq \gamma_t(\delta)\right) \\
&\geq 1 - \delta.
\end{aligned}
$$

This completes the proof. ∎
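To get a feel for the size of the confidence radius, the Python sketch below evaluates $\gamma_t(\delta)$ as it can be read off from the last equality in Eq (24), that is, the right-hand side of the preceding line plus $\sqrt{\lambda_t}\,S$. The formal definition of $\gamma_t(\delta)$ is given elsewhere in the paper, so this expression should be treated as an assumption inferred from the proof. The function name and all parameter values are hypothetical.

```python
import math

def gamma_t(t, delta, lam, d, K, L, S):
    """Radius inferred from Eq (24): the bound on ||S_t|| plus sqrt(lam) * S."""
    # log( (1 + L*K*t / (lam*d))^(d/2) / delta )
    log_term = 0.5 * d * math.log1p(L * K * t / (lam * d)) - math.log(delta)
    return (math.sqrt(lam) / 2.0
            + (2.0 / math.sqrt(lam)) * log_term
            + (2.0 * d / math.sqrt(lam)) * math.log(2.0)
            + math.sqrt(lam) * S)

# Hypothetical problem constants, chosen only for illustration.
params = dict(lam=4.0, d=5, K=3, L=0.25, S=1.0)
rounds = (1, 10, 100, 1000)
radii = [gamma_t(t, delta=0.05, **params) for t in rounds]

for t, g in zip(rounds, radii):
    print(f"t = {t:5d}   gamma_t(0.05) = {g:.3f}")
```

For a fixed $\lambda_t$, the radius grows only logarithmically in $t$; the dependence on $\delta$ is also logarithmic.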

It is insightful to compare Theorem 9 with Theorem 1 of Abbasi-Yadkori et al. [2011]. The latter is restated below:

Theorem 11.

Let $\{\mathcal{F}_t\}_{t=0}^{\infty}$ be a filtration. Let $\{\eta_t\}_{t=1}^{\infty}$ be a real-valued stochastic process such that $\eta_t$ is $\mathcal{F}_t$-measurable and $\eta_t$ is conditionally $R$-sub-Gaussian for some $R \geq 0$, i.e.,

$$
\forall\,\lambda_t \in \mathbb{R},\qquad \mathbb{E}\left[\exp(\lambda_t\eta_t) \mid \mathcal{F}_{t-1}\right] \leq \exp\left(\frac{\lambda_t^{2}R^{2}}{2}\right).
$$

Let $\{x_t\}_{t=1}^{\infty}$ be an $\mathbb{R}^{d}$-valued stochastic process such that $x_t$ is $\mathcal{F}_{t-1}$-measurable. Assume $\mathbf{V}$ is a $d \times d$ positive definite matrix. For any $t \geq 0$, define:

$$
\overline{\mathbf{V}}_t = \mathbf{V} + \sum_{s=1}^{t}x_s x_s^{\top},\qquad\quad S_t = \sum_{s=1}^{t}\eta_s x_s.
$$

Then, for any $\delta > 0$, with probability at least $1 - \delta$, for all $t \geq 0$,

$$
\lVert S_t\rVert^{2}_{\overline{\mathbf{V}}_t^{-1}} \leq 2R^{2}\log\left(\frac{\det(\overline{\mathbf{V}}_t)^{1/2}\det(\mathbf{V})^{-1/2}}{\delta}\right).
$$

Theorem 11 makes a uniform sub-Gaussian assumption and, unlike Theorem 9, does not take into account local variance information.
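To make the contrast concrete, the short Python sketch below compares a uniform sub-Gaussian variance proxy with the local variance $\mu(1-\mu)$ of a Bernoulli reward. It assumes the proxy $R = 1/2$, which Hoeffding's lemma gives for any random variable supported on an interval of length one. The probabilities in `mus` are hypothetical values chosen for illustration and are not quantities from the paper.

```python
import numpy as np

R = 0.5                                   # assumed uniform proxy for a [0, 1]-valued reward
mus = np.array([0.5, 0.1, 0.01, 0.001])   # hypothetical choice probabilities
local_var = mus * (1.0 - mus)             # variance of a Bernoulli(mu) reward
ratio = local_var / R**2                  # local variance relative to the uniform proxy

for mu, v, q in zip(mus, local_var, ratio):
    print(f"mu = {mu:<6}  local variance = {v:.6f}  ratio to R^2 = {q:.4f}")
```

The ratio equals one only at $\mu = 1/2$ and shrinks as the choice probability moves toward zero, which is the regime where a bound that tracks local variance has room to be tighter than one built on a uniform proxy.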

A.2 Local information preserving norm

Deviating from previous analyses such as Filippi et al. [2010] and Li et al. [2017], we describe a norm which preserves the local information. The matrix $\mathbf{X}_{\mathcal{Q}_s}$ is the design matrix composed of the contexts $x_{s,1}, x_{s,2}, \cdots, x_{s,K}$ received at time step $s$ as its columns. The expected reward due to the $i^{\text{th}}$ item in the assortment is given by:

$$
\mu_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta) = \frac{e^{x_{s,i}^{\top}\theta}}{1 + \sum_{j\in\mathcal{Q}_s}e^{x_{s,j}^{\top}\theta}}.
$$
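The expression above can be evaluated directly from a design matrix and a parameter vector. The Python sketch below is one possible implementation, not the authors' code: the function name `mnl_choice_probs` is hypothetical, the contexts and parameter are random placeholders, and the shift by the largest utility is a standard numerical-stability device that leaves the probabilities unchanged.

```python
import numpy as np

def mnl_choice_probs(X, theta):
    """Return (mu, p_outside) for a design matrix X of shape (d, K).

    Columns of X are the contexts x_{s,1}, ..., x_{s,K}; the outside
    (no-purchase) option has utility 0, which yields the "1 +" in the
    denominator of the formula.
    """
    u = X.T @ theta                      # utilities x_{s,i}^T theta, shape (K,)
    m = max(0.0, float(u.max()))         # shift so that no exponent is positive
    expu = np.exp(u - m)
    denom = np.exp(-m) + expu.sum()      # (1 + sum_j exp(u_j)) * exp(-m)
    return expu / denom, np.exp(-m) / denom

rng = np.random.default_rng(2)
d, K = 4, 3                              # hypothetical sizes
X = rng.normal(size=(d, K))
theta = rng.normal(size=d)

mu, p_outside = mnl_choice_probs(X, theta)
print("item probabilities:", mu)
print("outside option    :", p_outside)
```

The item probabilities and the outside-option probability sum to one, and the shifted form stays finite even for very large utilities, where a direct evaluation of the exponentials would overflow.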

Further, we consider the following integral:

$$
\int_{\nu=0}^{1}\dot{\mu}_i\left(\nu\,\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_2 + (1-\nu)\,\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta_1\right)\cdot d\nu = \int_{x_{s,i}^{\top}\theta_1}^{x_{s,i}^{\top}\theta_2}\frac{1}{x_{s,i}^{\top}(\theta_2-\theta_1)}\,\dot{\mu}_i(t_i)\cdot dt_i, \tag{25}
$$

where $\dot{\mu}_i$ is the partial derivative of $\mu_i$ in the direction of the $i^{\text{th}}$ component and $\int_{x_{s,i}^{\top}\theta_1}^{x_{s,i}^{\top}\theta_2}\dot{\mu}_i(t_i)\cdot dt_i$ represents integration of $\dot{\mu}(\cdot)$ with respect to the coordinate $t_i$ (hence the limits of the integration only consider change in the coordinate $t_i$). For notational purposes, which will become clear later, we define:

$$
\begin{aligned}
\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{1},\theta_{2})\,x_{s,i}^{\top}(\theta_{2}-\theta_{1}) &\coloneqq \mu_{i}(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{2})-\mu_{i}(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{1}) \\
&= \frac{e^{x_{s,i}^{\top}\theta_{2}}}{1+\sum_{j\in\mathcal{Q}_{s}}e^{x_{s,j}^{\top}\theta_{2}}}-\frac{e^{x_{s,i}^{\top}\theta_{1}}}{1+\sum_{j\in\mathcal{Q}_{s}}e^{x_{s,j}^{\top}\theta_{1}}} \\
&= \int_{x_{s,i}^{\top}\theta_{1}}^{x_{s,i}^{\top}\theta_{2}}\dot{\mu}_{i}(t_{i})\cdot dt_{i},
\end{aligned}
\tag{26}
$$

where the last step is due to the Fundamental Theorem of Calculus. We have exploited the two ways to view the multinomial logit function: as a sum of individual probabilities and as a vector-valued function. We write:

$$
\sum_{i\in\mathcal{Q}_{s}}\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{1},\theta_{2})\,x_{s,i}^{\top}(\theta_{2}-\theta_{1})=\sum_{i\in\mathcal{Q}_{s}}\int_{\nu=0}^{1}\dot{\mu}_{i}\left(\nu\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{2}+\left(1-\nu\right)\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{1}\right)\cdot d\nu
\tag{27}
$$

We also have:

$$
\mu(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{1})-\mu(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{2})=\sum_{i=1}^{K}\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{2},\theta_{1})\,x_{s,i}^{\top}(\theta_{1}-\theta_{2}).
\tag{28}
$$

It follows that:

$$
\begin{aligned}
g(\theta_{1})-g(\theta_{2}) &= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\left(\frac{e^{x_{s,i}^{\top}\theta_{1}}}{1+\sum_{j\in\mathcal{Q}_{s}}e^{x_{s,j}^{\top}\theta_{1}}}-\frac{e^{x_{s,i}^{\top}\theta_{2}}}{1+\sum_{j\in\mathcal{Q}_{s}}e^{x_{s,j}^{\top}\theta_{2}}}\right)x_{s,i}+\lambda_{t}(\theta_{1}-\theta_{2}) \\
&= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{2},\theta_{1})\,x_{s,i}x_{s,i}^{\top}(\theta_{1}-\theta_{2})+\lambda_{t}(\theta_{1}-\theta_{2}) \\
&= \mathbf{G}_{t}(\theta_{2},\theta_{1})(\theta_{1}-\theta_{2}),
\end{aligned}
$$

where $\mathbf{G}_{t}\left(\theta_{1},\theta_{2}\right)\coloneqq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\alpha_{i}\left(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{1},\theta_{2}\right)x_{s,i}x_{s,i}^{\top}+\lambda_{t}\mathbf{I}_{d}$. Since $\alpha\left(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{1},\theta_{2}\right)\geq\frac{1}{\kappa}$ (from Assumption 2), we have $\mathbf{G}_{t}(\theta_{1},\theta_{2})\succ\mathbf{O}_{d\times d}$. Hence we get:

$$
\left\lVert\theta_{1}-\theta_{2}\right\rVert_{\mathbf{G}_{t}(\theta_{2},\theta_{1})}=\left\lVert g(\theta_{1})-g(\theta_{2})\right\rVert_{\mathbf{G}_{t}^{-1}(\theta_{2},\theta_{1})}.
\tag{29}
$$
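The identities above can be checked numerically. The following is a minimal sketch on synthetic data, not the authors' implementation: it assumes $g(\theta)=\sum_{s}\sum_{i\in\mathcal{Q}_{s}}\mu_{i}(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta)\,x_{s,i}+\lambda_{t}\theta$ (the form implied by the difference displayed above), stores each assortment as a matrix whose rows are the vectors $x_{s,i}$, and uses hypothetical helper names (`mnl_probs`, `alpha`, `g`, `G`). Because $\alpha_{i}$ is a difference quotient and may be negative on arbitrary random data, the sketch compares the squared norms (quadratic forms) in Eq (29) rather than their square roots.

```python
import numpy as np


def mnl_probs(X, theta):
    """MNL probabilities mu_i for the rows of X; the outside option has utility 0."""
    w = np.exp(X @ theta)
    return w / (1.0 + w.sum())


def alpha(X, theta_a, theta_b):
    """alpha_i(X, theta_a, theta_b) as in Eq (26):
    (mu_i(X theta_b) - mu_i(X theta_a)) / x_i^T (theta_b - theta_a)."""
    diff = X @ (theta_b - theta_a)  # assumed non-zero for every item
    return (mnl_probs(X, theta_b) - mnl_probs(X, theta_a)) / diff


def g(assortments, theta, lam):
    """Assumed form of g: sum_s sum_i mu_i(X_s theta) x_{s,i} + lam * theta."""
    total = lam * theta
    for X in assortments:
        total = total + X.T @ mnl_probs(X, theta)
    return total


def G(assortments, theta_a, theta_b, lam):
    """G_t(theta_a, theta_b) = sum_s sum_i alpha_i x_{s,i} x_{s,i}^T + lam * I."""
    M = lam * np.eye(theta_a.shape[0])
    for X in assortments:
        a = alpha(X, theta_a, theta_b)
        M = M + (X.T * a) @ X  # sum_i a_i x_i x_i^T
    return M


# Synthetic instance (all sizes and values are illustrative assumptions).
rng = np.random.default_rng(0)
d, K, n_rounds, lam = 3, 4, 5, 1.0
assortments = [rng.normal(size=(K, d)) for _ in range(n_rounds)]
theta1, theta2 = rng.normal(size=d), rng.normal(size=d)

delta = theta1 - theta2
G21 = G(assortments, theta2, theta1, lam)
lhs = g(assortments, theta1, lam) - g(assortments, theta2, lam)
rhs = G21 @ delta

# Squared versions of the two norms in Eq (29).
quad_theta = delta @ G21 @ delta
quad_g = lhs @ np.linalg.solve(G21, lhs)

print("max |lhs - rhs|:", np.max(np.abs(lhs - rhs)))
print("quadratic forms:", quad_theta, quad_g)
```

The two printed quadratic forms should agree up to floating-point error, since $g(\theta_{1})-g(\theta_{2})=\mathbf{G}_{t}(\theta_{2},\theta_{1})(\theta_{1}-\theta_{2})$ and $\mathbf{G}_{t}$ is symmetric.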

A.3 Self-Concordance Style Relations for Multinomial Logistic Function

Lemma 12.

For an assortment $\mathcal{Q}_{s}$ and $\theta_{1},\theta_{2}\in\Theta$, the following holds:

$$
\begin{aligned}
\sum_{i\in\mathcal{Q}_{s}}\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{s}},\theta_{2},\theta_{1}) &= \sum_{i\in\mathcal{Q}_{s}}\int_{\nu=0}^{1}\dot{\mu}_{i}\left(\nu\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{2}+\left(1-\nu\right)\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{1}\right)\cdot d\nu \\
&\geq \sum_{i\in\mathcal{Q}_{s}}\dot{\mu}_{i}(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{1})\left(1+|x_{s,i}^{\top}\theta_{1}-x_{s,i}^{\top}\theta_{2}|\right)^{-1}
\end{aligned}
$$
Proof.

We write:

$$
\int_{\nu=0}^{1}\dot{\mu}_{i}\left(\nu\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{2}+\left(1-\nu\right)\mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta_{1}\right)\cdot d\nu=\sum_{i\in\mathcal{Q}_{s}}\int_{x_{s,i}^{\top}\theta_{1}}^{x_{s,i}^{\top}\theta_{2}}\frac{1}{x_{s,i}^{\top}(\theta_{2}-\theta_{1})}\dot{\mu}_{i}(t_{i})\cdot dt_{i},
\tag{30}
$$

where $\int_{x_{s,i}^{\top}\theta_{1}}^{x_{s,i}^{\top}\theta_{2}}\dot{\mu}_{i}(t_{i})\cdot dt_{i}$ represents integration of $\dot{\mu}(\cdot)$ with respect to the coordinate $t_{i}$ (hence the limits of the integration only consider change in the coordinate $t_{i}$). For some $z>z_{1}\in\mathbb{R}$, consider:

$$
\int_{z_{1}}^{z}\frac{d}{dt_{i}}\log\left(\dot{\mu}_{i}(t_{i})\right)\cdot dt_{i}=\int_{z_{1}}^{z}\frac{\nabla^{2}\mu_{i,i}(t_{i})}{\dot{\mu}_{i}(t_{i})}\,dt_{i},
$$

where $\nabla^{2}\mu_{i,i}(\cdot)$ is the second derivative of $\mu(\cdot)$. Using Lemma 16, we have $-1\leq\frac{\nabla^{2}\mu_{i,i}(\cdot)}{\dot{\mu}_{i}(\cdot)}\leq 1$. Thus we get:

$$
-(z-z_{1})\leq\int_{z_{1}}^{z}\frac{d}{dt_{i}}\log\left(\dot{\mu}_{i}(t_{i})\right)\cdot dt_{i}\leq(z-z_{1})
$$

Using the Fundamental Theorem of Calculus, we get:

$$
\begin{aligned}
& -(z-z_{1})\leq\log\left(\dot{\mu}_{i}(z)\right)-\log\left(\dot{\mu}_{i}(z_{1})\right)\leq(z-z_{1}) \\
\therefore\quad & \dot{\mu}_{i}(z_{1})\exp(-(z-z_{1}))\leq\dot{\mu}_{i}(z)\leq\dot{\mu}_{i}(z_{1})\exp(z-z_{1})
\end{aligned}
\tag{31}
$$
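As a quick sanity check of the two-sided bound in Eq (31), the sketch below varies only the item's own utility $t_{i}$ and holds the remaining utilities fixed. It is illustrative only: it assumes the standard MNL form $\mu_{i}(t_{i})=e^{t_{i}}/(1+c+e^{t_{i}})$, where the constant `c` (a hypothetical value) collects $\sum_{j\neq i}e^{t_{j}}$, so that $\dot{\mu}_{i}=\mu_{i}(1-\mu_{i})$.

```python
import numpy as np


def mu_dot(t, c):
    """Derivative of mu_i(t) = e^t / (1 + c + e^t) with respect to its own utility t.
    c >= 0 is the (fixed) sum of e^{t_j} over the other offered items."""
    p = np.exp(t) / (1.0 + c + np.exp(t))
    return p * (1.0 - p)


c = 2.5    # hypothetical contribution of the other items
z1 = -0.7  # hypothetical reference point
zs = np.linspace(z1, z1 + 6.0, 200)  # grid of z >= z1

lower = mu_dot(z1, c) * np.exp(-(zs - z1))
upper = mu_dot(z1, c) * np.exp(zs - z1)
vals = mu_dot(zs, c)

# A small tolerance absorbs floating-point error at z = z1, where all three coincide.
holds = bool(np.all(lower <= vals + 1e-12) and np.all(vals <= upper + 1e-12))
print("Eq (31) holds on the grid:", holds)
```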

Using Eq (31), for $z_{2}\geq z_{1}\in\mathbb{R}$ and for all $i\in[K]$, we have:

$$
\begin{aligned}
& \dot{\mu}_{i}(z_{1})\left(1-\exp(-(z_{2}-z_{1}))\right)\leq\int_{z_{1}}^{z_{2}}\dot{\mu}(t_{i})\,dt_{i}\leq\dot{\mu}_{i}(z_{1})\left(\exp(z_{2}-z_{1})-1\right) \\
\therefore\quad & \dot{\mu}_{i}(z_{1})\frac{1-\exp(-(z_{2}-z_{1}))}{z_{2}-z_{1}}\leq\frac{1}{z_{2}-z_{1}}\int_{z_{1}}^{z_{2}}\dot{\mu}(t_{i})\,dt_{i}\leq\dot{\mu}_{i}(z_{1})\frac{\exp(z_{2}-z_{1})-1}{z_{2}-z_{1}}.
\end{aligned}
\tag{32}
$$

Reversing the roles of $z_{1}$ and $z_{2}$, so that $z_{2}\leq z_{1}$, and again using Eq (31), we write:

$$
\dot{\mu}_{i}(z_{1})\frac{\exp(-(z_{1}-z_{2}))-1}{z_{2}-z_{1}}\leq\frac{1}{z_{2}-z_{1}}\int_{z_{1}}^{z_{2}}\dot{\mu}(t_{i})\,dt_{i}\leq\dot{\mu}_{i}(z_{1})\frac{\exp(z_{1}-z_{2})-1}{z_{2}-z_{1}}.
\tag{33}
$$

Combining Eq (32) and Eq (33), for all $i\in[K]$ we get:

$$
\dot{\mu}_{i}(z_{1})\frac{1-\exp(-|z_{1}-z_{2}|)}{|z_{1}-z_{2}|}\leq\frac{1}{z_{2}-z_{1}}\int_{z_{1}}^{z_{2}}\dot{\mu}(t_{i})\,dt_{i}.
\tag{34}
$$

If $x \geq 0$, then $e^{-x} \leq (1+x)^{-1}$, and therefore $(1-e^{-x})/x \geq (1+x)^{-1}$ for $x > 0$. Thus we lower bound the left-hand side of Eq (34) as:

$$\dot{\mu}_i(z_1)\left(1+|z_1-z_2|\right)^{-1} \leq \dot{\mu}_i(z_1)\,\frac{1-\exp(-|z_1-z_2|)}{|z_1-z_2|} \leq \frac{1}{z_2-z_1}\int_{z_1}^{z_2}\dot{\mu}(t_i)\,dt_i.$$
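The chain of lower bounds above can be sanity-checked numerically. The sketch below is illustrative only and rests on an assumption that is not part of the derivation: it considers the single-product case, where the choice probability reduces to the logistic function $\mu(z) = 1/(1+e^{-z})$, so the averaged derivative has the closed form $(\mu(z_2)-\mu(z_1))/(z_2-z_1)$. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function; stands in for mu_i when one product is offered (assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_dot(z):
    """Derivative of the logistic function: mu(z) * (1 - mu(z))."""
    m = sigmoid(z)
    return m * (1.0 - m)


def averaged_derivative(z1, z2):
    """(1 / (z2 - z1)) * integral of sigmoid_dot over [z1, z2], in closed form."""
    return (sigmoid(z2) - sigmoid(z1)) / (z2 - z1)


def lower_bounds(z1, z2):
    """Return (weak, strong): the (1 + |z1 - z2|)^-1 bound and the exponential bound."""
    gap = abs(z1 - z2)
    weak = sigmoid_dot(z1) / (1.0 + gap)
    # -expm1(-gap) equals 1 - exp(-gap), computed without cancellation.
    strong = sigmoid_dot(z1) * (-math.expm1(-gap)) / gap
    return weak, strong


def check_chain(num_trials=10000, seed=0, tol=1e-12):
    """Spot-check weak <= strong <= averaged derivative on random pairs (not a proof)."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        z1 = rng.uniform(-5.0, 5.0)
        z2 = rng.uniform(-5.0, 5.0)
        if abs(z1 - z2) < 1e-3:
            continue  # skip near-degenerate pairs to avoid cancellation error
        weak, strong = lower_bounds(z1, z2)
        if weak > strong + tol or strong > averaged_derivative(z1, z2) + tol:
            return False
    return True
```

Calling `check_chain()` samples pairs $(z_1, z_2)$ in $[-5, 5]$ and reports whether both inequalities held on every sampled pair; it is a spot check on random inputs, not a substitute for the argument above.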

Using the above bound with $z_2 = x_{s,i}^\top\theta_2$ and $z_1 = x_{s,i}^\top\theta_1$ in Eq (30) gives:

$$\begin{aligned}
&\sum_{i\in\mathcal{Q}_s}\int_{\nu=0}^{1}\dot{\mu}_i\left(\nu\mathbf{X}_{\mathcal{Q}_s}^\top\theta_2+\left(1-\nu\right)\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right)\cdot d\nu\\
&\quad= \sum_{i\in\mathcal{Q}_s}\int_{x_{s,i}^\top\theta_1}^{x_{s,i}^\top\theta_2}\frac{1}{x_{s,i}^\top(\theta_2-\theta_1)}\,\dot{\mu}_i(t_i)\cdot dt_i \;\geq\; \sum_{i\in\mathcal{Q}_s}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right)\left(1+|x_{s,i}^\top\theta_1-x_{s,i}^\top\theta_2|\right)^{-1}.
\end{aligned}$$

Lemma 4.

For all $\theta_1,\theta_2 \in \Theta$, where $S \coloneqq \max_{\theta\in\Theta}\lVert\theta\rVert_2$ (Assumption 1), the following inequalities hold:

$$\begin{aligned}
\mathbf{G}_t(\theta_1,\theta_2) &\succeq (1+2S)^{-1}\,\mathbf{H}_t(\theta_1),\\
\mathbf{G}_t(\theta_1,\theta_2) &\succeq (1+2S)^{-1}\,\mathbf{H}_t(\theta_2).
\end{aligned}$$
Proof.

From Lemma 12, we have:

$$\begin{aligned}
\sum_{i\in\mathcal{Q}_s}\alpha_i(\mathbf{X}_{\mathcal{Q}_s},\theta_2,\theta_1)
&\geq \sum_{i\in\mathcal{Q}_s}\left(1+|x_{s,i}^\top\theta_1-x_{s,i}^\top\theta_2|\right)^{-1}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right)\\
&\geq \sum_{i\in\mathcal{Q}_s}\left(1+\lVert x_{s,i}\rVert_2\,\lVert\theta_1-\theta_2\rVert_2\right)^{-1}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right) && \text{(Cauchy-Schwarz)}\\
&\geq \sum_{i\in\mathcal{Q}_s}\left(1+2S\right)^{-1}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right) && (\theta_1,\theta_2\in\Theta,\ \lVert x_{s,i}\rVert_2\leq 1)
\end{aligned}$$

Now we write $\mathbf{G}_t(\theta_1,\theta_2)$ as:

$$\begin{aligned}
\mathbf{G}_t(\theta_1,\theta_2)
&= \sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\alpha_i(\mathbf{X}_{\mathcal{Q}_s},\theta_2,\theta_1)\,x_{s,i}x_{s,i}^\top+\lambda_t\mathbf{I}_d\\
&\succeq (1+2S)^{-1}\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right)x_{s,i}x_{s,i}^\top+\lambda_t\mathbf{I}_d\\
&= (1+2S)^{-1}\left(\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right)x_{s,i}x_{s,i}^\top+(1+2S)\lambda_t\mathbf{I}_d\right)\\
&\succeq (1+2S)^{-1}\left(\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_1\right)x_{s,i}x_{s,i}^\top+\lambda_t\mathbf{I}_d\right)\\
&= (1+2S)^{-1}\,\mathbf{H}_t(\theta_1).
\end{aligned}$$

Since $\theta_1$ and $\theta_2$ play symmetric roles in the definition of $\alpha_i(\mathbf{X}_{\mathcal{Q}_s},\theta_2,\theta_1)$, the second relation follows directly by exchanging the two variables. ∎
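The matrix ordering in Lemma 4 can be spot-checked on random data. The sketch below is not the authors' code and makes simplifying assumptions that are not in the text: one product per round (so $\mu$ is the logistic function and $\alpha$ has a closed form), synthetic contexts with $\lVert x\rVert_2 \leq 1$, and parameters with norm at most $S$. It compares the quadratic forms $v^\top\mathbf{G}_t v$ and $(1+2S)^{-1}v^\top\mathbf{H}_t(\theta_1)v$ along random directions $v$, which is a necessary condition for the ordering rather than a proof. All names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_dot(z):
    m = sigmoid(z)
    return m * (1.0 - m)


def alpha(z1, z2):
    """Averaged logistic derivative between z1 and z2 (closed form of the nu-integral)."""
    if abs(z2 - z1) < 1e-6:
        return sigmoid_dot(z1)  # limit as z2 -> z1
    return (sigmoid(z2) - sigmoid(z1)) / (z2 - z1)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_vector(rng, dim, norm):
    """Random direction rescaled to the requested Euclidean norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = norm / math.sqrt(dot(v, v))
    return [c * scale for c in v]


def quad_form(weights, xs, lam, v):
    """v^T (sum_s w_s x_s x_s^T + lam * I) v, without forming the matrix."""
    return sum(w * dot(x, v) ** 2 for w, x in zip(weights, xs)) + lam * dot(v, v)


def check_lemma4(dim=4, rounds=50, S=2.0, lam=1.0, trials=200, seed=0):
    rng = random.Random(seed)
    xs = [random_vector(rng, dim, rng.uniform(0.1, 1.0)) for _ in range(rounds)]
    theta1 = random_vector(rng, dim, rng.uniform(0.1, S))
    theta2 = random_vector(rng, dim, rng.uniform(0.1, S))
    g_weights = [alpha(dot(x, theta1), dot(x, theta2)) for x in xs]
    h_weights = [sigmoid_dot(dot(x, theta1)) for x in xs]
    c = 1.0 / (1.0 + 2.0 * S)
    for _ in range(trials):
        v = random_vector(rng, dim, 1.0)
        g = quad_form(g_weights, xs, lam, v)
        h = quad_form(h_weights, xs, lam, v)
        if g < c * h - 1e-12:
            return False
    return True
```

The same quadratic-form comparison, read as $v^\top\mathbf{H}_t v \leq (1+2S)\,v^\top\mathbf{G}_t v$, is what turns the ordering into a comparison between the norms weighted by $\mathbf{H}_t$ and $\mathbf{G}_t$.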

The following lemma gives a crucial bound on the deviation $(\theta-\theta_*)$, which we use extensively in our derivations.

Lemma 5.

For $\theta \in C_t(\delta)$, we have the following relation with probability at least $1-\delta$:

$$\lVert\theta-\theta_*\rVert_{\mathbf{H}_t(\theta)} \leq 2(1+2S)\,\gamma_t(\delta). \tag{35}$$
Proof.

Since $\theta,\theta_* \in \Theta$, it follows from Lemma 4 that:

$$\lVert\theta-\theta_*\rVert_{\mathbf{H}_t(\theta)} \leq \sqrt{1+2S}\,\lVert\theta-\theta_*\rVert_{\mathbf{G}_t(\theta,\theta_*)}.$$

By the triangle inequality, we write:

$$\lVert g(\theta_*)-g(\theta)\rVert_{\mathbf{G}_t^{-1}(\theta,\theta_*)} \leq \lVert g(\theta_*)-g(\hat{\theta}_t)\rVert_{\mathbf{G}_t^{-1}(\theta,\theta_*)} + \lVert g(\hat{\theta}_t)-g(\theta)\rVert_{\mathbf{G}_t^{-1}(\theta,\theta_*)},$$

where $\hat{\theta}_t$ is the maximum likelihood estimate (MLE). Further, Lemma 4 gives:

$$\begin{aligned}
\lVert g(\theta_*)-g(\theta)\rVert_{\mathbf{G}_t^{-1}(\theta,\theta_*)}
&\leq \sqrt{1+2S}\,\lVert g(\theta_*)-g(\hat{\theta}_t)\rVert_{\mathbf{H}_t^{-1}(\theta_*)}\\
&\quad+ \sqrt{1+2S}\,\lVert g(\hat{\theta}_t)-g(\theta)\rVert_{\mathbf{H}_t^{-1}(\theta)}.
\end{aligned}$$

Since $\theta$ is the minimizer of $\lVert g(\theta)-g(\hat{\theta}_t)\rVert_{\mathbf{H}_t^{-1}(\theta)}$, we write:

$$\lVert g(\theta_*)-g(\theta)\rVert_{\mathbf{G}_t^{-1}(\theta,\theta_*)} \leq 2\sqrt{1+2S}\,\lVert g(\theta_*)-g(\hat{\theta}_t)\rVert_{\mathbf{H}_t^{-1}(\theta_*)}.$$

Finally, Eq (35) follows by an application of Lemma 10:

$$\lVert g(\theta_*)-g(\theta)\rVert_{\mathbf{H}_t^{-1}(\theta_*)} \leq \gamma_t(\delta).$$

A.4 Bounds on prediction error

Lemma 6.

For the assortment $\mathcal{Q}_t$ chosen by the algorithm CB-MNL, as given by Eq (12), and any $\theta \in C_t(\delta)$, the following holds with probability at least $1-\delta$:

$$\alpha_i(\mathbf{X}_{\mathcal{Q}_t},\theta_*,\theta) \leq \dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*\right) + 2(1+2S)\,M\,\gamma_t(\delta)\,\lVert x_{t,i}\rVert_{\mathbf{H}_t^{-1}(\theta_*)}.$$
Proof.

Consider the multinomial logit function:

$$\alpha_i(\mathbf{X}_{\mathcal{Q}_t},\theta_*,\theta)\,x_{t,i}^\top(\theta-\theta_*) = \frac{e^{x_{t,i}^\top\theta}}{1+\sum_{j\in\mathcal{Q}_t}e^{x_{t,j}^\top\theta}} - \frac{e^{x_{t,i}^\top\theta_*}}{1+\sum_{j\in\mathcal{Q}_t}e^{x_{t,j}^\top\theta_*}}. \tag{36}$$
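Eq (36) is a mean-value-type identity: the averaged derivative times the change in utility equals the change in choice probability. A minimal numerical illustration follows, under two assumptions that are not stated in this section: a single product is offered (so the choice probability is the logistic function of $z = x_{t,i}^\top\theta$), and $\alpha_i$ is the $\nu$-averaged derivative $\int_0^1 \dot{\mu}_i(\cdot)\,d\nu$ used earlier in this appendix. The integral is approximated with a midpoint rule; function names are hypothetical.

```python
import math


def sigmoid(z):
    """Single-product MNL choice probability e^z / (1 + e^z) (assumption)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_dot(z):
    m = sigmoid(z)
    return m * (1.0 - m)


def alpha_quadrature(z_star, z, num_points=2000):
    """Midpoint-rule estimate of integral_0^1 sigmoid_dot(nu*z + (1-nu)*z_star) d nu."""
    h = 1.0 / num_points
    total = 0.0
    for k in range(num_points):
        nu = (k + 0.5) * h
        total += sigmoid_dot(nu * z + (1.0 - nu) * z_star)
    return total * h


def mean_value_gap(z_star, z):
    """|alpha * (z - z_star) - (mu(z) - mu(z_star))|; close to zero if the identity holds."""
    lhs = alpha_quadrature(z_star, z) * (z - z_star)
    rhs = sigmoid(z) - sigmoid(z_star)
    return abs(lhs - rhs)
```

For example, `mean_value_gap(-1.0, 2.0)` should differ from zero only by the quadrature error, which shrinks as `num_points` grows.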

We use a second-order Taylor expansion of each component of the multinomial logit function at $a_i$. For all $i \in [K]$, consider:

$$\begin{aligned}
f_i(r_i) &= \frac{e^{r_i}}{1+e^{r_i}+\sum_{j\in\mathcal{Q}_s,\,j\neq i}e^{r_j}}\\
&\leq f(a_i)+f_i'(a_i)(r_i-a_i)+\frac{f_i''(a_i)(r_i-a_i)^2}{2}.
\end{aligned} \tag{37}$$

In Eq (A.4), we substitute $f_i(\cdot)\to\mu_i$, $r_i\to x_{t,i}^\top\theta$, and $a_i\to x_{t,i}^\top\theta_*$. Thus we rewrite Eq (36) as:

αi(𝐗𝒬t,θ,θ)xs,i(θθ)subscript𝛼𝑖subscript𝐗subscript𝒬𝑡subscript𝜃𝜃superscriptsubscript𝑥𝑠𝑖top𝜃subscript𝜃\displaystyle\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{t}},\theta_{*},\theta)x_{s,i}% ^{\top}(\theta-\theta_{*})italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT , italic_θ ) italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_θ - italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) μ˙i\del𝐗𝒬tθ(xt,i(θθ))+μ¨i\del𝐗𝒬tθ(xt,i(θθ)2,\displaystyle\leq\dot{\mu}_{i}\del{\mathbf{X}_{\mathcal{Q}_{t}}^{\top}\theta_{% *}}(x_{t,i}^{\top}(\theta-\theta_{*}))+\ddot{\mu}_{i}\del{\mathbf{X}_{\mathcal% {Q}_{t}}^{\top}\theta_{*}}(x_{t,i}^{\top}(\theta-\theta_{*})^{2},≤ over˙ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_t , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_θ - italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) ) + over¨ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_t , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_θ - italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ,
αi(𝐗𝒬t,θ,θ)thereforeabsentsubscript𝛼𝑖subscript𝐗subscript𝒬𝑡subscript𝜃𝜃\displaystyle\therefore~{}~{}\alpha_{i}(\mathbf{X}_{\mathcal{Q}_{t}},\theta_{*% },\theta)∴ italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT , italic_θ ) μ˙i\del𝐗𝒬tθ(xt,i(θθ))+μ¨i\del𝐗𝒬tθ|xt,i(θθ)|absentsubscript˙𝜇𝑖\delsuperscriptsubscript𝐗subscript𝒬𝑡topsubscript𝜃superscriptsubscript𝑥𝑡𝑖top𝜃subscript𝜃subscript¨𝜇𝑖\delsuperscriptsubscript𝐗subscript𝒬𝑡topsubscript𝜃superscriptsubscript𝑥𝑡𝑖top𝜃subscript𝜃\displaystyle\leq\dot{\mu}_{i}\del{\mathbf{X}_{\mathcal{Q}_{t}}^{\top}\theta_{% *}}(x_{t,i}^{\top}(\theta-\theta_{*}))+\ddot{\mu}_{i}\del{\mathbf{X}_{\mathcal% {Q}_{t}}^{\top}\theta_{*}}|x_{t,i}^{\top}(\theta-\theta_{*})|≤ over˙ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_t , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_θ - italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) ) + over¨ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT | italic_x start_POSTSUBSCRIPT italic_t , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_θ - italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT ) |
μ˙i\del𝐗𝒬tθ+M\envertxt,i(θθ)absentsubscript˙𝜇𝑖\delsuperscriptsubscript𝐗subscript𝒬𝑡topsubscript𝜃𝑀\envertsuperscriptsubscript𝑥𝑡𝑖top𝜃subscript𝜃\displaystyle\leq\dot{\mu}_{i}\del{\mathbf{X}_{\mathcal{Q}_{t}}^{\top}\theta_{% *}}+M\envert{x_{t,i}^{\top}(\theta-\theta_{*})}≤ over˙ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT + italic_M italic_x start_POSTSUBSCRIPT italic_t , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_θ - italic_θ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT )

where we upper bound $\ddot{\mu}_i$ by $M$. An application of the Cauchy-Schwarz inequality gives:

$$
\left|x_{t,i}^\top(\theta_* - \theta)\right| \leq \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \, \|\theta_* - \theta\|_{\mathbf{H}_t(\theta_*)}. \tag{38}
$$
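The weighted Cauchy-Schwarz step in Eq (38) holds for any symmetric positive-definite matrix in place of $\mathbf{H}_t(\theta_*)$. The following minimal sketch checks it numerically; the random matrix `H` and the vectors `x` and `v` are stand-ins chosen for illustration, not quantities computed by CB-MNL.

```python
import numpy as np

def weighted_norm(v, A):
    """Return ||v||_A = sqrt(v^T A v) for a symmetric positive-definite A."""
    return float(np.sqrt(v @ A @ v))

def cauchy_schwarz_gap(x, v, H):
    """Return ||x||_{H^{-1}} * ||v||_H - |x^T v|, which should be non-negative."""
    H_inv = np.linalg.inv(H)
    return weighted_norm(x, H_inv) * weighted_norm(v, H) - abs(float(x @ v))

rng = np.random.default_rng(0)
d = 5
B = rng.normal(size=(d, d))
H = B @ B.T + np.eye(d)   # random positive-definite stand-in for H_t(theta_*)
x = rng.normal(size=d)    # stand-in for a feature vector x_{t,i}
v = rng.normal(size=d)    # stand-in for theta_* - theta
print("gap:", cauchy_schwarz_gap(x, v, H))
```

The gap vanishes when `x` is proportional to `H @ v`, which is the equality case of the inequality.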

Combining the last two inequalities, we get:

$$
\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta) \leq \dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*\right) + M \, \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \, \|\theta_* - \theta\|_{\mathbf{H}_t(\theta_*)}.
$$

From Lemma 5 we get:

$$
\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta) \leq \dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*\right) + 2(1+2S) M \gamma_t(\delta) \, \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}.
$$

Lemma 3.

For the assortment $\mathcal{Q}_t$ chosen by the algorithm CB-MNL, as given by Eq (12), and any $\theta \in C_t(\delta)$, the following holds with probability at least $1-\delta$:

$$
\begin{aligned}
\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta) \leq{}& (2+4S)\,\gamma_t(\delta) \sum_{i \in \mathcal{Q}_t} \dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*\right) \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \\
&+ 4\kappa(1+2S)^2 M \gamma_t(\delta)^2 \sum_{i \in \mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}}.
\end{aligned}
$$
Proof.

$$
\begin{aligned}
\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta) &= \left|\mu(\mathbf{X}_{\mathcal{Q}_t}^\top\theta) - \mu(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)\right| \\
&= \left|\sum_{i \in \mathcal{Q}_t} \alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta)\, x_{t,i}^\top(\theta - \theta_*)\right| && \text{(From Eq (28))} \\
&\leq \left|\sum_{i \in \mathcal{Q}_t} \alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta)\, \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \|\theta_* - \theta\|_{\mathbf{H}_t(\theta_*)}\right| && \text{(Cauchy-Schwarz inequality and Eq (29))} \\
&\leq 2(1+2S)\gamma_t(\delta) \sum_{i \in \mathcal{Q}_t} \left|\alpha_i(\mathbf{X}_{\mathcal{Q}_t}, \theta_*, \theta)\, \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)}\right| && \text{(From Lemma 5)} \\
&\leq 2(1+2S)\gamma_t(\delta) \sum_{i \in \mathcal{Q}_t} \Big( \dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*\right) \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \\
&\qquad\qquad + 2(1+2S) M \gamma_t(\delta)\, \|x_{t,i}\|^2_{\mathbf{H}_t^{-1}(\theta_*)} \Big) && \text{(From Lemma 6)}
\end{aligned}
$$

Rearranging the terms, we get:

$$
\begin{aligned}
\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta) \leq{}& (2+4S)\,\gamma_t(\delta) \sum_{i \in \mathcal{Q}_t} \dot{\mu}_i\left(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*\right) \|x_{t,i}\|_{\mathbf{H}_t^{-1}(\theta_*)} \\
&+ 4\kappa(1+2S)^2 M \gamma_t(\delta)^2 \sum_{i \in \mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}},
\end{aligned}
$$

where we use $\mathbf{H}_t(\theta_*) \succeq \kappa^{-1}\mathbf{V}_t$ from Assumption 2, which implies $\|x_{t,i}\|^2_{\mathbf{H}_t^{-1}(\theta_*)} \leq \kappa \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}}$. ∎
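The final step of the proof converts an $\mathbf{H}_t^{-1}(\theta_*)$-norm into a $\mathbf{V}_t^{-1}$-norm at the cost of a factor $\kappa$. The sketch below illustrates this conversion under the assumption that the weights playing the role of $\dot{\mu}_i$ lie in $[1/\kappa, 1]$; the helper `design_matrices` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def design_matrices(X, weights, lam):
    """Build V = sum_i x_i x_i^T + lam*I and H = sum_i w_i x_i x_i^T + lam*I from rows of X."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)
    H = (X * weights[:, None]).T @ X + lam * np.eye(d)
    return V, H

def sq_norm(x, A_inv):
    """Return the squared weighted norm x^T A_inv x."""
    return float(x @ A_inv @ x)

rng = np.random.default_rng(1)
kappa, lam, d, n = 20.0, 1.0, 4, 50
X = rng.normal(size=(n, d))
weights = rng.uniform(1.0 / kappa, 1.0, size=n)  # assumed range for the derivative weights
V, H = design_matrices(X, weights, lam)

x = rng.normal(size=d)
lhs = sq_norm(x, np.linalg.inv(H))               # ||x||^2 in the H^{-1} norm
rhs = kappa * sq_norm(x, np.linalg.inv(V))       # kappa * ||x||^2 in the V^{-1} norm
print("H^{-1} norm:", lhs, " kappa * V^{-1} norm:", rhs)
```

Since every weight is at least $1/\kappa$ and $\kappa \geq 1$, the matrix `H - V / kappa` is positive semi-definite, which is what makes `lhs <= rhs` hold for every `x`.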

Corollary 7.

For the assortment $\mathcal{Q}_t$ chosen by the algorithm CB-MNL, as given by Eq (12), and any $\theta \in C_t(\delta)$, the following holds with probability at least $1-\delta$:

$$
\begin{aligned}
\Delta^{\text{pred}}(\mathbf{X}_{\mathcal{Q}_t}, \theta) \leq{}& 2(1+2S)\,\gamma_t(\delta) \sum_{i \in \mathcal{Q}_t} \|\tilde{x}_{t,i}\|_{\mathbf{J}_t^{-1}} \\
&+ 4\kappa(1+2S)^2 M \gamma_t(\delta)^2 \sum_{i \in \mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{V}_t^{-1}},
\end{aligned}
$$

where $\tilde{x}_{t,i} = \sqrt{\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_t}^\top\theta_*)}\, x_{t,i}$ and $\|x\|_{\mathbf{H}^{-1}_t(\theta_*)} = \|x\|_{\mathbf{J}^{-1}_t}$.

Proof.

This follows directly from the uniqueness and realizability of $\theta_*$.

A.5 Regret calculation

The following two lemmas give the upper bounds on the self-normalized vector summations.

Lemma 13.

$$
\sum_{t=1}^{T} \min\left\{ \sum_{i \in \mathcal{Q}_t} \|\tilde{x}_{t,i}\|^2_{\mathbf{J}^{-1}_{T+1}(\theta)},\, 1 \right\} \leq 2d \log\left(1 + \frac{LKT}{d\lambda_t}\right).
$$
Proof.

The proof follows by a direct application of Lemmas 17 and 18:

$$
\begin{aligned}
\sum_{t=1}^{T} \min\left\{ \sum_{i \in \mathcal{Q}_t} \|\tilde{x}_{t,i}\|^2_{\mathbf{J}^{-1}_{T+1}(\theta)},\, 1 \right\}
&\leq 2\log\left(\frac{\det(\mathbf{J}_{T+1})}{\lambda_t^d}\right) && \text{(From Lemma 17)} \\
&= 2\log\left(\frac{\det\left(\sum_{s=1}^{T} \sum_{i \in \mathcal{Q}_s} \dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^\top\theta_*)\, x_{s,i} x_{s,i}^\top + \lambda_t \mathbf{I}_d\right)}{\lambda_t^d}\right) \\
&\leq 2\log\left(\frac{\det\left(\sum_{s=1}^{T} \sum_{i \in \mathcal{Q}_s} L\, x_{s,i} x_{s,i}^\top + \lambda_t \mathbf{I}_d\right)}{\lambda_t^d}\right) && \text{(Upper bound by Lipschitz constant)} \\
&= 2\log\left(\frac{L^d \det\left(\sum_{s=1}^{T} \sum_{i \in \mathcal{Q}_s} x_{s,i} x_{s,i}^\top + \frac{\lambda_t}{L} \mathbf{I}_d\right)}{\lambda_t^d}\right) \\
&\leq 2d\log\left(1 + \frac{LKT}{d\lambda_t}\right). && \text{(From Lemma 18)}
\end{aligned}
$$
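The two determinant manipulations in this chain can be checked numerically. The sketch below verifies the scaling identity $\det(L\mathbf{G} + \lambda\mathbf{I}_d) = L^d \det(\mathbf{G} + (\lambda/L)\mathbf{I}_d)$ and the final logarithmic bound, which we obtain here from the standard determinant-trace (AM-GM) inequality. We assume, without restating Lemma 18, that it plays this role, and we assume feature vectors with norm at most 1. All names and numerical values are illustrative.

```python
import numpy as np

def logdet(A):
    """Log-determinant of a symmetric positive-definite matrix."""
    sign, value = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("matrix is not positive definite")
    return value

def scaling_gap(G, L, lam):
    """|log det(L*G + lam*I) - (d*log(L) + log det(G + (lam/L)*I))|, zero up to rounding."""
    d = G.shape[0]
    lhs = logdet(L * G + lam * np.eye(d))
    rhs = d * np.log(L) + logdet(G + (lam / L) * np.eye(d))
    return abs(lhs - rhs)

def log_potential_and_bound(X, L, lam, K, T):
    """Return 2*log(det(L*G + lam*I)/lam^d) and the bound 2*d*log(1 + L*K*T/(d*lam))."""
    d = X.shape[1]
    G = X.T @ X  # sum of x x^T over all K*T offered items
    potential = 2.0 * (logdet(L * G + lam * np.eye(d)) - d * np.log(lam))
    bound = 2.0 * d * np.log(1.0 + L * K * T / (d * lam))
    return potential, bound

rng = np.random.default_rng(2)
d, K, T, L, lam = 3, 4, 200, 0.25, 1.0
X = rng.normal(size=(K * T, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x|| <= 1 (assumed)
print("scaling identity gap:", scaling_gap(X.T @ X, L, lam))
print("potential, bound    :", log_potential_and_bound(X, L, lam, K, T))
```

The bound is nearly tight when the eigenvalues of the Gram matrix are close to equal, as they are for isotropic random features.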

Similar to Lemma 13, we prove the following.

Lemma 14.

$$
\sum_{t=1}^{T} \min\left\{ \sum_{i \in \mathcal{Q}_t} \|x_{t,i}\|^2_{\mathbf{V}^{-1}_{T+1}(\theta)},\, 1 \right\} \leq 2d \log\left(1 + \frac{KT}{d\lambda_t}\right).
$$
Proof.
\begin{align*}
\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|^{2}_{\mathbf{V}^{-1}_{T+1}(\theta)},\,1\right\}
&\leq 2\log\left(\frac{\det(\mathbf{V}_{T+1})}{\lambda_t^{d}}\right) && \text{(From Lemma 17, set $\dot{\underline{\mu_i}}(\cdot)=1$)}\\
&= 2\log\left(\frac{\det\left(\sum_{s=1}^{T}\sum_{i\in\mathcal{Q}_s}x_{s,i}x_{s,i}^{\top}+\lambda_t\mathbf{I}_d\right)}{\lambda_t^{d}}\right)\\
&\leq 2d\log\left(1+\frac{KT}{d\lambda_t}\right). && \text{(From Lemma 18)}
\end{align*}
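The two steps of this proof (a log-determinant bound followed by a determinant-trace bound) can be checked numerically. The sketch below is only an illustration under assumptions that are ours, not the paper's: features are random with $\|x_{t,i}\|_2 \leq 1$, the regularizer `lam` is held fixed, and the norms are taken with respect to the running matrix $\mathbf{V}_t^{-1}$ (the form used in the proof of Theorem 1). Since $\mathbf{V}_{T+1} \succeq \mathbf{V}_t$, using $\mathbf{V}_{T+1}^{-1}$ instead can only make the left-hand side smaller. The function name `elliptical_potential` is hypothetical.

```python
import numpy as np

def elliptical_potential(T, K, d, lam, seed=0):
    """Return (potential sum, 2*log(det V_{T+1} / lam^d), 2*d*log(1 + K*T/(d*lam)))."""
    rng = np.random.default_rng(seed)
    V = lam * np.eye(d)                      # V_1 = lam * I_d
    total = 0.0
    for _ in range(T):
        X = rng.normal(size=(K, d))          # K items offered in this round
        # Rescale rows so that every feature vector has Euclidean norm <= 1.
        X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
        V_inv = np.linalg.inv(V)
        # Squared norms x^T V_t^{-1} x for each offered item.
        sq_norms = np.einsum("ij,jk,ik->i", X, V_inv, X)
        total += min(sq_norms.sum(), 1.0)
        V += X.T @ X                         # V_{t+1} = V_t + sum_i x_i x_i^T
    log_det_ratio = np.linalg.slogdet(V)[1] - d * np.log(lam)
    closed_form = 2.0 * d * np.log(1.0 + K * T / (d * lam))
    return total, 2.0 * log_det_ratio, closed_form

total, det_bound, closed_form = elliptical_potential(T=500, K=5, d=4, lam=1.0)
print(total <= det_bound <= closed_form)
```

The three returned quantities correspond, in order, to the left-hand side, the intermediate determinant bound, and the final closed-form bound of the lemma.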

Theorem 1.

With probability at least $1-\delta$:

\[
\mathbf{R}_T \leq C_1\gamma_t(\delta)\sqrt{2d\log\left(1+\frac{LKT}{d\lambda_t}\right)T} + C_2\,\kappa\,\gamma_t(\delta)^{2}\,d\log\left(1+\frac{KT}{d\lambda_t}\right),
\]

where the constants are given as $C_1 = (4+8S)$, $C_2 = 4(4+8S)^{3/2}M$, and $\gamma_t(\delta)$ is given by Eq (8).

Proof.

The regret is upper bounded by the prediction error.

\begin{align*}
\mathbf{R}_T
&\leq \sum_{t=1}^{T}\min\left\{\Delta^{\text{pred}}\left(\mathbf{X}_{\mathcal{Q}_t},\theta_t^{\,\text{est}}\right),\,1\right\} && \text{($R_{\max}=1$)}\\
&\leq \sum_{t=1}^{T}\min\Bigg\{(2+4S)\gamma_t(\delta)\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|_{\mathbf{J}_t^{-1}}\\
&\qquad\qquad + 8\kappa(1+2S)^{2}M\gamma_t(\delta)^{2}\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|^{2}_{\mathbf{V}_t^{-1}},\,1\Bigg\} && \text{(From Lemma 7)}\\
&\leq 2(1+2S)\gamma_t(\delta)\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|_{\mathbf{J}_t^{-1}},\,1\right\}\\
&\qquad + 8(1+2S)^{2}\kappa M\gamma_t(\delta)^{2}\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|^{2}_{\mathbf{V}_t^{-1}},\,1\right\}\\
&\leq 2(1+2S)\gamma_t(\delta)\sqrt{T}\sqrt{\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|^{2}_{\mathbf{J}_t^{-1}},\,1\right\}}\\
&\qquad + 8(1+2S)^{2}\kappa M\gamma_t(\delta)^{2}\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t}\left\|x_{t,i}\right\|^{2}_{\mathbf{V}_t^{-1}},\,1\right\} && \text{(Using Cauchy-Schwarz inequality)}\\
&\leq 2(1+2S)\gamma_t(\delta)\sqrt{2d\log\left(1+\frac{LKT}{d\lambda_t}\right)T}\\
&\qquad + 8(1+2S)^{2}\kappa M\gamma_t(\delta)^{2}\,d\log\left(1+\frac{KT}{d\lambda_t}\right). && \text{(From Lemma 13 and 14)}
\end{align*}

For the choice $\lambda_t = d\log(KT)$, we have $\gamma_t(\delta) = \mathrm{O}\left(d^{1/2}\log^{1/2}(KT)\right)$. ∎
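To see how these orders combine, the following is a rough worked simplification of the bound in Theorem 1. It rests on assumptions that are ours rather than the paper's: $S$, $M$, $L$ and $\delta$ are treated as constants, and $KT \geq 3$ so that $d\lambda_t \geq 1$. Under these assumptions both logarithmic factors are $\mathrm{O}(\log(KT))$, and substituting $\gamma_t(\delta) = \mathrm{O}\left(d^{1/2}\log^{1/2}(KT)\right)$ gives:

\begin{align*}
C_1\gamma_t(\delta)\sqrt{2d\log\left(1+\frac{LKT}{d\lambda_t}\right)T}
&= \mathrm{O}\left(\sqrt{d\log(KT)}\cdot\sqrt{dT\log(KT)}\right)
= \mathrm{O}\left(d\sqrt{T}\log(KT)\right),\\
C_2\,\kappa\,\gamma_t(\delta)^{2}\,d\log\left(1+\frac{KT}{d\lambda_t}\right)
&= \mathrm{O}\left(\kappa\cdot d\log(KT)\cdot d\log(KT)\right)
= \mathrm{O}\left(\kappa\, d^{2}\log^{2}(KT)\right).
\end{align*}

In this reading, $\kappa$ multiplies only a term that is polylogarithmic in $T$, while the $\sqrt{T}$ term carries no dependence on $\kappa$.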

A.6 Convex relaxation

Lemma 8.

$E_t(\delta) \supseteq C_t(\delta)$, therefore for any $\theta \in C_t(\delta)$, we also have $\theta \in E_t(\delta)$ (see Eq (7)).

Proof.

Let $\hat{\theta}_t$ be the maximum likelihood estimate (see Eq (5)). The second-order Taylor series expansion of the log-loss (with integral remainder term) for any $\theta \in \mathbb{R}^{d}$ is given by:

\begin{align*}
\mathcal{L}^{\lambda}_t(\theta) ={}& \mathcal{L}^{\lambda}_t(\hat{\theta}_t) + \nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t)^{\top}(\theta-\hat{\theta}_t)\\
&+ (\theta-\hat{\theta}_t)^{\top}\left(\int_{\nu=0}^{1}(1-\nu)\,\nabla^{2}\mathcal{L}^{\lambda}_t\left(\hat{\theta}_t+\nu(\theta-\hat{\theta}_t)\right)d\nu\right)(\theta-\hat{\theta}_t). \tag{40}
\end{align*}

$\nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t) = 0$ by definition, since $\hat{\theta}_t$ is the maximum likelihood estimate. Therefore:

\begin{align*}
\mathcal{L}^{\lambda}_t(\theta)
&= \mathcal{L}^{\lambda}_t(\hat{\theta}_t) + (\theta-\hat{\theta}_t)^{\top}\left(\int_{\nu=0}^{1}(1-\nu)\,\nabla^{2}\mathcal{L}^{\lambda}_t\left(\hat{\theta}_t+\nu(\theta-\hat{\theta}_t)\right)d\nu\right)(\theta-\hat{\theta}_t)\\
&= \mathcal{L}^{\lambda}_t(\hat{\theta}_t) + (\theta-\hat{\theta}_t)^{\top}\left(\int_{\nu=0}^{1}(1-\nu)\,\mathbf{H}_t\left(\hat{\theta}_t+\nu(\theta-\hat{\theta}_t)\right)d\nu\right)(\theta-\hat{\theta}_t) && \text{($\nabla^{2}\mathcal{L}^{\lambda}_t(\cdot)=\mathbf{H}_t(\cdot)$)}\\
&\leq \mathcal{L}^{\lambda}_t(\hat{\theta}_t) + \left\|\theta-\hat{\theta}_t\right\|^{2}_{\mathbf{G}_t(\theta,\hat{\theta}_t)} && \text{(def. of $\mathbf{G}_t(\theta,\hat{\theta}_t)$)}\\
&\leq \mathcal{L}^{\lambda}_t(\hat{\theta}_t) + \left\|g_t(\theta)-g_t(\hat{\theta}_t)\right\|^{2}_{\mathbf{G}^{-1}_t(\theta,\hat{\theta}_t)}. && \text{(Eq (29))}
\end{align*}

Thus we obtain:

\begin{align*}
\mathcal{L}^{\lambda}_t(\theta) - \mathcal{L}^{\lambda}_t(\hat{\theta}_t)
&\leq \left\|g_t(\theta)-g_t(\hat{\theta}_t)\right\|^{2}_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}\\
&\leq \left(\frac{\gamma_t^{2}(\delta)}{\lambda_t}+\gamma_t(\delta)\right)^{2} = \beta^{2}_t(\delta), && \text{(from Lemma 15)}
\end{align*}

where the last inequality implies that $\theta \in E_t(\delta)$ by the definition of the set $E_t(\delta)$. Therefore, $\mathbb{P}\left(\forall t \geq 1,\ \theta_{*} \in E_t(\delta)\right) \geq 1-\delta$. ∎
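The Taylor expansion with integral remainder used in this proof is an exact identity, which can be sanity-checked numerically. The sketch below is illustrative only and makes assumptions that are ours: it uses an $\ell_2$-regularized binary logistic loss as a simple stand-in for the regularized MNL log-loss, and approximates the integral over $\nu$ with Gauss-Legendre quadrature. All function names are hypothetical.

```python
import numpy as np

def loss(theta, X, y, lam):
    # Regularized negative log-likelihood of a binary logistic model (stand-in loss).
    z = X @ theta
    return np.sum(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * theta @ theta

def grad(theta, X, y, lam):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - y) + lam * theta

def hess(theta, X, lam):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    w = p * (1.0 - p)                       # per-sample curvature weights
    return (X.T * w) @ X + lam * np.eye(X.shape[1])

def taylor_rhs(theta, theta0, X, y, lam, n_nodes=64):
    # Gauss-Legendre nodes/weights on [-1, 1], mapped to nu in [0, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    nu, w = 0.5 * (nodes + 1.0), 0.5 * weights
    diff = theta - theta0
    # Integral of (1 - nu) * Hessian along the segment from theta0 to theta.
    G = sum(wk * (1.0 - nk) * hess(theta0 + nk * diff, X, lam)
            for nk, wk in zip(nu, w))
    return loss(theta0, X, y, lam) + grad(theta0, X, y, lam) @ diff + diff @ G @ diff

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3)) / np.sqrt(3)
y = rng.integers(0, 2, size=30).astype(float)
theta0, theta = rng.normal(size=3), rng.normal(size=3)
print(abs(loss(theta, X, y, 1.0) - taylor_rhs(theta, theta0, X, y, 1.0)))
```

The identity holds at any base point `theta0`; when `theta0` is the minimizer of the loss, the gradient term vanishes and only the quadratic remainder is left, which is the case used in the proof.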

The following helper lemma translates the confidence set definition of Lemma 10 to the norm defined by $\mathbf{G}_t^{-1}(\theta_1,\theta_2)$.

Lemma 15.

Let $\delta \in (0,1]$. For all $\theta \in C_t(\delta)$, with $\hat{\theta}_t$ the maximum likelihood estimate in Eq (5), we have:

\[
\left\|g_t(\theta)-g_t(\hat{\theta}_t)\right\|_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)} \leq \frac{\gamma_t^{2}(\delta)}{\lambda_t}+\gamma_t(\delta).
\]
Proof.

We have:

𝐆t\delθ,θ^tsubscript𝐆𝑡\del𝜃subscript^𝜃𝑡\displaystyle\mathbf{G}_{t}\del{\theta,\hat{\theta}_{t}}bold_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT =s=1t1i𝒬sαi\del𝐗𝒬s,θ,θ^txs,ixs,i+λt𝐈dabsentsuperscriptsubscript𝑠1𝑡1subscript𝑖subscript𝒬𝑠subscript𝛼𝑖\delsubscript𝐗subscript𝒬𝑠𝜃subscript^𝜃𝑡subscript𝑥𝑠𝑖superscriptsubscript𝑥𝑠𝑖topsubscript𝜆𝑡subscript𝐈𝑑\displaystyle=\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\alpha_{i}\del{\mathbf% {X}_{\mathcal{Q}_{s}},\theta,\hat{\theta}_{t}}x_{s,i}x_{s,i}^{\top}+\lambda_{t% }\mathbf{I}_{d}= ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT (def. of 𝐆t\delθ,θ^tsubscript𝐆𝑡\del𝜃subscript^𝜃𝑡\mathbf{G}_{t}\del{\theta,\hat{\theta}_{t}}bold_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT)
s=1t1i𝒬sμ˙i(𝐗𝒬sθ)\del1+|xs,iθxs,iθ^t|1xs,ixs,i+λt𝐈dabsentsuperscriptsubscript𝑠1𝑡1subscript𝑖subscript𝒬𝑠subscript˙𝜇𝑖superscriptsubscript𝐗subscript𝒬𝑠top𝜃\del1superscriptsuperscriptsubscript𝑥𝑠𝑖top𝜃superscriptsubscript𝑥𝑠𝑖topsubscript^𝜃𝑡1subscript𝑥𝑠𝑖superscriptsubscript𝑥𝑠𝑖topsubscript𝜆𝑡subscript𝐈𝑑\displaystyle\geq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\dot{\mu}_{i}(% \mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta)\del{1+|x_{s,i}^{\top}\theta-x_{s,i}% ^{\top}\hat{\theta}_{t}|}^{-1}x_{s,i}x_{s,i}^{\top}+\lambda_{t}\mathbf{I}_{d}≥ ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT over˙ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ ) 1 + | italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ - italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT (from Lemma 12)
s=1t1i𝒬sμ˙i(𝐗𝒬sθ)\del1+\enVertxs,i𝐆t1\delθ,θ^t\enVertθθ^t𝐆t\delθ,θ^t1xs,ixs,iabsentsuperscriptsubscript𝑠1𝑡1subscript𝑖subscript𝒬𝑠subscript˙𝜇𝑖superscriptsubscript𝐗subscript𝒬𝑠top𝜃\del1\enVertsubscriptsubscript𝑥𝑠𝑖subscriptsuperscript𝐆1𝑡\del𝜃subscript^𝜃𝑡\enVert𝜃superscriptsubscriptsubscript^𝜃𝑡subscript𝐆𝑡\del𝜃subscript^𝜃𝑡1subscript𝑥𝑠𝑖superscriptsubscript𝑥𝑠𝑖top\displaystyle\geq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_{s}}\dot{\mu}_{i}(% \mathbf{X}_{\mathcal{Q}_{s}}^{\top}\theta)\del{1+\enVert{x_{s,i}}_{\mathbf{G}^% {-1}_{t}\del{\theta,\hat{\theta}_{t}}}\enVert{\theta-\hat{\theta}_{t}}_{% \mathbf{G}_{t}\del{\theta,\hat{\theta}_{t}}}}^{-1}x_{s,i}x_{s,i}^{\top}≥ ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT over˙ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ ) 1 + italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUBSCRIPT bold_G start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_θ - over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUBSCRIPT bold_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT
+λt𝐈dsubscript𝜆𝑡subscript𝐈𝑑\displaystyle+\lambda_{t}\mathbf{I}_{d}+ italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT (Cauchy-Schwarz inequality)
\del1+λt1/2\enVertθθ^t𝐆t\delθ,θ^t1s=1t1i𝒬sμ˙i(𝐗𝒬sθ)xs,ixs,i+λt𝐈dabsent\del1superscriptsubscript𝜆𝑡12\enVert𝜃superscriptsubscriptsubscript^𝜃𝑡subscript𝐆𝑡\del𝜃subscript^𝜃𝑡1superscriptsubscript𝑠1𝑡1subscript𝑖subscript𝒬𝑠subscript˙𝜇𝑖superscriptsubscript𝐗subscript𝒬𝑠top𝜃subscript𝑥𝑠𝑖superscriptsubscript𝑥𝑠𝑖topsubscript𝜆𝑡subscript𝐈𝑑\displaystyle\geq\del{1+\lambda_{t}^{-\nicefrac{{1}}{{2}}}\enVert{\theta-\hat{% \theta}_{t}}_{\mathbf{G}_{t}\del{\theta,\hat{\theta}_{t}}}}^{-1}\sum_{s=1}^{t-% 1}\sum_{i\in\mathcal{Q}_{s}}\dot{\mu}_{i}(\mathbf{X}_{\mathcal{Q}_{s}}^{\top}% \theta)x_{s,i}x_{s,i}^{\top}+\lambda_{t}\mathbf{I}_{d}≥ 1 + italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - / start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT italic_θ - over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUBSCRIPT bold_G start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_i ∈ caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT over˙ start_ARG italic_μ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT caligraphic_Q start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_θ ) italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_s , italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT + italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT (𝐆t(θ,θ^t)λt𝐈dsucceeds-or-equalssubscript𝐆𝑡𝜃subscript^𝜃𝑡subscript𝜆𝑡subscript𝐈𝑑\mathbf{G}_{t}(\theta,\hat{\theta}_{t})\succeq\lambda_{t}\mathbf{I}_{d}bold_G 
start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_θ , over^ start_ARG italic_θ end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ⪰ italic_λ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_I start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT)
\begin{align*}
&\geq \left(1+\lambda_t^{-1/2}\left\lVert\theta-\hat{\theta}_t\right\rVert_{\mathbf{G}_t(\theta,\hat{\theta}_t)}\right)^{-1}\left(\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\mu}_i(\mathbf{X}_{\mathcal{Q}_s}^{\top}\theta)\,x_{s,i}x_{s,i}^{\top}+\lambda_t\mathbf{I}_d\right) \\
&= \left(1+\lambda_t^{-1/2}\left\lVert\theta-\hat{\theta}_t\right\rVert_{\mathbf{G}_t(\theta,\hat{\theta}_t)}\right)^{-1}\mathbf{H}_t(\theta) && \text{(def. of $\mathbf{H}_t(\theta)$)} \\
&= \left(1+\lambda_t^{-1/2}\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}\right)^{-1}\mathbf{H}_t(\theta), && \text{(from Eq. (29))}
\end{align*}

Here, the relation

\[
\mathbf{G}_t(\theta,\hat{\theta}_t)\succeq\left(1+\lambda_t^{-1/2}\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}\right)^{-1}\mathbf{H}_t(\theta)
\]

is the counterpart of the relation in Lemma 4 that retains local information. This gives:

\begin{align*}
\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert^2_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}
&\leq \left(1+\lambda_t^{-1/2}\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}\right)\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert^2_{\mathbf{H}_t^{-1}(\theta)} \\
&\leq \lambda_t^{-1/2}\gamma_t^2(\delta)\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}+\gamma_t^2(\delta), && \text{(from Lemma 10)}
\end{align*}

where the last relation is a quadratic inequality in $\left\lVert g_t(\theta)-g_t(\hat{\theta}_t)\right\rVert_{\mathbf{G}_t^{-1}(\theta,\hat{\theta}_t)}$; solving it completes the proof of the lemma. ∎

Lemma 9.

Under the event $\theta_*\in C_t(\delta)$, the following holds for all $\theta\in E_t(\delta)$:

\[
\left\lVert\theta-\theta_*\right\rVert_{\mathbf{H}_t(\theta_*)}\leq(2+2S)\gamma_t(\delta)+2\sqrt{1+S}\,\beta_t(\delta).
\]

When $\lambda_t=d\log(t)$, we have $\gamma_t(\delta)=\tilde{\mathrm{O}}\left(\sqrt{d\log(t)}\right)$, $\beta_t(\delta)=\tilde{\mathrm{O}}\left(\sqrt{d\log(t)}\right)$, and

\[
\left\lVert\theta-\theta_*\right\rVert_{\mathbf{H}_t(\theta_*)}=\tilde{\mathrm{O}}\left(\sqrt{d\log(t)}\right).
\]
Proof.

A second-order Taylor expansion of the log-likelihood function with integral remainder gives:

\begin{align*}
\mathcal{L}^{\lambda}_t(\theta)={}&\mathcal{L}^{\lambda}_t(\theta_*)+\nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t)^{\top}(\theta-\theta_*) \\
&+(\theta-\theta_*)^{\top}\left(\int_{\nu=0}^{1}(1-\nu)\,\nabla^2\mathcal{L}^{\lambda}_t(\theta_*+\nu(\theta-\theta_*))\,d\nu\right)(\theta-\theta_*) \\
={}&\mathcal{L}^{\lambda}_t(\theta_*)+\nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t)^{\top}(\theta-\theta_*)+\left\lVert\theta-\theta_*\right\rVert^2_{\tilde{\mathbf{G}}_t(\theta_*,\theta)},
\end{align*}

where $\tilde{\mathbf{G}}_t(\theta_*,\theta)=\int_{\nu=0}^{1}(1-\nu)\,\mathbf{H}_t(\theta_*+\nu(\theta-\theta_*))\,d\nu$, so that the remainder term is the quadratic form $(\theta-\theta_*)^{\top}\tilde{\mathbf{G}}_t(\theta_*,\theta)(\theta-\theta_*)$. From Lemma 4 and Lemma 8 of Abeille et al. [2021], it also follows that

\[
\left\lVert\theta-\theta_*\right\rVert^2_{\tilde{\mathbf{G}}_t(\theta_*,\theta)}\geq(2+2S)^{-1}\left\lVert\theta-\theta_*\right\rVert^2_{\mathbf{H}_t(\theta_*)}.
\]

Therefore we have:

\begin{align*}
\left\lVert\theta-\theta_*\right\rVert^2_{\mathbf{H}_t(\theta_*)}
&\leq(2+2S)\left\lvert\mathcal{L}^{\lambda}_t(\theta)-\mathcal{L}^{\lambda}_t(\theta_*)\right\rvert+(2+2S)\left\lvert\nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t)^{\top}(\theta-\theta_*)\right\rvert \\
&\leq 2(2+2S)\beta_t^2(\delta)+(2+2S)\left\lvert\nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t)^{\top}(\theta-\theta_*)\right\rvert && \text{(def. of $E_t(\delta)$)} \\
&\leq 2(2+2S)\beta_t^2(\delta)+(2+2S)\left\lVert\nabla\mathcal{L}^{\lambda}_t(\hat{\theta}_t)\right\rVert_{\mathbf{H}_t^{-1}(\theta_*)}\left\lVert\theta-\theta_*\right\rVert_{\mathbf{H}_t(\theta_*)} && \text{(Cauchy-Schwarz inequality)} \\
&\leq 2(2+2S)\beta_t^2(\delta)+(2+2S)\gamma_t(\delta)\left\lVert\theta-\theta_*\right\rVert_{\mathbf{H}_t(\theta_*)}.
\end{align*}

Solving the quadratic inequality in $\left\lVert\theta-\theta_*\right\rVert_{\mathbf{H}_t(\theta_*)}$, we get:

\[
\left\lVert\theta-\theta_*\right\rVert_{\mathbf{H}_t(\theta_*)}\leq(2+2S)\gamma_t(\delta)+2\sqrt{1+S}\,\beta_t(\delta).
\]

When $\lambda_t=d\log(t)$, we have $\gamma_t(\delta)=\tilde{\mathrm{O}}\left(\sqrt{d\log(t)}\right)$ and $\beta_t(\delta)=\tilde{\mathrm{O}}\left(\sqrt{d\log(t)}\right)$. ∎
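The final step of this proof relies on an elementary fact: for $b, c \geq 0$, any $x \geq 0$ with $x^2 \leq bx + c$ satisfies $x \leq b + \sqrt{c}$. With $b = (2+2S)\gamma_t(\delta)$ and $c = 2(2+2S)\beta_t^2(\delta)$ this yields the stated bound. The sketch below checks the step numerically; the sampled values of `S`, `gamma` and `beta` are arbitrary illustrative numbers, not quantities from the paper.

```python
import math
import random


def largest_root(b: float, c: float) -> float:
    """Largest x >= 0 satisfying x**2 <= b*x + c, for b, c >= 0."""
    return (b + math.sqrt(b * b + 4.0 * c)) / 2.0


def lemma9_bound(S: float, gamma: float, beta: float) -> float:
    """Right-hand side of the bound: (2 + 2S) * gamma + 2 * sqrt(1 + S) * beta."""
    return (2.0 + 2.0 * S) * gamma + 2.0 * math.sqrt(1.0 + S) * beta


random.seed(0)
for _ in range(1000):
    # Hypothetical values standing in for S, gamma_t(delta) and beta_t(delta).
    S = random.uniform(0.0, 5.0)
    gamma = random.uniform(0.0, 10.0)
    beta = random.uniform(0.0, 10.0)
    b = (2.0 + 2.0 * S) * gamma             # coefficient of the linear term
    c = 2.0 * (2.0 + 2.0 * S) * beta ** 2   # constant term
    # The exact solution of the quadratic never exceeds the simpler bound b + sqrt(c).
    assert largest_root(b, c) <= lemma9_bound(S, gamma, beta) + 1e-9
```

The bound $b + \sqrt{c}$ is slightly looser than the exact root $(b + \sqrt{b^2 + 4c})/2$, but it separates the $\gamma_t(\delta)$ and $\beta_t(\delta)$ contributions cleanly.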

A.7 Technical lemmas

Remark 3 (Derivatives for MNL choice function).

For the multinomial logit choice function, where the expected reward due to item $i$ of the assortment $S_t$ is modeled as:

\[
f_i(S_t,\mathbf{r})=\frac{e^{r_i}}{1+e^{r_i}+\sum_{j\in S_t,\,j\neq i}^{K}e^{r_j}}
\]

the partial derivative with respect to the expected reward of the $i$-th item is given as:

\[
\frac{\partial f_i}{\partial r_i}=f_i(S_t,\mathbf{r})\left(1-f_i(S_t,\mathbf{r})\right)
\]

and the second derivative as:

\[
\frac{\partial^2 f_i}{\partial r_i^2}=f_i(S_t,\mathbf{r})\left(1-f_i(S_t,\mathbf{r})\right)\left(1-2f_i(S_t,\mathbf{r})\right).
\]
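The two closed-form derivatives in Remark 3 can be checked against central finite differences. The following is a minimal sketch, assuming the assortment is represented simply by its utility vector `r`; the function names and the test vector are illustrative choices, not part of the paper.

```python
import numpy as np


def mnl_probs(r: np.ndarray) -> np.ndarray:
    """MNL choice probabilities f_i = exp(r_i) / (1 + sum_j exp(r_j))."""
    e = np.exp(r)
    return e / (1.0 + e.sum())


def first_derivative(r: np.ndarray) -> np.ndarray:
    """Closed form of df_i/dr_i = f_i (1 - f_i)."""
    f = mnl_probs(r)
    return f * (1.0 - f)


def second_derivative(r: np.ndarray) -> np.ndarray:
    """Closed form of d^2 f_i/dr_i^2 = f_i (1 - f_i) (1 - 2 f_i)."""
    f = mnl_probs(r)
    return f * (1.0 - f) * (1.0 - 2.0 * f)


def finite_diff(fn, r: np.ndarray, i: int, h: float = 1e-5) -> float:
    """Central difference of the i-th output of fn with respect to r_i."""
    step = np.zeros_like(r)
    step[i] = h
    return (fn(r + step)[i] - fn(r - step)[i]) / (2.0 * h)


# Hypothetical utilities for a three-item assortment.
r = np.array([0.3, -1.2, 0.8])
for i in range(len(r)):
    assert np.isclose(finite_diff(mnl_probs, r, i), first_derivative(r)[i])
    assert np.isclose(finite_diff(first_derivative, r, i), second_derivative(r)[i])
```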
Lemma 16 (Self-Concordance like relation for MNL).

For the multinomial logit choice function, where the expected reward due to item $i$ of the assortment $S_t$ is modeled as:

\[
f_i(S_t,\mathbf{r})=\frac{e^{r_i}}{1+e^{r_i}+\sum_{j\in S_t,\,j\neq i}^{K}e^{r_j}}
\]

the following relation holds:

\[
\left\lvert\frac{\partial^2 f_i}{\partial r_i^2}\right\rvert\leq\frac{\partial f_i}{\partial r_i}.
\]
Proof.

The proof follows directly from Remark 3 and the observation that $\left\lvert 1-2f_i(S_t,\mathbf{r})\right\rvert\leq 1$ for all items $i$ in the assortment. ∎
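A quick numerical illustration of Lemma 16 is sketched below. It draws random utility vectors (the sampling distribution and assortment sizes are arbitrary assumptions) and confirms that the gap between the first derivative and the absolute second derivative is never negative.

```python
import numpy as np


def self_concordance_gap(r: np.ndarray) -> np.ndarray:
    """Return df_i/dr_i - |d^2 f_i/dr_i^2| per item; Lemma 16 says this is >= 0."""
    e = np.exp(r)
    f = e / (1.0 + e.sum())        # MNL choice probabilities
    d1 = f * (1.0 - f)             # first derivative
    d2 = d1 * (1.0 - 2.0 * f)      # second derivative
    return d1 - np.abs(d2)


rng = np.random.default_rng(0)
for _ in range(1000):
    K = int(rng.integers(1, 6))            # hypothetical assortment size in {1, ..., 5}
    r = rng.normal(scale=3.0, size=K)      # hypothetical utilities
    assert np.all(self_concordance_gap(r) >= 0.0)
```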

Lemma 17 (Generalized elliptical potential).

Let $\left\{\mathbf{X}_{\mathcal{Q}_s}\right\}$ be a sequence in $\mathbb{R}^{d\times K}$ such that, for each $s$, $\mathbf{X}_{\mathcal{Q}_s}$ has columns $\{x_{s,1},x_{s,2},\cdots,x_{s,K}\}$, where $x_{s,i}\in\mathbb{R}^{d}$ and $\left\lVert x_{s,i}\right\rVert_2\leq w$ for all $s\geq 1$ and $i\in[K]$. Also, let $\lambda_t$ be a non-negative scalar.
For $t\geq 1$, define $\mathbf{J}_t\coloneqq\sum_{s=1}^{t-1}\sum_{i\in\mathcal{Q}_s}\dot{\underline{\mu_i}}\left(\mathbf{X}_{\mathcal{Q}_s}\right)x_{s,i}x_{s,i}^{\top}+\lambda_t\mathbf{I}_d$, where $\dot{\underline{\mu_i}}\left(\mathbf{X}_{\mathcal{Q}_s}\right)$ is strictly positive for all $i\in[K]$. Then the following inequality holds:

\[
\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t}\left\lVert\tilde{x}_{t,i}\right\rVert^2_{\mathbf{J}_t^{-1}},\,1\right\}\leq 2\log\left(\frac{\det(\mathbf{J}_{T+1})}{\lambda_t^{d}}\right)
\]

with $\tilde{x}_{t,i}=\sqrt{\dot{\underline{\mu_i}}\left(\mathbf{X}_{\mathcal{Q}_t}\right)}\,x_{t,i}$.
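As a numerical sanity check of the inequality in Lemma 17, the sketch below simulates random assortments and compares both sides. It is a minimal illustration under assumptions not taken from the paper: the regularizer is held fixed at a hypothetical value `lam` (rather than varying with $t$), the feature norm bound is $w=1$, and the positive weights are generated as MNL derivatives $f_i(1-f_i)$ at an arbitrary parameter `theta`.

```python
import numpy as np


def elliptical_potential_check(T=200, K=4, d=5, lam=1.0, seed=0):
    """Return (lhs, rhs) of the generalized elliptical potential inequality."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d)      # hypothetical parameter, used only to generate weights
    J = lam * np.eye(d)             # J_1 = lam * I_d
    lhs = 0.0
    for _ in range(T):
        X = rng.normal(size=(d, K))
        X /= np.maximum(1.0, np.linalg.norm(X, axis=0))   # enforce ||x_i||_2 <= 1
        e = np.exp(X.T @ theta)
        f = e / (1.0 + e.sum())
        weights = f * (1.0 - f)                 # strictly positive weights
        X_tilde = X * np.sqrt(weights)          # columns sqrt(weight_i) * x_i
        # sum_i ||x_tilde_i||^2 in the J^{-1} norm equals trace(X_tilde^T J^{-1} X_tilde)
        potential = np.trace(X_tilde.T @ np.linalg.solve(J, X_tilde))
        lhs += min(potential, 1.0)
        J += X_tilde @ X_tilde.T                # J_{t+1} = J_t + sum_i x_tilde_i x_tilde_i^T
    _, logdet = np.linalg.slogdet(J)
    rhs = 2.0 * (logdet - d * np.log(lam))      # 2 log(det(J_{T+1}) / lam^d)
    return lhs, rhs


lhs, rhs = elliptical_potential_check()
assert lhs <= rhs
```

Using `slogdet` rather than `det` avoids overflow when $T$ or $d$ is large, since only the logarithm of the determinant is needed.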

Proof.

By the definition of 𝐉tsubscript𝐉𝑡\mathbf{J}_{t}bold_J start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT:

\begin{align*}
\det\left(\mathbf{J}_{t+1}\right) &= \det\left(\mathbf{J}_t + \sum_{i\in\mathcal{Q}_t} \tilde{x}_{t,i}\tilde{x}_{t,i}^{\top}\right) \\
&= \det\left(\mathbf{J}_t\right)\det\left(\mathbf{I}_d + \mathbf{J}_t^{-1/2}\sum_{i\in\mathcal{Q}_t} \tilde{x}_{t,i}\tilde{x}_{t,i}^{\top}\,\mathbf{J}_t^{-1/2}\right) \\
&= \det\left(\mathbf{J}_t\right)\left(1+\sum_{i\in\mathcal{Q}_t} \left\lVert\tilde{x}_{t,i}\right\rVert^2_{\mathbf{J}_t^{-1}}\right).
\end{align*}

Taking the logarithm of both sides and summing from $t=1$ to $T$:

\begin{align*}
\sum_{t=1}^{T}\log\left(1+\sum_{i\in\mathcal{Q}_t} \left\lVert\tilde{x}_{t,i}\right\rVert^2_{\mathbf{J}_t^{-1}}\right) &= \sum_{t=1}^{T}\left[\log\det(\mathbf{J}_{t+1})-\log\det(\mathbf{J}_t)\right] \\
&= \sum_{t=1}^{T}\log\left(\frac{\det(\mathbf{J}_{t+1})}{\det(\mathbf{J}_t)}\right) \\
&= \log\left(\frac{\det(\mathbf{J}_{T+1})}{\det(\lambda_t\mathbf{I}_d)}\right) && \text{(by a telescoping sum)} \\
&= \log\left(\frac{\det(\mathbf{J}_{T+1})}{\lambda_t^{d}}\right). \tag{44}
\end{align*}

For any $a$ such that $0\leq a\leq 1$, it follows that $a\leq 2\log(1+a)$. Therefore, we write:

\begin{align*}
\sum_{t=1}^{T}\min\left\{\sum_{i\in\mathcal{Q}_t} \left\lVert\tilde{x}_{t,i}\right\rVert^2_{\mathbf{J}_t^{-1}},\,1\right\} &\leq 2\sum_{t=1}^{T}\log\left(1+\sum_{i\in\mathcal{Q}_t} \left\lVert\tilde{x}_{t,i}\right\rVert^2_{\mathbf{J}_t^{-1}}\right) \\
&= 2\log\left(\frac{\det(\mathbf{J}_{T+1})}{\lambda_t^{d}}\right). && \text{(from Eq. (44))}
\end{align*}
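As a numerical sanity check of the bound just derived, the following minimal sketch accumulates the left-hand side over simulated rounds and compares it with $2\log\left(\det(\mathbf{J}_{T+1})/\lambda^{d}\right)$. Everything in it is an assumption made for illustration: the function name `elliptical_potential_check`, the Gaussian feature vectors with norms capped at 1, the fixed set size `K` per round, and a constant regularizer `lam` standing in for $\lambda_t$.

```python
import numpy as np


def elliptical_potential_check(T=200, d=4, K=3, lam=1.0, seed=0):
    """Return (lhs, rhs) of the potential bound on simulated vectors.

    lhs = sum_t min{ sum_i ||x_{t,i}||^2_{J_t^{-1}}, 1 }
    rhs = 2 * log( det(J_{T+1}) / lam^d )
    """
    rng = np.random.default_rng(seed)
    J = lam * np.eye(d)  # J_1 = lam * I_d (assumed initialization)
    lhs = 0.0
    for _ in range(T):
        # Rows of X play the role of the vectors x~_{t,i}, i in Q_t.
        X = rng.normal(size=(K, d))
        # Cap each row's Euclidean norm at 1 (illustrative choice).
        X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
        J_inv = np.linalg.inv(J)
        # sum_i x_i^T J_t^{-1} x_i
        potential = float(np.einsum("ij,jk,ik->", X, J_inv, X))
        lhs += min(potential, 1.0)
        # J_{t+1} = J_t + sum_i x_i x_i^T
        J = J + X.T @ X
    _, logdet = np.linalg.slogdet(J)
    rhs = 2.0 * (logdet - d * np.log(lam))
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = elliptical_potential_check()
    print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}, bound holds: {lhs <= rhs}")
```

Using `slogdet` rather than `det` keeps the computation stable when $T$ or $d$ is large enough for the determinant itself to overflow.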

Lemma 18 (Determinant-trace inequality, see Lemma 10 in Abbasi-Yadkori et al. [2011]).

Let $\{x_s\}_{s=1}^{\infty}$ be a sequence in $\mathbb{R}^{d}$ such that $\lVert x_s\rVert_2\leq X$ for all $s\in\mathbb{N}$, and let $\lambda_t$ be a non-negative scalar. For $t\geq 1$, define $\mathbf{V}_t\coloneqq\sum_{s=1}^{t-1}x_s x_s^{\top}+\lambda_t\mathbf{I}_d$. The following inequality holds:

\begin{align*}
\det(\mathbf{V}_{t+1})\leq\left(\lambda_t+tX^{2}/d\right)^{d}.
\end{align*}
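The determinant-trace inequality can be checked numerically on random data. The sketch below is only an illustration under stated assumptions: the function name `det_trace_check` is hypothetical, the vectors are Gaussian draws rescaled so that $\lVert x_s\rVert_2\leq X$, and the regularizer `lam` is taken strictly positive so that both sides can be compared on the log scale.

```python
import numpy as np


def det_trace_check(t=50, d=5, X_bound=1.0, lam=0.5, seed=0):
    """Return (log det V_{t+1}, d * log(lam + t * X_bound^2 / d))."""
    rng = np.random.default_rng(seed)
    xs = rng.normal(size=(t, d))
    # Shrink any row whose Euclidean norm exceeds X_bound; leave the rest as is.
    norms = np.linalg.norm(xs, axis=1, keepdims=True)
    xs *= X_bound / np.maximum(X_bound, norms)
    # V_{t+1} = sum_{s=1}^{t} x_s x_s^T + lam * I_d
    V = lam * np.eye(d) + xs.T @ xs
    _, logdet = np.linalg.slogdet(V)
    log_bound = d * np.log(lam + t * X_bound**2 / d)
    return logdet, log_bound


if __name__ == "__main__":
    logdet, log_bound = det_trace_check()
    print(f"log det = {logdet:.4f}, log bound = {log_bound:.4f}")
```

The comparison is done on logarithms because both sides grow like a $d$-th power and are easier to read on that scale.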

A.8 Numerical experiments

We build on Section 5 and compare the empirical performance of our proposed algorithm CB-MNL with the previous state of the art in the MNL contextual bandit literature, UCB-MNL [Oh & Iyengar, 2021] and TS-MNL [Oh & Iyengar, 2019], on artificial data for varying model parameters. The parameter $\theta_*\in\mathbb{R}^{d}$ is a $d$-dimensional random vector whose coordinates are independently and uniformly distributed in $[0,1]$. The contexts follow a multivariate Gaussian distribution. Algorithm CB-MNL only knows the values of $N, T, K, d$. In addition, algorithms TS-MNL and UCB-MNL need to know the value of $\kappa$ for their implementation. Here we simulate two additional parameter instances, again averaged over $25$ Monte Carlo runs.
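The synthetic setup described above could be generated along the following lines. This is a rough sketch, not the code used for the experiments: the function names `make_instance` and `mnl_choice_probs` are hypothetical, the Gaussian contexts are assumed to have zero mean and identity covariance (the text does not specify either), and the MNL probabilities assume an outside (no-purchase) option with utility zero.

```python
import numpy as np


def make_instance(N=10, d=3, seed=0):
    """Draw theta_* uniformly from [0, 1]^d and N Gaussian context vectors."""
    rng = np.random.default_rng(seed)
    theta_star = rng.uniform(0.0, 1.0, size=d)
    # Assumption: zero mean and identity covariance for the contexts.
    contexts = rng.multivariate_normal(np.zeros(d), np.eye(d), size=N)
    return theta_star, contexts


def mnl_choice_probs(contexts, theta, assortment):
    """MNL choice probabilities for an offered assortment.

    Entry 0 of the result is the no-purchase probability (assumed
    outside option with utility 0); the remaining entries follow the
    order of `assortment`.
    """
    utilities = contexts[assortment] @ theta
    weights = np.exp(utilities)
    denom = 1.0 + weights.sum()
    return np.concatenate(([1.0 / denom], weights / denom))


if __name__ == "__main__":
    theta_star, contexts = make_instance(N=10, d=3, seed=0)
    probs = mnl_choice_probs(contexts, theta_star, assortment=[0, 3, 7])
    print(probs, probs.sum())
```

A simulated consumer response for one round can then be drawn with `rng.choice(len(probs), p=probs)`, where index 0 means no purchase.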

Figure 3: Comparison of cumulative regret for two additional parameter instances (left: $\kappa\gg\sqrt{T}$, $N=10, d=3, K=6, T=100$; right: $\kappa\gg\sqrt{T}$, $N=20, d=3, K=5, T=100$).