
License: CC BY 4.0
arXiv:2402.17720v1 [cs.LG] 27 Feb 2024

The $\mathsf{SMART}$ Approach to Instance-Optimal Online Learning

Siddhartha Banerjee
ORIE, Cornell
Alankrita Bhatt
CMS, Caltech
Christina Lee Yu
ORIE, Cornell
Abstract

We devise an online learning algorithm, titled Switching via Monotone Adapted Regret Traces ($\mathsf{SMART}$), that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader ($\mathsf{FTL}$) policy and the worst-case guarantee of any other input policy $\mathsf{ALG}_{\mathsf{WC}}$. We show that the regret of the $\mathsf{SMART}$ policy on any input sequence is within a multiplicative factor $e/(e-1)\approx 1.58$ of the smaller of: 1) the regret obtained by $\mathsf{FTL}$ on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical ‘best-of-both-worlds’ bounds, as the guarantee holds for every input sequence regardless of how it is generated. $\mathsf{SMART}$ is simple to implement, as it begins by playing $\mathsf{FTL}$ and switches at most once during the time horizon to $\mathsf{ALG}_{\mathsf{WC}}$. Our approach and results follow from an operational reduction of instance-optimal online learning to competitive analysis for the ski-rental problem. We complement our competitive ratio upper bounds with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a $1.43$-fraction of the minimum regret achieved by $\mathsf{FTL}$ and the minimax-optimal policy.
We also present a modification of $\mathsf{SMART}$ that combines $\mathsf{FTL}$ with a “small-loss” algorithm to achieve instance optimality between the regret of $\mathsf{FTL}$ and the small-loss regret bound.

1 Introduction

Our work aims to develop algorithms for online learning that are instance optimal (Fagin et al., 2001; Roughgarden, 2021, Chapter 3) with respect to the stochastic and minimax optimal algorithms for a given setting. This is best motivated via a concrete example:

Example 1 (Binary Prediction).

We are given a bit stream $y^n := (y_1, y_2, \ldots, y_n) \in \{0,1\}^n$. At the start of day $t$, before seeing $y_t$, we choose a (possibly randomized) prediction $\widehat{Y}_t \sim \mathrm{Bernoulli}(a_t)$ (for $a_t \in [0,1]$) for the upcoming bit $y_t$, given the history $y^{t-1}$.
Our resulting loss on day $t$ is $\ell_t(a_t) = \mathbb{P}(\widehat{Y}_t \neq y_t) = |a_t - y_t|$, and our total loss is $L_n(\mathsf{ALG}, y^n) := \sum_{t=1}^{n} \ell_t(a_t)$. The objective is to achieve low regret (i.e., additive loss) compared to the loss $L_n(a, y^n) = \sum_{t=1}^{n} \ell_t(a)$ of the best fixed action $a^* \in [0,1]$ in hindsight.
As $a^*$ is the majority in $y^n$ between 0 and 1, it follows that $L_n(a^*, y^n) = \min\left\{\sum_{t=1}^{n} y_t,\; n - \sum_{t=1}^{n} y_t\right\}$. Formally, for sequence $y^n \in \{0,1\}^n$, policy $\mathsf{ALG}$ incurs regret

$$\mathrm{Reg}(\mathsf{ALG}, y^n) := L_n(\mathsf{ALG}, y^n) - L_n(a^*, y^n) = \sum_{t=1}^{n} |a_t - y_t| - \min_{a \in [0,1]} \sum_{t=1}^{n} |a - y_t|. \tag{1}$$
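As a quick illustration of Eq. (1), the following minimal sketch computes the regret of an arbitrary action sequence on a bit stream. The function names `best_fixed_loss` and `regret` are chosen here for illustration and do not come from the paper.

```python
def best_fixed_loss(bits):
    """Loss of the best fixed action in hindsight: the count of the minority bit."""
    ones = sum(bits)
    return min(ones, len(bits) - ones)


def regret(actions, bits):
    """Regret of playing actions a_t in [0, 1] against bits y_t, following Eq. (1)."""
    total_loss = sum(abs(a - y) for a, y in zip(actions, bits))
    return total_loss - best_fixed_loss(bits)


bits = [1, 0, 1, 1, 0, 1]
always_half = [0.5] * len(bits)
# Total loss is 6 * 0.5 = 3.0; the best fixed action (predict 1) loses 2.
print(regret(always_half, bits))  # 1.0
```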

Binary prediction goes back to the seminal works of Blackwell (1956) and Hannan (1957). The definition of regret is motivated by the case where $y_t$ is randomly generated as i.i.d. Bernoulli$(p)$. If $p$ is known, then the optimal policy is the ‘Bayes predictor’ $a^{\mathsf{Bayes}} = \lfloor 2p \rfloor$ (i.e., the nearest integer to $p$), which coincides with the hindsight optimal $a^*$ with high probability when $p$ is away from $1/2$. When $p$ is unknown, the stochastic optimal policy is the Follow The Leader or $\mathsf{FTL}$ policy, which sets $a_t = \mathsf{Majority}(y^{t-1})$, i.e., the majority bit amongst the first $t-1$ bits ($a_t = 1/2$ if both are equal; we choose this specific tie-breaking rule for convenience, but any $a_t \in [0,1]$ can be used).
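The $\mathsf{FTL}$ rule for bit prediction can be sketched in a few lines. This is an illustrative implementation using the tie-breaking value $1/2$ described above; the name `ftl_actions` is hypothetical.

```python
def ftl_actions(bits):
    """Return the FTL prediction a_t for every round, using only bits before round t."""
    actions = []
    ones = 0  # number of 1s among the bits seen so far
    for t, y in enumerate(bits):
        zeros = t - ones
        if ones > zeros:
            actions.append(1.0)
        elif ones < zeros:
            actions.append(0.0)
        else:
            actions.append(0.5)  # tie-breaking rule from the text
        ones += y  # reveal y_t only after the prediction is made
    return actions


print(ftl_actions([1, 1, 0, 0, 0]))  # [0.5, 1.0, 1.0, 1.0, 0.5]
```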

A starting point for online learning is the observation that it is easy to construct a sequence $y^n$ such that $\mathsf{FTL}$ has poor regret: for example, if $y^n = (1,0,1,0,1,0,\ldots)$, i.e., alternating $1$s and $0$s, then the regret of $\mathsf{FTL}$ grows linearly with $n$. In contrast, worst-case optimal online learning policies such as those of Blackwell and Hannan, or more modern versions like Multiplicative Weights or Follow The Perturbed Leader (see Cesa-Bianchi and Lugosi (2006); Slivkins (2019)), guarantee regret of $\Theta(\sqrt{n})$ over all sequences. Indeed, for bit prediction, the exact minimax optimal policy was established by Cover (1966), and this policy (which we refer to as $\mathsf{Cover}$) achieves $\mathrm{Reg}(\mathsf{Cover}, y^n) = \sqrt{\frac{n}{2\pi}}(1+o(1))$ under any $y^n \in \{0,1\}^n$, implying it is an equalizer (achieves the same regret over all sequences). (Footnote 2: Here $f_n$, the so-called Rademacher complexity of the setting, is a fixed function of $n$ that does not depend on the sequence $y^n$. For binary prediction, $f_n = \frac{\mathbb{E}\left|\sum_{t=1}^{n} Z_t\right|}{2} \approx \sqrt{\frac{n}{2\pi}}$, where $Z_1, \ldots, Z_n \sim \mathrm{Unif}\{1,-1\}$ i.i.d.)
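Both claims in this paragraph are easy to check numerically. The sketch below (helper names are illustrative) computes the regret of $\mathsf{FTL}$ on the alternating sequence, which comes out to $n/4$ under the $1/2$ tie-breaking rule, and evaluates $f_n = \mathbb{E}|\sum_t Z_t|/2$ exactly from the binomial distribution for comparison with $\sqrt{n/(2\pi)}$.

```python
import math


def ftl_regret(bits):
    """Regret of FTL (ties broken at 1/2) on a bit sequence."""
    ones, loss = 0, 0.0
    for t, y in enumerate(bits):
        zeros = t - ones
        a = 1.0 if ones > zeros else 0.0 if ones < zeros else 0.5
        loss += abs(a - y)
        ones += y
    return loss - min(ones, len(bits) - ones)


def rademacher_f(n):
    """f_n = E|Z_1 + ... + Z_n| / 2 for i.i.d. uniform +/-1 signs, computed exactly."""
    # With k signs equal to +1, the sum is 2k - n; there are C(n, k) such outcomes.
    total = sum(math.comb(n, k) * abs(2 * k - n) for k in range(n + 1))
    return total / 2 ** (n + 1)


n = 100
print(ftl_regret([1, 0] * (n // 2)))   # 25.0, i.e. n / 4: linear in n
print(rademacher_f(n))                  # exact f_n
print(math.sqrt(n / (2 * math.pi)))     # asymptotic approximation
```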

While the above discussion seems a convincing endorsement of worst-case online learning algorithms, the situation is more complicated. One problem is that while $\mathsf{FTL}$ has bad regret on certain pathological sequences, on more ‘realistic’ sequences $\mathsf{FTL}$ performs orders of magnitude better than the minimax regret. As an example, with i.i.d. Bernoulli$(p)$ input, $\mathrm{Reg}(\mathsf{FTL}, y^n)$ is, with high probability, independent of $n$ (i.e., $O(1)$) as long as $p$ is away from $1/2$. We demonstrate this in Figure 1(a), where we see $\mathrm{Reg}(\mathsf{FTL})$ is much lower than $\mathrm{Reg}(\mathsf{Cover}) \approx 0.39\sqrt{n}$ unless $p$ is very close to $1/2$. This phenomenon is known in more general settings (Huang et al., 2016), suggesting that in practice one may be better off just using $\mathsf{FTL}$. On the other hand, as Figure 1(b,c) indicates, we know how to generate sequences $y^n$ (Feder et al., 1992) for which $\mathrm{Reg}(\mathsf{FTL}, y^n)$ grows linearly with $n$, and so the $\sqrt{n}$ regret of $\mathsf{Cover}$ becomes appealing.
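The behaviour on stochastic inputs can be reproduced with a small simulation. This is a sketch for illustration only, not the code behind Figure 1; the function name, the seed, and the values of $n$ and $p$ are arbitrary choices made here.

```python
import math
import random


def ftl_regret_bernoulli(p, n, seed):
    """Regret of FTL (ties at 1/2) on n i.i.d. Bernoulli(p) bits drawn with a fixed seed."""
    rng = random.Random(seed)
    ones, loss = 0, 0.0
    for t in range(n):
        y = 1 if rng.random() < p else 0
        zeros = t - ones
        a = 1.0 if ones > zeros else 0.0 if ones < zeros else 0.5
        loss += abs(a - y)
        ones += y
    return loss - min(ones, n - ones)


n = 10_000
print("minimax level 0.39 * sqrt(n):", 0.39 * math.sqrt(n))
for p in (0.1, 0.3, 0.45, 0.5):
    print("p =", p, "FTL regret =", ftl_regret_bernoulli(p, n, seed=0))
```

Running this for several seeds gives a feel for how the regret of $\mathsf{FTL}$ stays small when $p$ is away from $1/2$ and grows as $p$ approaches $1/2$.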

Now suppose instead that a fictitious oracle is told beforehand which of $\mathsf{FTL}$ or $\mathsf{Cover}$ is better suited for the upcoming sequence $y^n$; the demand made by instance optimality is that we try to be competitive against such an oracle on every sequence $y^n$.

Definition 1 (Instance Optimality).

A binary prediction policy $\mathsf{ALG}$ is instance optimal with respect to the regret of $\mathsf{FTL}$ and $\mathsf{Cover}$ if there exists some universal $\gamma_n \geq 1$ such that for all $y^n \in \{0,1\}^n$:

$$\mathrm{Reg}(\mathsf{ALG}, y^n) \leq \gamma_n \min\{\mathrm{Reg}(\mathsf{FTL}, y^n),\, \mathrm{Reg}(\mathsf{Cover}, y^n)\}$$

We henceforth refer to $\gamma_n$ as the competitive ratio achieved by $\mathsf{ALG}$; ideally we want this ratio to be a constant, i.e., $\gamma_n = O(1)$. This necessitates that on sequences where $\mathsf{FTL}$ gets constant regret, $\mathsf{ALG}$ basically follows $\mathsf{FTL}$ throughout, while on sequences where $\mathsf{FTL}$ has high (in particular, $\omega(\sqrt{n})$) regret, $\mathsf{ALG}$ follows $\mathsf{Cover}$ in most rounds.

The challenge in designing instance optimal algorithms is that the regret of any algorithm is a quantity that is not adapted to the natural filtration, i.e., it may not be possible to track $\mathrm{Reg}(\mathsf{ALG}, y^n)$ for any $\mathsf{ALG}$ from just the history $(y_1, y_2, \ldots, y_{t-1})$, since the hindsight optimal action $a^*$ depends on the entire sequence $y^n$. One proxy is to track an algorithm’s loss instead, leading to the idea of ‘corralling’ policies (Agarwal et al., 2017; Pacchiano et al., 2020; Dann et al., 2023), which run online learning over the reference algorithms to get within $O(\text{poly}(n))$ of the smaller of the two losses. Such an approach cannot ensure $\gamma_n = O(1)$: for example, consider an i.i.d. sequence of Bernoulli$(0.1)$ bits, where $\mathsf{FTL}$ has lower regret than $\mathsf{Cover}$.
With high probability on any such sequence we have small $\mathrm{Reg}(\mathsf{FTL}, y^n) = O(1)$ and yet high loss $L_n(a^*, y^n) = \Theta(n)$; now any corralling algorithm (even a small-loss one) must suffer $O(\text{poly}(n))$ regret, and hence $\omega(1)$ competitive ratio. This example also shows that achieving a constant factor guarantee with respect to the minimum of the two losses does not translate to a constant factor guarantee with respect to the minimum of the two regrets.
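The gap between loss and regret in this example can be seen on a deterministic stand-in for the Bernoulli$(0.1)$ stream, in which every tenth bit is a $1$. This stand-in and the helper name are assumptions made here for reproducibility; the text itself uses a random sequence.

```python
def ftl_loss_and_regret(bits):
    """Return (best fixed loss, FTL regret) for FTL with ties broken at 1/2."""
    ones, loss = 0, 0.0
    for t, y in enumerate(bits):
        zeros = t - ones
        a = 1.0 if ones > zeros else 0.0 if ones < zeros else 0.5
        loss += abs(a - y)
        ones += y
    best = min(ones, len(bits) - ones)
    return best, loss - best


n = 1000
sparse_bits = [1 if t % 10 == 9 else 0 for t in range(n)]
best, reg = ftl_loss_and_regret(sparse_bits)
# The best fixed action (always predict 0) still loses n / 10 = 100,
# while FTL only pays 0.5 extra for the tie in the first round.
print(best, reg)  # 100 0.5
```

The hindsight loss grows linearly with $n$ while the regret of $\mathsf{FTL}$ stays constant, so a guarantee stated relative to losses says little about regret.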

The instance optimal guarantee is closely related to best-of-both-worlds guarantees, which aim for algorithms that simultaneously achieve (up to constant factors) both the low pseudoregret guarantee of policies designed for stochastic inputs (as with $\mathsf{FTL}$ in our setting, or the Upper Confidence Bound ($\mathsf{UCB}$) algorithm in bandits), as well as a per-sequence regret guarantee comparable to a worst-case optimal algorithm $\mathsf{ALG}_{\mathsf{WC}}$ (e.g., $\mathsf{Cover}$ or Hedge in online learning; $\mathsf{EXP3}$ in bandits (Auer et al., 2002a)). Such guarantees have been shown in a variety of settings, including online learning (De Rooij et al., 2014; Orabona and Pál, 2015; Mourtada and Gaïffas, 2019; Bilodeau et al., 2023) and bandit settings (Bubeck and Slivkins, 2012; Zimmert and Seldin, 2019; Lykouris et al., 2018; Dann et al., 2023). One problem though is that since pseudoregret and worst-case regret are very different quantities, the above results tend to be hard to interpret, and less predictive of good performance. As an example, Hedge has optimal pseudoregret in certain stochastic settings (Mourtada and Gaïffas, 2019), but this is known to be sensitive to perturbations in the distributions (Bilodeau et al., 2023). Note though that given a pair of stochastic/worst-case optimal algorithms, a policy that is $\gamma$-instance-optimal w.r.t. these immediately satisfies a best-of-both-worlds guarantee with constant factor $\gamma$. In this regard, instance optimality provides a stronger guarantee, as it holds on every sequence $y^n$ regardless of how it is generated.
Moreover, the parameter $\gamma$ can also provide sharper comparisons between algorithms, as well as admit hardness results on the limits of such guarantees.

1.1 Our Contributions

Figure 1: Comparing regret of $\mathsf{FTL}$, $\mathsf{Cover}$ and $\mathsf{SMART}$ on a collection of input sequences (for fixed $n$).
  • In Fig. (a), we consider i.i.d. Bernoulli$(p)$ inputs for varying $p$. The regret of $\mathsf{FTL}$ is much lower than $\mathsf{Cover}$ for $p < 1/2$; the regret of $\mathsf{SMART}$ tracks $\mathsf{FTL}$ closely (better than $2\,\mathrm{Reg}(\mathsf{FTL})$, indicated by dotted line).
  • In Fig. (b) and (c), we consider ‘worst-case’ binary sequences (as per Feder et al. (1992)) parameterized by the number of ‘lead-changes’: the sequence with parameter $c$ comprises $c$ pairs ‘$0,1$’ or ‘$1,0$’, followed by $n-2c$ ‘$1$’s. In Fig. (b), we consider $\mathsf{SMART}$ with a deterministic switching threshold (Theorem 1) and compare $\mathrm{Reg}(\mathsf{SMART})$ with $2\,\mathrm{Reg}(\mathsf{FTL})$ and $2\,\mathrm{Reg}(\mathsf{Cover})$ (dotted lines); in Fig. (c), we use a randomized threshold (Theorem 2), and show the average regret over the randomized threshold, as well as sample paths (plotted in green), and compare with $\frac{e}{e-1}$ times $\mathrm{Reg}(\mathsf{FTL})$ and $\mathrm{Reg}(\mathsf{Cover})$ (dotted lines).
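The ‘lead-change’ family described in the caption can be generated as follows. This sketch makes a simplifying assumption: all $c$ pairs use the same pattern (`first_pair`), whereas the caption allows either ‘$0,1$’ or ‘$1,0$’. Function names are illustrative.

```python
def lead_change_sequence(n, c, first_pair=(1, 0)):
    """c copies of `first_pair` followed by n - 2c ones (requires 2c <= n)."""
    if 2 * c > n:
        raise ValueError("need 2 * c <= n")
    return list(first_pair) * c + [1] * (n - 2 * c)


def ftl_regret(bits):
    """Regret of FTL (ties broken at 1/2) on a bit sequence."""
    ones, loss = 0, 0.0
    for t, y in enumerate(bits):
        zeros = t - ones
        a = 1.0 if ones > zeros else 0.0 if ones < zeros else 0.5
        loss += abs(a - y)
        ones += y
    return loss - min(ones, len(bits) - ones)


n = 100
for c in (0, 10, 25, 50):
    print(c, ftl_regret(lead_change_sequence(n, c)))
```

With this tie-breaking rule each pair costs $\mathsf{FTL}$ a loss of $1.5$ against $1$ for the best fixed action, so the regret grows linearly in $c$: it equals $(c+1)/2$ when $2c < n$.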

We consider a general online learning setting where at the beginning of each round $t \in [n]$, a policy $\mathsf{ALG}$ first plays an action $a_t \in \mathcal{A}$, following which a loss function $\ell_t : \mathcal{A} \to [0,1]$ is revealed, resulting in a loss of $\ell_t(a_t)$. The regret is defined according to:

$$\mathrm{Reg}(\mathsf{ALG}, \ell^n) = \sum_{t=1}^{n} \ell_t(a_t) - \inf_{a \in \mathcal{A}} \sum_{t=1}^{n} \ell_t(a). \tag{2}$$

More generally, as with bit prediction, we allow $\mathsf{ALG}$ to play in round $t$ a measure $w_t \in \Delta_{\mathcal{A}}$ (i.e., an element of $\{w_t : \mathcal{A} \mapsto [0,1] \mid \sum_{a \in \mathcal{A}} w_t(a) = 1\}$), resulting in an expected loss of $\sum_{a \in \mathcal{A}} w_t(a)\ell_t(a)$. For notational convenience, we henceforth use $(a_t, \ell_t(a_t))$ for the action/loss, and reserve use of expectations for randomness in the algorithm and/or sequence.
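For a finite action set, Eq. (2) with randomized play can be evaluated directly. In the sketch below, which assumes a finite $\mathcal{A}$ indexed by $0, \ldots, m-1$, `weights[t]` plays the role of $w_t$ and `losses[t]` lists $\ell_t(a)$ for each action; the names are illustrative.

```python
def expected_regret(weights, losses):
    """Expected regret of playing distributions w_t against loss vectors l_t, as in Eq. (2)."""
    n_actions = len(losses[0])
    # Expected loss in round t is sum_a w_t(a) * l_t(a).
    alg_loss = sum(
        sum(w * l for w, l in zip(w_t, l_t)) for w_t, l_t in zip(weights, losses)
    )
    # Best fixed action in hindsight.
    best = min(sum(l_t[a] for l_t in losses) for a in range(n_actions))
    return alg_loss - best


losses = [[0, 1], [1, 0], [0, 1]]
uniform = [[0.5, 0.5]] * 3
print(expected_regret(uniform, losses))  # 1.5 - 1 = 0.5
```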

We want to understand when it is possible to attain instance optimality as in Eq. (1) with respect to a given pair of algorithms. Ideally, we want the first to be optimal for stochastic instances, and the second to be minimax optimal; unfortunately, however, exact optimal policies are unknown except in simple settings. To this end, we make two amendments to our goal. First, for the stochastic optimal policy, we use $\mathsf{FTL}$; this is well defined in any online learning setting, and moreover, known to be optimal or near-optimal for a wide range of settings under minimal assumptions (Kotłowski, 2018). Second, instead of the minimax policy, we use as reference any policy $\mathsf{ALG}_{\mathsf{WC}}$ which has a known worst-case regret bound $g(n)$. With these modifications in place, we have the following objective.

Definition 2.

Given $\mathsf{FTL}$ and any algorithm $\mathsf{ALG}_{\mathsf{WC}}$ with $\sup_{\ell^n} \mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}}, \ell^n) \leq g(n)$, we say a policy $\mathsf{ALG}$ is instance optimal with respect to the pair if there exists some universal $\gamma_n \geq 1$ (i.e., not depending on the loss sequence $\ell^n$) such that for every sequence of losses $\ell^n$:

$$\mathrm{Reg}(\mathsf{ALG}, \ell^n) \leq \gamma_n \min\{\mathrm{Reg}(\mathsf{FTL}, \ell^n),\, g(n)\}$$

While the above guarantee is not truly instance-optimal, in that we are comparing against a worst-case regret bound $g(n)$ for $\mathsf{ALG}_{\mathsf{WC}}$ rather than its performance on the instance $\ell^n$, the two are the same if $\mathsf{ALG}_{\mathsf{WC}}$ is minimax optimal and hence attains equal regret on all loss sequences; recall this is true of $\mathsf{Cover}$ for binary prediction.

To realize the above goal, we propose the Switching via Monotone Adapted Regret Traces ($\mathsf{SMART}$) approach, which at a high level is a black-box way to convert the design of instance-optimal policies into a simple optimal stopping problem. Our approach depends on just two ingredients: first, owing to the additive structure of online learning problems, we have that the minimax guarantee $g(k)$ above holds over any $k \in \mathbb{Z}$ and any length-$k$ (sub)sequence of the $n$ loss functions; second, we show that $\mathsf{FTL}$ admits a simple anytime regret estimator $\Sigma^{\mathsf{FTL}}_{\tau}$ (see Lemma 1) which is monotone and adapted (i.e., a function only of historical data). Using these two observations, we can reduce the task of minimizing regret to a version of the ‘ski-rental’ problem (Karlin et al., 1994; Borodin and El-Yaniv, 2005), as follows: we play $\mathsf{FTL}$ up to some stopping time $\tau$, and then switch to $\mathsf{ALG}_{\mathsf{WC}}$ for the remaining $n-\tau$ periods, resetting all losses to zero.
This algorithm incurs a total regret bounded by $\Sigma^{\mathsf{FTL}}_{\tau} + g(n-\tau)$, and using ideas from competitive analysis, we get that there is a simple way to choose the stopping time $\tau$ to achieve an $e/(e-1) \approx 1.58$ competitive ratio guarantee with respect to the minimum between the regret of $\mathsf{FTL}$ and the worst-case guarantee $g(n)$.
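The switching scheme described above can be sketched for bit prediction as follows. Several choices here are assumptions made for illustration and are not taken from this section: the regret estimator is taken to be the prefix regret of $\mathsf{FTL}$ (Lemma 1 is only referenced, not stated, here); tuned Hedge over the two constant experts stands in for $\mathsf{ALG}_{\mathsf{WC}}$, with $g(k) = \sqrt{k \ln 2 / 2}$, since $\mathsf{Cover}$ itself is not specified; and the randomized threshold uses the classic ski-rental density $e^{u}/(e-1)$ on $[0,1]$, which may differ from the exact rule in Theorem 2.

```python
import math
import random


def hedge_bound(horizon):
    """Worst-case regret bound of tuned Hedge with two experts; plays the role of g(n)."""
    return math.sqrt(horizon * math.log(2) / 2)


def random_threshold(g_n, rng):
    """Sample g_n * U, where U has density e^u / (e - 1) on [0, 1] (inverse-CDF sampling)."""
    return g_n * math.log(1 + (math.e - 1) * rng.random())


def smart_regret(bits, threshold):
    """Play FTL until its prefix regret reaches `threshold`, then restart with Hedge."""
    n = len(bits)
    ones, loss, t = 0, 0.0, 0
    # Phase 1: follow the leader while tracking its prefix regret.
    while t < n:
        y = bits[t]
        zeros = t - ones
        a = 1.0 if ones > zeros else 0.0 if ones < zeros else 0.5
        loss += abs(a - y)
        ones += y
        t += 1
        prefix_regret = loss - min(ones, t - ones)
        if prefix_regret >= threshold:
            break  # switch at most once
    # Phase 2: Hedge over the experts "always 0" and "always 1", with losses reset to zero.
    remaining = n - t
    if remaining > 0:
        eta = math.sqrt(8 * math.log(2) / remaining)
        loss_zero = loss_one = 0.0
        for y in bits[t:]:
            x = min(eta * (loss_one - loss_zero), 700.0)  # cap to avoid overflow in exp
            w_one = 1.0 / (1.0 + math.exp(x))  # Hedge weight on "always 1"
            loss += abs(w_one - y)
            loss_zero += y      # "always 0" errs when y = 1
            loss_one += 1 - y   # "always 1" errs when y = 0
    total_ones = sum(bits)
    return loss - min(total_ones, n - total_ones)


n = 200
g = hedge_bound(n)
alternating = [1, 0] * (n // 2)
print("FTL only:", smart_regret(alternating, float("inf")))
print("deterministic threshold g(n):", smart_regret(alternating, g))
print("randomized threshold:", smart_regret(alternating, random_threshold(g, random.Random(0))))
```

Passing an infinite threshold never triggers the switch, so it recovers plain $\mathsf{FTL}$. With the deterministic threshold $g(n)$, the decomposition in the text gives regret at most $2\min\{\mathrm{Reg}(\mathsf{FTL}), g(n)\} + 1$ for this sketch, which matches the factor-2 comparison lines in Figure 1(b).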

Theorem.

(See Theorem 2) Let $\mathsf{ALG}_{\mathsf{WC}}$ have worst-case regret $\sup_{\ell^n} \mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}}, \ell^n) \leq g(n)$, where $g(n)$ is some monotonic function of $n$. An instantiation of $\mathsf{SMART}$ achieves

$$\mathrm{Reg}(\mathsf{SMART}, \ell^n) \leq \frac{e}{e-1} \min\{\mathrm{Reg}(\mathsf{FTL}, \ell^n),\, g(n)\} + 1. \tag{3}$$

A highlight of our approach is the surprising simplicity of the algorithm and analysis, despite the strength of the instance optimality guarantee. In particular, our approach is modular, allowing one to plug in any $\mathsf{ALG}_{\mathsf{WC}}$ and corresponding worst-case bound $g(n)$, thus letting us handle any online learning setting with known minimax bounds. This results in an entire family of instance optimal policies for settings such as prediction with experts and online convex optimization. Moreover, the approach is easy to extend to get more complex guarantees; as an example, if $\mathsf{ALG}_{\mathsf{WC}}$ is designed to get low regret for benign (i.e., ‘small-loss’) sequences $\ell^{n}$, then we show how to use $\mathsf{SMART}$ as a subroutine and achieve an instance optimal guarantee with respect to the regret of $\mathsf{FTL}$ and a small-loss regret bound.

Corollary 1 (Following Theorem 5).

Consider the prediction with expert advice setting (Cesa-Bianchi et al., 1997), where $\mathcal{A}=\Delta^{m-1}$, the $m$-simplex for $m\geq 2$, and $\ell_{t}(a)=\langle a,\ell_{t}\rangle$ for $\ell_{t}\in[0,1]^{m}$. Let $L^{*}:=\min_{j}\sum_{t=1}^{n}\ell_{tj}$. An instantiation of $\mathsf{SMART}$ achieves

$$\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq 2\min\left\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),10\sqrt{2L^{*}\log m}\right\}+O(\log L^{*}\log m).$$

Finally, studying instance optimality also lets us understand the fundamental limits of best-of-both-worlds algorithms. To this end, we provide a lower bound showing that our algorithm is nearly optimal in its competitive ratio. To the best of our knowledge, this is the first hardness result for best-of-both-worlds guarantees in online learning.

Theorem.

(See Theorem 3) In the binary prediction setting, given any online algorithm $\mathsf{ALG}$, there exist sequences $y^{n}\in\{0,1\}^{n}$ such that:

$$\mathrm{Reg}(\mathsf{ALG},y^{n})\geq 1.43\min\left\{\mathrm{Reg}(\mathsf{FTL},y^{n}),\mathrm{Reg}(\mathsf{Cover},y^{n})\right\}.$$

Note again that in binary prediction, $\mathsf{FTL}$ achieves the optimal pseudoregret under i.i.d. inputs, while $\mathsf{Cover}$ is the true minimax policy; thus, this is a fundamental lower bound on best-of-both-worlds guarantees in any online learning setting.

1.2 Related work

There have been many approaches towards combining stochastic and worst-case guarantees. As we discussed before, there is a large literature on best-of-both-worlds algorithms for both full and partial information settings (Wei and Luo, 2018; Bubeck et al., 2019; Zimmert and Seldin, 2019; Dann et al., 2023), and also more complex settings such as metrical task systems (Bhuyan et al., 2023) and control (Sabag et al., 2021; Goel et al., 2023). Another line of work (Rakhlin et al., 2011; Haghtalab et al., 2022; Block et al., 2022; Bhatt et al., 2023) considers smoothed analysis, where the worst-case actions of the adversary are perturbed by nature. A third approach (Bubeck and Slivkins, 2012; Lykouris et al., 2018; Amir et al., 2020; Zimmert and Seldin, 2019) interpolates between the stochastic and adversarial settings by considering most $\ell_{t}$ to be i.i.d., interspersed with a few adversarially chosen instances (corruptions). Finally, another line considers the data-generating distribution to come from a ball of specified radius around i.i.d. distributions (Mourtada and Gaïffas, 2019; Bilodeau et al., 2023). While all these approaches provide useful insights into the gap between average and worst-case guarantees, one can argue they are all imprecisely specified – given an instance $\{\ell_{t}\}_{t\in[n]}$ in hindsight, there is no clear sense as to which model best ‘explains’ the instance.

Our focus on instance optimality instead follows the approach of better understanding and shaping the per-sequence regret landscape. The origins of this approach arguably come from the seminal work of Cover (1966) for binary prediction (we discuss this in more detail in Section 3), with a later focus on better bounds for benign instances in general online learning (Auer et al., 2002b; Cesa-Bianchi et al., 2005; Hazan and Kale, 2010). More recently, a line of work (Koolen et al., 2014; Van Erven et al., 2015; Van Erven and Koolen, 2016; Gaillard et al., 2014) has studied the design of policies that can adapt to different types of data sequences and achieve multiple performance guarantees simultaneously. The main idea is to use multiple learning rates that are weighted according to their empirical performance on the data. While the focus is still primarily on classifying instances based on when they are easier or harder to learn, some of the resulting guarantees have an instance-optimality flavor; for example, Van Erven and Koolen (2016) show how to simultaneously match the performance guarantee (in terms of certain variance bounds) attained by different learning rates in Hedge. Such approaches, however, need to understand their baseline algorithms in great detail, and use them in a ‘white-box’ way to get their guarantees.

In contrast, our approach fundamentally focuses on combining policies in a black-box way to get instance optimal outcomes. As mentioned earlier, this is similar in spirit to corralling bandit algorithms (Agarwal et al., 2017; Pacchiano et al., 2020; Dann et al., 2023), as well as more recent work on online algorithms with predictions (Bamas et al., 2020; Dinitz et al., 2022; Anand et al., 2022); however, these all get guarantees with respect to the loss of the reference algorithms, which is much weaker than our regret guarantees (though they do so in much more complex settings with partial information and/or state). To our knowledge, the only previous result which attains an instance-optimality guarantee comparable to ours is that of De Rooij et al. (2014) for the experts problem, where the authors propose the $\mathsf{FlipFlop}$ policy which interleaves $\mathsf{Hedge}$ (with varying learning rates) and $\mathsf{FTL}$ to obtain a regret guarantee similar to that of Corollary 1. In fact, their guarantee is stronger as $\mathsf{FlipFlop}$ is shown to be 5.64-competitive with respect to $\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g(L^{*})\}$ where $g(L^{*})\leq\sqrt{L^{*}\log m}$ (De Rooij et al., 2014, Corollary 16).
However, while $\mathsf{FlipFlop}$ depends on a clever choice of learning rates in $\mathsf{Hedge}$, $\mathsf{SMART}$ can black-box interleave $\mathsf{FTL}$ with any worst-case/small-loss algorithm without knowing the inner workings of said algorithm, which we see as a significant engineering strength. More importantly, our approach to this problem is distinct as we focus on the fundamental limits (upper and lower bounds) on the competitive ratio that must be incurred when combining $\mathsf{FTL}$ with a worst-case policy; to the best of our knowledge this viewpoint, and the corresponding reduction to an optimal stopping problem, has not been previously explored.

2 Instance Optimal Online Learning: Achievability via $\mathsf{SMART}$

Given the setting and problem statement above, we can now present the $\mathsf{SMART}$ policy. We do this for a general online learning problem, wherein we want to combine $\mathsf{FTL}$ with any given algorithm $\mathsf{ALG}_{\mathsf{WC}}$ with a worst-case regret guarantee $\sup_{\ell^{n}}\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell^{n})\leq g(n)$. Before presenting the policy, we first need the following regret decomposition for $\mathsf{FTL}$.

Lemma 1 (Regret of $\mathsf{FTL}$).

If $L_{t}(\cdot):=\sum_{i=1}^{t}\ell_{i}(\cdot)$ denotes the cumulative loss function, then

$$\mathrm{Reg}(\mathsf{FTL},\ell^{n})=\sum_{t=1}^{n}\left(L_{t}(a^{*}_{t-1})-L_{t}(a^{*}_{t})\right).$$

This is reminiscent of the ‘be-the-leader’ lemma (Kalai and Vempala, 2005; Slivkins, 2019), although never stated explicitly as an exact decomposition.

Proof.

Recall we define $a^{\mathsf{FTL}}_{t}=a^{*}_{t-1}$, and hence $\inf_{a\in\mathcal{A}}\sum_{t=1}^{n}\ell_{t}(a)=L_{n}(a^{*}_{n})$. Now we have

$$\begin{aligned}
\mathrm{Reg}(\mathsf{FTL},\ell^{n}) &= \sum_{t=1}^{n}\ell_{t}(a^{*}_{t-1})-L_{n}(a^{*}_{n})\\
&= \sum_{t=1}^{n}\left(L_{t}(a^{*}_{t-1})-L_{t-1}(a^{*}_{t-1})\right)-L_{n}(a^{*}_{n})\\
&= \sum_{t=1}^{n}\left(L_{t}(a^{*}_{t-1})-L_{t}(a^{*}_{t})\right)+\sum_{t=1}^{n}\left(L_{t}(a^{*}_{t})-L_{t-1}(a^{*}_{t-1})\right)-L_{n}(a^{*}_{n})\\
&\stackrel{(a)}{=} \sum_{t=1}^{n}\left(L_{t}(a^{*}_{t-1})-L_{t}(a^{*}_{t})\right),
\end{aligned}$$

where $(a)$ follows since $\sum_{t=1}^{n}\left(L_{t}(a^{*}_{t})-L_{t-1}(a^{*}_{t-1})\right)=L_{n}(a^{*}_{n})$ by telescoping (using $L_{0}\equiv 0$). ∎
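The decomposition in Lemma 1 can be checked numerically. The following is a minimal sketch for the experts setting with linear losses, under assumptions that are not taken from the paper: the function name `ftl_regret_two_ways` is hypothetical, the leader $a^{*}_{t}$ is taken to be the lowest-index expert minimizing $L_{t}$, and $a^{*}_{0}$ is set to expert 0 (any choice minimizes $L_{0}\equiv 0$).

```python
def ftl_regret_two_ways(losses):
    """Compute Reg(FTL) directly and via the Lemma 1 decomposition.

    losses: list of per-round loss vectors, one entry per expert.
    Returns (direct, trace); the two values should agree up to rounding.
    """
    m = len(losses[0])
    cum = [0.0] * m      # cumulative losses L_{t-1}(j) for each expert j
    leader = 0           # a*_0: arbitrary, since L_0 = 0 (assumption)
    ftl_loss = 0.0       # total loss actually paid by FTL
    trace = 0.0          # running sum of L_t(a*_{t-1}) - L_t(a*_t)
    for loss in losses:
        ftl_loss += loss[leader]                    # FTL plays a*_{t-1}
        cum = [c + l for c, l in zip(cum, loss)]    # now holds L_t
        new_leader = min(range(m), key=cum.__getitem__)  # a*_t
        trace += cum[leader] - cum[new_leader]
        leader = new_leader
    direct = ftl_loss - min(cum)   # regret from the definition
    return direct, trace
```

Because each increment `cum[leader] - cum[new_leader]` is non-negative by the choice of `new_leader`, the running value of `trace` is also a convenient way to see why the regret estimator is monotone.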

Next, let $\Sigma^{\mathsf{FTL}}_{t}$ denote the anytime regret that $\mathsf{FTL}$ incurs up to round $t$ (i.e., assuming the game ends after round $t$). From Lemma 1 we have

$$\Sigma^{\mathsf{FTL}}_{t}:=\mathrm{Reg}(\mathsf{FTL},\ell^{t})=\sum_{i=1}^{t}\left(L_{i}(a^{*}_{i-1})-L_{i}(a^{*}_{i})\right). \tag{4}$$

Now we make three critical observations:

  • $\Sigma^{\mathsf{FTL}}_{t}$ is adapted: it can be computed at the end of the $t^{th}$ round

  • $\Sigma^{\mathsf{FTL}}_{t}$ is monotone non-decreasing in $t$ (since by definition $L_{i}(a^{*}_{i-1})-L_{i}(a^{*}_{i})\geq 0$)

  • $\Sigma^{\mathsf{FTL}}_{t}$ is an anytime lower bound for $\mathrm{Reg}(\mathsf{FTL},\ell^{n})$, with $\Sigma^{\mathsf{FTL}}_{n}=\mathrm{Reg}(\mathsf{FTL},\ell^{n})$

For an input threshold $\theta\geq 0$ and an algorithm $\mathsf{ALG}_{\mathsf{WC}}$, we get the following (meta)algorithm.

Input: Policies $\mathsf{FTL}$, $\mathsf{ALG}_{\mathsf{WC}}$, threshold $\theta$
Initialize $\Sigma^{\mathsf{FTL}}_{0}=0$, $t=1$;
while $\Sigma^{\mathsf{FTL}}_{t-1}\leq\theta$ do
      Set $a_{t}=a_{t-1}^{*}$;  // Play $\mathsf{FTL}$
      Observe $\ell_{t}(\cdot)$;
      Update $L_{t}(\cdot)=L_{t-1}(\cdot)+\ell_{t}(\cdot)$ and $\Sigma^{\mathsf{FTL}}_{t}=\Sigma^{\mathsf{FTL}}_{t-1}+(L_{t}(a_{t-1}^{*})-L_{t}(a_{t}^{*}))$ and $t=t+1$;
end while
Reset losses to $0$ and play $\mathsf{ALG}_{\mathsf{WC}}$ for remaining rounds (see Figure 2(b));
Algorithm 1 Switching via Monotone Adapted Regret Traces ($\mathsf{SMART}$)
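Algorithm 1 can be sketched in a few lines for the experts setting. This is an illustrative sketch under stated assumptions, not the authors' implementation: exponential weights (`hedge_weights`, with a hypothetical learning rate `eta`) stands in for $\mathsf{ALG}_{\mathsf{WC}}$, ties for the leader go to the lowest index, $a^{*}_{0}$ is expert 0, and the names `smart`, `wc_cum` and `switch_round` are introduced here for illustration only.

```python
import math

def hedge_weights(cum, eta):
    """Exponential weights from cumulative losses (assumed stand-in for ALG_WC)."""
    lo = min(cum)  # shift for numerical stability; does not change the weights
    w = [math.exp(-eta * (c - lo)) for c in cum]
    s = sum(w)
    return [x / s for x in w]

def smart(losses, theta, eta):
    """Play FTL while the regret trace is <= theta, then switch once to ALG_WC.

    Returns (total expected loss paid, round of the switch or None).
    """
    m = len(losses[0])
    cum = [0.0] * m        # cumulative losses L_{t-1} used by FTL
    leader = 0             # a*_0: arbitrary, since L_0 = 0 (assumption)
    sigma = 0.0            # Sigma^FTL_{t-1}
    total = 0.0
    switch_round = None
    wc_cum = None          # losses seen by ALG_WC since the reset
    for t, loss in enumerate(losses, start=1):
        if switch_round is None and sigma > theta:
            switch_round = t
            wc_cum = [0.0] * m             # "reset losses to 0"
        if switch_round is None:
            total += loss[leader]          # play FTL
            cum = [c + l for c, l in zip(cum, loss)]
            new_leader = min(range(m), key=cum.__getitem__)
            sigma += cum[leader] - cum[new_leader]
            leader = new_leader
        else:
            p = hedge_weights(wc_cum, eta)  # play ALG_WC on post-switch data
            total += sum(pi * li for pi, li in zip(p, loss))
            wc_cum = [c + l for c, l in zip(wc_cum, loss)]
    return total, switch_round
```

The only state shared between the two phases is the decision to switch; after the switch the stand-in for $\mathsf{ALG}_{\mathsf{WC}}$ sees only post-switch losses, which is what lets its worst-case bound be applied to the remaining rounds.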

We now have the following performance guarantee for Algorithm 1 for $\theta=g(n)$.

Theorem 1 (Regret of $\mathsf{SMART}$ with deterministic threshold).

We are given $\mathsf{FTL}$, and any other policy $\mathsf{ALG}_{\mathsf{WC}}$ with worst-case regret $\sup_{\ell^{n}}\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell^{n})\leq g(n)$ for some monotone function $g$. Then, playing $\mathsf{SMART}$ with threshold $\theta=g(n)$ ensures

$$\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq 2\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g(n)\}+1. \tag{5}$$

As mentioned before, the structure of the $\mathsf{SMART}$ algorithm (and the resulting guarantee) parallels the standard 2-competitive guarantee for the ski-rental problem (Karlin et al., 1994). This is a classical optimal stopping problem, where a principal faces an unknown horizon, and in each period must decide whether to rent a pair of skis for the period (for a fixed cost of \$1) or buy the skis for the remaining horizon (for a fixed cost of \$$B$). The aim is to design a policy which is minimax optimal (over the unknown horizon) with regard to the ratio of the cost paid by the principal to the optimal cost in hindsight. We further expand on this connection in Section 2.1 for the case of binary prediction. The connection also suggests a natural follow-up question of whether randomized switching can help (as is the case for ski-rental); the following result answers this in the affirmative.
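To make the ski-rental analogy concrete, the sketch below evaluates the classical deterministic break-even rule (rent until the rent paid is about to reach the purchase price, then buy) against every horizon in a range. The function names and the discrete-day model with integer `buy_cost` are assumptions made for illustration.

```python
def ski_rental_cost(horizon, buy_cost, buy_day):
    """Cost of renting on days 1..buy_day-1 (1 per day) and buying on day buy_day,
    when the season lasts `horizon` days."""
    if horizon < buy_day:
        return horizon                   # season ended before we bought
    return (buy_day - 1) + buy_cost      # rented buy_day - 1 days, then bought

def offline_optimum(horizon, buy_cost):
    """Best cost in hindsight: rent throughout or buy on day 1."""
    return min(horizon, buy_cost)

def worst_case_ratio(buy_cost, buy_day, max_horizon):
    """Largest cost ratio over all horizons 1..max_horizon."""
    return max(
        ski_rental_cost(h, buy_cost, buy_day) / offline_optimum(h, buy_cost)
        for h in range(1, max_horizon + 1)
    )
```

With `buy_day = buy_cost = B`, the ratio never exceeds $(2B-1)/B<2$, which mirrors the factor 2 in Theorem 1: the rent paid before buying plays the role of $\Sigma^{\mathsf{FTL}}_{\tau}$ and the purchase price plays the role of $g(n)$.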

Theorem 2 (Regret of $\mathsf{SMART}$ with Randomized Thresholds).

We are given $\mathsf{FTL}$ and any other policy $\mathsf{ALG}_{\mathsf{WC}}$ with worst-case regret $\sup_{\ell^{n}}\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell^{n})\leq g(n)$ for some monotone function $g$. Moreover, given a random sample $U\sim\text{Unif}[0,1]$, suppose we set

$$\theta=g(n)\ln(1+(e-1)U).$$

Then playing the $\mathsf{SMART}$ policy (Algorithm 1) with random threshold $\theta$ ensures

$$\mathbb{E}_{\theta}\left[\mathrm{Reg}(\mathsf{SMART},\ell^{n})\right]\leq\frac{e}{e-1}\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g(n)\}+1.$$

While we state the above as a randomized switching policy, this is more for interpretability – it is easier to view our policy as switching between two black-box algorithms rather than playing a convex combination of the two. However, since we define the loss incurred by any $\mathsf{ALG}$ in round $t$ to be the expected loss when $\mathsf{ALG}$ plays a distribution $w_{t}$ over actions $\mathcal{A}$ (see Section 1.1), we can alternatively implement the above by mixing between the actions of $\mathsf{FTL}$ and $\mathsf{ALG}_{\mathsf{WC}}$. More specifically, the above policy induces a monotone mixing rule, where the weight on the action suggested by $\mathsf{ALG}_{\mathsf{WC}}$ is non-decreasing in $t$.
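The randomized threshold and the mixing rule it induces can be sketched as follows. The function names are hypothetical; `switch_probability` is obtained by inverting the threshold formula of Theorem 2, and `expected_ski_cost` checks the threshold distribution only on the continuous ski-rental analogue with purchase price 1 (an assumed simplification, not the online learning problem itself), using a midpoint rule over $U$.

```python
import math

def sample_threshold(g_n, u):
    """theta = g(n) * ln(1 + (e - 1) * u) for u in [0, 1], as in Theorem 2."""
    return g_n * math.log(1.0 + (math.e - 1.0) * u)

def switch_probability(sigma, g_n):
    """P(theta < sigma): the weight placed on ALG_WC once the trace equals sigma."""
    if sigma <= 0:
        return 0.0
    if sigma >= g_n:
        return 1.0
    return (math.exp(sigma / g_n) - 1.0) / (math.e - 1.0)

def expected_ski_cost(z, grid=20_000):
    """Continuous ski rental with purchase price 1 and horizon z:
    pay theta + 1 if the threshold theta is hit before z, else pay z."""
    total = 0.0
    for k in range(grid):
        theta = sample_threshold(1.0, (k + 0.5) / grid)
        total += (theta + 1.0) if theta < z else z
    return total / grid
```

On this analogue the expected cost works out to $\frac{e}{e-1}\min\{z,1\}$ for every horizon $z$, which is the ski-rental counterpart of the bound in Theorem 2. Since the regret trace is non-decreasing, `switch_probability` evaluated along the trace is non-decreasing in $t$, consistent with the monotone mixing rule described above.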

Remark 1 (Optimality over Monotone Mixing Policies).

The competitive ratio of $\frac{e}{e-1}$ is known to be optimal for the ski-rental problem via Yao’s minmax theorem (Borodin and El-Yaniv, 2005). A direct corollary of this is the optimality of $\mathsf{SMART}$ over algorithms that are single switch (i.e., where the weight on $\mathsf{ALG}_{\mathsf{WC}}$ is non-decreasing in $t$). One difference between our setting and ski-rental is that switching back-and-forth between $\mathsf{FTL}$ and $\mathsf{ALG}_{\mathsf{WC}}$ is possible; see for example the $\mathsf{FlipFlop}$ policy of De Rooij et al. (2014). In Section 3 we provide a fundamental lower bound of $1.43$ on the competitive ratio over all algorithms; this suggests that multiple switching can help get a better competitive ratio (since $e/(e-1)\approx 1.58$), but also that single-switch policies are surprisingly close to optimal.

2.1 Illustrating the Reduction to Ski Rental in Binary Prediction

Before proving Theorems 1 and 2, we first illustrate the basic idea of our approach and reduction for the binary prediction setting. This is aided by the observation that the regret of $\mathsf{FTL}$ in this setting admits a simple geometric interpretation: for any sequence, and any time $t$, we have that $\Sigma^{\mathsf{FTL}}_{t}$ is equal to $1/2$ times the number of ‘lead changes’ up to time $t$, where a lead change is a time step $i$ at which the counts of $1$s and $0$s in the (sub)sequence $y^{i-1}$ are equal (see Figure 2(a)).

Corollary 2 ($\mathsf{FTL}$ for binary prediction).

In binary prediction, for any sequence $y\in\{0,1\}^{n}$ and any time $t\leq n$, define the lead-change counter

$$c(y^{t}):=\left|\left\{0\leq j\leq t~\text{s.t.}~\sum_{i=1}^{j}y_{i}=\sum_{i=1}^{j}(1-y_{i})\right\}\right|.$$

Then we have $\Sigma^{\mathsf{FTL}}_{t}=\frac{1}{2}c(y^{t-1})$ and thus $\mathrm{Reg}(\mathsf{FTL},y^{n})=\frac{1}{2}c(y^{n-1})$.

Corollary 2 follows from the regret decomposition in Lemma 1. Furthermore, since the losses of 0s and 1s are equal at a lead change, it also follows that at a lead change $t$, the anytime regret $\Sigma^{\mathsf{FTL}}_{t}$ is also equal to the hindsight regret incurred by $\mathsf{FTL}$ up to time $t$. Since $\Sigma^{\mathsf{FTL}}_{t}$ only increases in value at lead changes, if the $\mathsf{SMART}$ algorithm switches to $\mathsf{ALG}_{\mathsf{WC}}$, it will only happen after a lead change, and thus $\mathsf{SMART}$ behaves as if it had oracle knowledge of the regret of $\mathsf{FTL}$ from just the history up to the current time.
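The lead-change identity of Corollary 2 can be checked numerically. The sketch below is illustrative only and rests on modelling assumptions that are not spelled out in this section: predictions lie in $[0,1]$, the loss is the absolute loss $|a_t-y_t|$, and $\mathsf{FTL}$ predicts $1/2$ whenever the counts of 0s and 1s are tied. The function names are hypothetical.

```python
def lead_change_count(y):
    """c(y^t): number of prefix lengths j in {0, ..., t} with as many 1s as 0s."""
    count, balance = 1, 0              # j = 0 (the empty prefix) is always a tie
    for bit in y:
        balance += 1 if bit == 1 else -1
        if balance == 0:
            count += 1
    return count


def ftl_regret(y):
    """Regret of FTL under absolute loss, predicting 1/2 on ties (assumed model)."""
    ones = zeros = 0
    loss = 0.0
    for bit in y:
        if ones == zeros:
            pred = 0.5                 # tie: hedge between the two labels
        else:
            pred = 1.0 if ones > zeros else 0.0
        loss += abs(pred - bit)
        ones += bit
        zeros += 1 - bit
    # The best fixed prediction in hindsight errs on the minority label only.
    return loss - min(ones, zeros)


y = [0, 1, 0, 1]
print(lead_change_count(y[:-1]), ftl_regret(y))
```

Under these assumptions, every step taken from a tie costs $1/2$ in regret and every other step costs nothing, which is the geometric picture described above.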

Consider the instantiation of $\mathsf{SMART}$ where $\mathsf{ALG}_{\mathsf{WC}}=\mathsf{Cover}$ and $g(n)=\sqrt{\frac{n}{2\pi}}$. As mentioned in Section 1, a remarkable property of $\mathsf{Cover}$ is that it is the true minimax optimal algorithm, where $\mathrm{Reg}(\mathsf{Cover},y^{n})=\sqrt{\frac{n}{2\pi}}(1+o(1))$ for all sequences $y^{n}$, such that $g(n)$ is nearly equal to $\mathrm{Reg}(\mathsf{Cover},y^{n})$ regardless of the sequence (Cover, 1966).

It follows that $\mathsf{SMART}$ is equivalent to an algorithm which starts with $\mathsf{FTL}$, plays it until the regret of $\mathsf{FTL}$ matches the minimax regret guaranteed by $\mathsf{Cover}$ for the remaining sequence, and then switches to $\mathsf{Cover}$ until the end. Let $t_{\mathrm{sw}}$ denote the last round $\mathsf{SMART}$ plays $\mathsf{FTL}$ before switching to $\mathsf{ALG}_{\mathsf{WC}}$ (with $t_{\mathrm{sw}}=n$ if it never switches). For a single-switch algorithm, the sequence which maximizes the regret is one that maximizes the $\mathsf{FTL}$ regret before the switch at $t_{\mathrm{sw}}$, and minimizes the $\mathsf{FTL}$ regret after the switch, as depicted in Figure 2(a). For such a sequence, $\Sigma^{\mathsf{FTL}}_{t}=t/4$ at lead changes $t$, so the regret incurred by the algorithm is linear before the switch, matching the linear cost of renting skis in the ski-rental problem. Note that $t_{\mathrm{sw}}$ will necessarily be $o(n)$ in such sequences, as the time it takes until $\Sigma^{\mathsf{FTL}}_{t}\geq g(n)$ is linear in $g(n)=o(n)$. After the switch, $\mathsf{Cover}$ will incur regret $\sqrt{\frac{n-t_{\mathrm{sw}}}{2\pi}}(1+o(1))=g(n)(1+o(1))$, matching the fixed cost of buying skis at the switching point; Corollary 3 follows as a result of this analysis. Furthermore, in the binary prediction setting, $\mathsf{SMART}$ achieves the stronger notion of instance optimality stated in Definition 1.
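To make the claim $t_{\mathrm{sw}}=o(n)$ concrete, the following sketch computes the switching round on an alternating sequence, which forces a lead change every other step. It is a minimal illustration, assuming $\Sigma^{\mathsf{FTL}}_{t}=\frac{1}{2}c(y^{t-1})$ from Corollary 2 and the rule "switch after the first round $t\leq n-1$ with $\Sigma^{\mathsf{FTL}}_{t}>\theta$"; the helper name `switch_time` is hypothetical.

```python
import math


def switch_time(y, theta):
    """First round t <= n-1 with Sigma_t = c(y^{t-1}) / 2 > theta, else n."""
    n = len(y)
    balance, count = 0, 1              # count holds c(y^{t-1}); c(y^0) = 1
    for t in range(1, n):
        if 0.5 * count > theta:
            return t
        balance += 1 if y[t - 1] == 1 else -1
        if balance == 0:               # prefix y^t is tied: one more lead change
            count += 1
    return n                           # convention: t_sw = n if no switch occurs


n = 10_000
theta = math.sqrt(n / (2 * math.pi))   # g(n), roughly 39.9 here
alternating = [t % 2 for t in range(n)]
print(switch_time(alternating, theta), switch_time([0] * n, theta))
```

On the alternating sequence the switch occurs after roughly $4g(n)$ rounds, a vanishing fraction of the horizon, while on a constant sequence $\Sigma^{\mathsf{FTL}}_{t}$ stays at $1/2$ and no switch happens.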

Corollary 3.

For all $y^{n}\in\{0,1\}^{n}$, $\mathsf{SMART}$ with $\mathsf{ALG}_{\mathsf{WC}}=\mathsf{Cover}$ and $\theta=\sqrt{\frac{n}{2\pi}}$ satisfies

$$\mathrm{Reg}(\mathsf{SMART},y^{n})\leq 2\min\{\mathrm{Reg}(\mathsf{FTL},y^{n}),\mathrm{Reg}(\mathsf{Cover},y^{n})\}+1.$$

$\mathsf{SMART}$ with $\mathsf{ALG}_{\mathsf{WC}}=\mathsf{Cover}$ and $\theta=\sqrt{\frac{n}{2\pi}}\ln(1+(e-1)U)$ for $U\sim\text{Uniform}[0,1]$ satisfies

$$\mathrm{Reg}(\mathsf{SMART},y^{n})\leq 1.58\min\{\mathrm{Reg}(\mathsf{FTL},y^{n}),\mathrm{Reg}(\mathsf{Cover},y^{n})\}+1.$$
Figure 2: Figure 2(a) on the left shows the worst-case instance in binary prediction for an algorithm which starts with $\mathsf{FTL}$ and switches at most once during the time horizon to $\mathsf{Cover}$. Figure 2(b) on the right depicts, in a prediction-with-experts setting, how $\mathsf{SMART}$ resets the losses after the switch from $\mathsf{FTL}$ to $\mathsf{ALG}_{\mathsf{WC}}$.
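The randomized threshold in Corollary 3 can be read as inverse-CDF sampling: inverting $\theta=g\ln(1+(e-1)U)$ gives $\Pr[\theta\leq z]=\frac{e^{z/g}-1}{e-1}$ for $z\in[0,g]$, where $g=\sqrt{n/(2\pi)}$. The sketch below draws such thresholds and compares the empirical distribution with this closed form; the function names, the seed, and the sample size are illustrative choices, not part of the original construction.

```python
import math
import random


def sample_threshold(g, rng):
    """Draw theta = g * ln(1 + (e - 1) U) with U uniform on [0, 1)."""
    u = rng.random()
    return g * math.log(1.0 + (math.e - 1.0) * u)


def threshold_cdf(z, g):
    """P[theta <= z] = (exp(z / g) - 1) / (e - 1) for z in [0, g]."""
    return (math.exp(z / g) - 1.0) / (math.e - 1.0)


n = 10_000
g = math.sqrt(n / (2 * math.pi))
rng = random.Random(0)                     # fixed seed, for reproducibility only
samples = [sample_threshold(g, rng) for _ in range(20_000)]
empirical = sum(s <= g / 2 for s in samples) / len(samples)
print(round(empirical, 3), round(threshold_cdf(g / 2, g), 3))
```

The distribution places more mass on thresholds near $g$ than near $0$, mirroring the classical randomized strategy for ski-rental.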

2.2 Proofs for General Online Learning

In the general online learning setting, the proof is nearly the same, with the reduction to the ski-rental problem captured by Lemma 2.

Lemma 2.

Let $t_{\mathrm{sw}} := \min\{1\leq t\leq n-1 : \Sigma^{\mathsf{FTL}}_{t}>\theta\}$ denote the last round $\mathsf{SMART}$ plays $\mathsf{FTL}$ before switching to $\mathsf{ALG}_{\mathsf{WC}}$ (with $t_{\mathrm{sw}}=n$ if it never switches). $\mathsf{SMART}$ incurs regret bounded by

$$\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq\mathrm{Reg}(\mathsf{FTL},\ell^{t_{\mathrm{sw}}})+\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{t_{\mathrm{sw}}+1}^{n})\leq\theta+\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{t_{\mathrm{sw}}+1}^{n})+1.$$
Proof.

We separately bound the regret of $\mathsf{SMART}$ before the switch and after the switch,

$$\begin{aligned}
\mathrm{Reg}(\mathsf{SMART},\ell^{n}) &= \Big(\sum_{t=1}^{t_{\mathrm{sw}}}\ell_{t}(a_{t})-\sum_{t=1}^{t_{\mathrm{sw}}}\ell_{t}(a^{*}_{n})\Big)+\Big(\sum_{t=t_{\mathrm{sw}}+1}^{n}\ell_{t}(a_{t})-\sum_{t=t_{\mathrm{sw}}+1}^{n}\ell_{t}(a^{*}_{n})\Big)\\
&\stackrel{(a)}{\leq} \Big(\sum_{t=1}^{t_{\mathrm{sw}}}\ell_{t}(a_{t})-\sum_{t=1}^{t_{\mathrm{sw}}}\ell_{t}(a^{*}_{t_{\mathrm{sw}}})\Big)+\Big(\sum_{t=t_{\mathrm{sw}}+1}^{n}\ell_{t}(a_{t})-\sum_{t=t_{\mathrm{sw}}+1}^{n}\ell_{t}(a^{*}_{t_{\mathrm{sw}}+1:n})\Big)\\
&= \mathrm{Reg}(\mathsf{FTL},\ell^{t_{\mathrm{sw}}})+\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{t_{\mathrm{sw}}+1}^{n}).
\end{aligned}$$

The first term is upper bounded by $\mathrm{Reg}(\mathsf{FTL},\ell^{t_{\mathrm{sw}}})$ since $L_{t_{\mathrm{sw}}}(a^{*}_{n})\geq L_{t_{\mathrm{sw}}}(a^{*}_{t_{\mathrm{sw}}})$ by definition, as $a^{*}_{t_{\mathrm{sw}}}$ is the minimizer of $L_{t_{\mathrm{sw}}}$, and furthermore $\mathsf{SMART}$ always plays according to $\mathsf{FTL}$ in rounds up to $t_{\mathrm{sw}}$. The second term is upper bounded by $\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{t_{\mathrm{sw}}+1}^{n})$ because $\sum_{t=t_{\mathrm{sw}}+1}^{n}\ell_{t}(a^{*}_{t_{\mathrm{sw}}+1:n})\leq\sum_{t=t_{\mathrm{sw}}+1}^{n}\ell_{t}(a^{*}_{n})$, as $a^{*}_{t_{\mathrm{sw}}+1:n}$ is the minimizer of the losses after $t_{\mathrm{sw}}$, and at time $t>t_{\mathrm{sw}}$, $\mathsf{SMART}$ plays according to $\mathsf{ALG}_{\mathsf{WC}}$ on the sequence of losses limited to $\ell_{t_{\mathrm{sw}}+1}^{n}$. This illustrates the important role of resetting the losses after the switch, as depicted in Figure 2(b). Using the fact that $\Sigma^{\mathsf{FTL}}_{t_{\mathrm{sw}}-1}\leq\theta$ and $L_{t_{\mathrm{sw}}-1}(a^{*}_{t_{\mathrm{sw}}-1})\leq L_{t_{\mathrm{sw}}-1}(a^{*}_{t_{\mathrm{sw}}})$, it follows that

$$\mathrm{Reg}(\mathsf{FTL},\ell^{t_{\mathrm{sw}}})=\Sigma^{\mathsf{FTL}}_{t_{\mathrm{sw}}-1}+L_{t_{\mathrm{sw}}-1}(a^{*}_{t_{\mathrm{sw}}-1})-L_{t_{\mathrm{sw}}-1}(a^{*}_{t_{\mathrm{sw}}})+\ell_{t_{\mathrm{sw}}}(a^{*}_{t_{\mathrm{sw}}-1})-\ell_{t_{\mathrm{sw}}}(a^{*}_{t_{\mathrm{sw}}})\leq\theta+1.$$

The reduction to ski-rental is again immediate because $\Sigma^{\mathsf{FTL}}_{t}:=\mathrm{Reg}(\mathsf{FTL},\ell^{t_{\mathrm{sw}}})$ is adapted, monotone, and an anytime lower bound for $\mathrm{Reg}(\mathsf{FTL},\ell^{n})$, while remaining an upper bound on the true regret incurred by $\mathsf{FTL}$ up to time $t$. As a result, the algorithm can pretend that it truly observes the regret it incurs at each time up to the switching time $t_{\mathrm{sw}}$. After the switch, $\mathsf{SMART}$ incurs regret $\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{t_{\mathrm{sw}}+1}^{n})$, which is upper bounded by $g(n-t_{\mathrm{sw}})\leq g(n)$ by assumption.
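The single-switch structure behind Lemma 2 can be sketched for a finite set of experts. This is a minimal illustration under several assumptions that do not come from the text: losses lie in $[0,1]$, $\Sigma^{\mathsf{FTL}}_{t}$ is computed as the regret of $\mathsf{FTL}$ on the prefix $\ell^{t}$ (which is consistent with the identity used in the proof above), and Hedge with a hypothetical learning rate `eta` stands in for $\mathsf{ALG}_{\mathsf{WC}}$, with its expected loss used in place of a sampled one. All function names are illustrative.

```python
import math


def hedge_distribution(cum_losses, eta):
    """Exponential-weights distribution over experts (stand-in for ALG_WC)."""
    base = min(cum_losses)
    weights = [math.exp(-eta * (c - base)) for c in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def run_smart(losses, theta, eta):
    """Return (Reg(SMART), t_sw, Reg(FTL on prefix), Reg(Hedge on suffix))."""
    n, k = len(losses), len(losses[0])
    cum = [0.0] * k        # cumulative expert losses over all rounds
    post = [0.0] * k       # expert losses since the switch (the reset state)
    pre_loss = 0.0         # loss incurred while playing FTL
    post_loss = 0.0        # expected loss incurred while playing Hedge
    reg_ftl = 0.0          # regret of FTL on the prefix it actually played
    t_sw = n               # convention: t_sw = n if SMART never switches
    for t in range(1, n + 1):
        loss = losses[t - 1]
        if t <= t_sw:
            leader = min(range(k), key=cum.__getitem__)  # FTL, ties -> lowest index
            pre_loss += loss[leader]
        else:
            p = hedge_distribution(post, eta)            # sees post-switch losses only
            post_loss += sum(pi * li for pi, li in zip(p, loss))
            post = [c + l for c, l in zip(post, loss)]
        cum = [c + l for c, l in zip(cum, loss)]
        if t <= t_sw:
            reg_ftl = pre_loss - min(cum)                # Sigma_t (assumed form)
            if t_sw == n and t <= n - 1 and reg_ftl > theta:
                t_sw = t                                 # last round played by FTL
    reg_wc = post_loss - min(post)
    reg_smart = pre_loss + post_loss - min(cum)
    return reg_smart, t_sw, reg_ftl, reg_wc
```

The `post` vector is the reset depicted in Figure 2(b): the worst-case algorithm never sees losses from before the switch, which is what allows the two regret terms to be bounded separately.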

Proof of Theorem 1.

This follows immediately from Lemma 2. For $\ell^{n}$ such that $\mathrm{Reg}(\mathsf{FTL},\ell^{n})\leq g(n)$, $\mathsf{SMART}$ will never switch to $\mathsf{ALG}_{\mathsf{WC}}$, as $\Sigma^{\mathsf{FTL}}_{t}\leq\mathrm{Reg}(\mathsf{FTL},\ell^{n})\leq g(n)$, so that $\mathrm{Reg}(\mathsf{SMART},\ell^{n})=\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g(n)\}$. For $\ell^{n}$ such that $\mathrm{Reg}(\mathsf{FTL},\ell^{n})>g(n)$, by Lemma 2, $\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq g(n)+g(n-t_{\mathrm{sw}})+1\leq 2g(n)+1$.

Proof of Theorem 2.

The proof uses a primal-dual approach, similar to that of Karlin et al. (1994) for ski-rental. For a given sequence of loss functions $\ell^{n}$, we use the shorthand $r=\mathrm{Reg}(\mathsf{FTL},\ell^{n})$ and $g=g(n)$. Also, for our given choice of cumulative distribution function $F_{n}$, the corresponding probability density function is given by $f(z)=\frac{e^{z/g}}{g(e-1)}$ for $z\in[0,g]$. As before, let $t_{\mathrm{sw}} := \min\{1\leq t\leq n-1 : \Sigma^{\mathsf{FTL}}_{t}>\theta\}$ be the (random) round where $\mathsf{SMART}$ switches from $\mathsf{FTL}$ to $\mathsf{ALG}_{\mathsf{WC}}$ (with $t_{\mathrm{sw}}=n$ if it never switches). Then by Lemma 2 we have

$$\frac{\mathrm{Reg}(\mathsf{SMART},\ell^{n})-1}{\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g(n)\}}\leq\begin{cases}\frac{\theta+g}{\min\{r,g\}}&\text{if }t_{\mathrm{sw}}<n\\ 1&\text{if }t_{\mathrm{sw}}=n\end{cases}$$

where the second case follows from the fact that $\theta\in[0,g]$, and hence if we never switch, then $r\leq g(n)$. Taking expectation over $\theta$, we have

$$\frac{\mathbb{E}_{\theta}\left[\mathrm{Reg}(\mathsf{SMART},\ell^{n})\right]-1}{\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g\}}\leq\begin{cases}\int_{0}^{r}\frac{x+g}{r}f(x)\,dx+1-F(r)&\text{if }r\leq g\\ \int_{0}^{g}\frac{x+g}{g}f(x)\,dx&\text{if }r>g\end{cases}$$

Let $\phi(z) := \int_{0}^{z}\frac{(x+g)}{z}f(x)\,dx+1-F(z)$ for $z\in[0,g]$; then $\frac{\mathbb{E}_{\theta}\left[\mathrm{Reg}(\mathsf{SMART},\ell^{n})\right]-1}{\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g\}}\leq\max_{z\in[0,g]}\phi(z)$. Moreover, we can differentiate to get $z^{2}\phi^{\prime}(z)=gzf(z)-\int_{0}^{z}(x+g)f(x)\,dx$. Substituting our choice of $f$ in this expression, we get

\[ \frac{z^{2}\,d\phi(z)}{dz} = \frac{ze^{z/g}}{(e-1)}-\int_{0}^{z}(x+g)\frac{e^{x/g}}{g(e-1)}\,dx = \frac{ze^{z/g}-g\int_{0}^{z/g}(w+1)e^{w}\,dw}{(e-1)} = 0 \]

Thus, $\phi(z)$ is constant for all $z\in[0,g]$ and $\phi(g)=\frac{1}{g(e-1)}\int_{0}^{g}(1+x/g)e^{x/g}\,dx=\frac{e}{e-1}$. ∎
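The constancy of $\phi$ can also be checked numerically. The sketch below assumes the density $f(x)=e^{x/g}/(g(e-1))$ on $[0,g]$, read off from the substitution in the display above, with CDF $F(x)=(e^{x/g}-1)/(e-1)$. The function names and the midpoint-rule integrator are illustrative choices, not part of the paper.

```python
import math

def density(x, g):
    """Assumed threshold density f(x) = e^{x/g} / (g (e - 1)) on [0, g]."""
    return math.exp(x / g) / (g * (math.e - 1))

def cdf(x, g):
    """Corresponding CDF F(x) = (e^{x/g} - 1) / (e - 1)."""
    return (math.exp(x / g) - 1) / (math.e - 1)

def phi(z, g, steps=20000):
    """phi(z) = int_0^z (x + g)/z f(x) dx + 1 - F(z), via the midpoint rule."""
    h = z / steps
    integral = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        integral += (x + g) * density(x, g)
    return integral * h / z + 1 - cdf(z, g)

if __name__ == "__main__":
    target = math.e / (math.e - 1)
    for z in (0.1, 0.5, 1.0, 2.0):
        print(z, phi(z, g=2.0), target)
```

Under these assumptions, every printed value of `phi` should agree with $e/(e-1)$ up to discretization error, for any $g>0$ and any $z\in(0,g]$.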

3 Instance Optimal Online Learning: Converse

In this section, we investigate fundamental limits on the instance-optimal regret guarantees achievable by any algorithm. More precisely, in the setting of binary prediction, we ask what is the smallest value of $\gamma_{n}$ satisfying

\[ \mathrm{Reg}(\mathsf{ALG},y^{n}) \leq \gamma_{n}\min\{\mathrm{Reg}(\mathsf{FTL},y^{n}),\mathrm{Reg}(\mathsf{Cover},y^{n})\} = \gamma_{n}\min\left\{\tfrac{1}{2}c(y^{n-1}),f_{n}\right\} \tag{6} \]

for all $y^{n}$, where $f_{n} := \mathrm{Reg}(\mathsf{Cover},y^{n})=\sqrt{\frac{n}{2\pi}}(1+o(1))$. We show the following lower bound.

Theorem 3 (Lower bound on the competitive ratio).
\[ \lim_{n\to\infty}\gamma_{n}\geq\Big(1-e^{-1/\pi}+2Q\big(\sqrt{2/\pi}\big)\Big)^{-1}\approx 1.4335 \]

where the $Q(\cdot)$ function is the standard Gaussian tail, $Q(x) := \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}e^{-t^{2}/2}\,dt$.
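The constant in Theorem 3 is easy to evaluate directly. A minimal sketch, using the standard identity $Q(x)=\tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; the function names are illustrative:

```python
import math

def gaussian_tail(x):
    """Q(x) = P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def theorem3_constant():
    """(1 - e^{-1/pi} + 2 Q(sqrt(2/pi)))^{-1}, the lower bound on lim gamma_n."""
    denominator = 1 - math.exp(-1 / math.pi) + 2 * gaussian_tail(math.sqrt(2 / math.pi))
    return 1.0 / denominator

if __name__ == "__main__":
    print(theorem3_constant())               # compare with the stated value of about 1.4335
    print(math.e / (math.e - 1))             # the upper bound e/(e-1) attained by SMART
```

Comparing the two printed numbers shows how narrow the gap between the lower bound and the $e/(e-1)$ guarantee is.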

Since binary prediction is a specific online learning problem, this also yields a fundamental lower bound for instance-optimality for general online learning. Note particularly that $\gamma_{n}>1$, implying that $(1+o(1))\min\{\mathrm{Reg}(\mathsf{FTL}),\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}})\}$ regret is not possible to achieve. Thus, there is an inevitable multiplicative factor that must be paid in order to achieve an instance-optimal regret guarantee.

An equivalent way to state (6) is to find the smallest $\gamma_{n}$ for which a predictor $\{a_{t}(y^{t-1})\}_{t=1}^{n}$ satisfies, for all $y\in\{0,1\}^{n}$,

\[ \textstyle\sum_{t=1}^{n}|a_{t}(y^{t-1})-y_{t}| \leq \gamma_{n}\min\big\{\tfrac{1}{2}c(y^{n-1}),f_{n}\big\}+\min\big\{\textstyle\sum_{t=1}^{n}y_{t},\,n-\textstyle\sum_{t=1}^{n}y_{t}\big\}. \tag{7} \]

In order to establish the values of $\gamma_{n}$ for which the loss function on the right-hand side of (7) is achievable, we utilize the following result of Cover (1966), which provides an exact characterization of the set of all loss functions achievable in binary prediction. Formally, we say a function $\phi:\{0,1\}^{n}\to\mathbb{R}^{+}$ is achievable in binary prediction if there exists a predictor/strategy $a_{t}:y^{t-1}\mapsto[0,1]$ that ensures $\sum_{t=1}^{n}|a_{t}(y^{t-1})-y_{t}|=\phi(y^{n})$ for all $y^{n}\in\{0,1\}^{n}$. Then, we have the following characterization.

Theorem 4 (Cover (1966)).

Let $\epsilon^{n}\sim\mathrm{Bern}\left(\frac{1}{2}\right)$ i.i.d. For $\phi$ to be achievable, it must satisfy the following:

  • Balance: $\mathbb{E}[\phi(\epsilon^{n})]=\frac{n}{2}$.

  • Stability: Let $\phi_{t}(y^{t}) := \mathbb{E}[\phi(y^{t}\epsilon_{t+1}^{n})]$; then $|\phi_{t}(y^{t-1}0)-\phi_{t}(y^{t-1}1)|\leq 1$ for all $t\in[n]$, $y^{t}\in\{0,1\}^{t}$.

Further, any $\phi$ satisfying the above is realized by the predictor $a_{t}(y^{t-1})=\frac{1+\phi_{t}(y^{t-1}0)-\phi_{t}(y^{t-1}1)}{2}$.
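To make Theorem 4 concrete, the following brute-force sketch builds the predictor $a_{t}$ from a given $\phi$ for small $n$ and replays it on a sequence. As a test case it uses $\phi(y^{n})=\min\{k,n-k\}+f_{n}$ with $k=\sum_{t}y_{t}$ and $f_{n}$ chosen so that the balance condition holds; this choice of $\phi$ and all function names are assumptions made for illustration. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product

def conditional_means(phi, n):
    """Table of phi_t(y^t) = E[phi(y^t eps_{t+1}^n)] for every prefix y^t, t = 0..n."""
    table = {y: Fraction(phi(y)) for y in product((0, 1), repeat=n)}
    for t in range(n - 1, -1, -1):
        for prefix in product((0, 1), repeat=t):
            table[prefix] = (table[prefix + (0,)] + table[prefix + (1,)]) / 2
    return table

def cover_predictor(table, prefix):
    """a_t(y^{t-1}) = (1 + phi_t(y^{t-1} 0) - phi_t(y^{t-1} 1)) / 2."""
    return (1 + table[prefix + (0,)] - table[prefix + (1,)]) / 2

def total_loss(table, y):
    """Cumulative absolute loss of the predictor on the binary sequence y."""
    return sum(abs(cover_predictor(table, y[:t]) - y[t]) for t in range(len(y)))

def balanced_min_count_phi(n):
    """phi(y) = min(k, n - k) + f_n, with f_n set so that E[phi] = n/2 (assumed test case)."""
    sequences = list(product((0, 1), repeat=n))
    mean_min = Fraction(sum(min(sum(y), n - sum(y)) for y in sequences), len(sequences))
    f_n = Fraction(n, 2) - mean_min
    return lambda y: min(sum(y), n - sum(y)) + f_n

if __name__ == "__main__":
    n = 4
    phi = balanced_min_count_phi(n)
    table = conditional_means(phi, n)
    for y in product((0, 1), repeat=n):
        print(y, total_loss(table, y), phi(y))
```

If the balance and stability conditions hold for the chosen $\phi$, the two printed columns should coincide on every sequence.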

As an immediate corollary, Theorem 4 equips us with the exact minimax optimal algorithm for binary prediction alluded to in Section 1. Returning to our setting, from the balance condition in Theorem 4, for $\epsilon^{n}\sim\mathrm{Bern}(1/2)$ i.i.d., we need

\[ \gamma_{n}\,\mathbb{E}\big[\min\big\{\tfrac{1}{2}c(\epsilon^{n-1}),f_{n}\big\}\big]+\mathbb{E}\big[\min\big\{\textstyle\sum_{t=1}^{n}\epsilon_{t},\,n-\textstyle\sum_{t=1}^{n}\epsilon_{t}\big\}\big]\geq\frac{n}{2} \]

for the function in (7) to be achievable. Using the definition of $f_{n}$, which by the balance condition applied to $\mathsf{Cover}$ satisfies $f_{n}=\frac{n}{2}-\mathbb{E}\big[\min\big\{\sum_{t=1}^{n}\epsilon_{t},\,n-\sum_{t=1}^{n}\epsilon_{t}\big\}\big]$,

\[ \gamma_{n}\geq\frac{f_{n}}{\mathbb{E}\left[\min\left\{\tfrac{1}{2}c(\epsilon^{n-1}),f_{n}\right\}\right]}=\frac{2f_{n}}{\mathbb{E}\left[\min\left\{c(\epsilon^{n-1}),2f_{n}\right\}\right]}. \]

The above bound immediately yields that $\gamma_{n}\geq 1$ as expected. We can further sharply characterize the asymptotics of $\gamma_{n}$, resulting in the stated lower bound. The full proof of Theorem 3 is provided in Appendix B.
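The quantity $f_{n}$ in the numerator can be computed exactly for moderate $n$. The sketch below assumes the identity $f_{n}=\frac{n}{2}-\mathbb{E}[\min\{K,n-K\}]=\mathbb{E}|K-\frac{n}{2}|$ with $K\sim\mathrm{Binomial}(n,\frac{1}{2})$, which is what the balance condition gives for a predictor with loss $\min\{k,n-k\}+f_{n}$, and compares it with the asymptotic form $\sqrt{n/(2\pi)}$. Function names are illustrative.

```python
import math
from fractions import Fraction

def cover_regret(n):
    """Exact f_n = E|K - n/2| for K ~ Binomial(n, 1/2), as a rational number."""
    numerator = sum(abs(2 * k - n) * math.comb(n, k) for k in range(n + 1))
    return Fraction(numerator, 2 ** (n + 1))

def asymptotic_regret(n):
    """Leading-order term sqrt(n / (2 pi))."""
    return math.sqrt(n / (2 * math.pi))

if __name__ == "__main__":
    for n in (4, 100, 2000):
        exact = float(cover_regret(n))
        print(n, exact, asymptotic_regret(n), exact / asymptotic_regret(n))
```

The ratio in the last column should approach 1 as $n$ grows, consistent with $f_{n}=\sqrt{n/(2\pi)}\,(1+o(1))$.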

4 Instance-Optimal Algorithms in Small-Loss Settings

So far, we have presented specializations of $\mathsf{SMART}$ that achieve instance-optimality between $\mathsf{FTL}$ and the worst-case regret $g(n)$. However, many worst-case algorithms can still adapt to the instance $\ell^{n}$ and achieve regret guarantees that are a function of the ‘difficulty’ of the instance $\ell^{n}$. A common way to quantify this difficulty is via small-loss bounds, where the regret is upper bounded by $g(L^{*})$; here $g(\cdot)$, as earlier, is a monotonically increasing function and $L^{*} := \min_{a\in\mathcal{A}}\sum_{t=1}^{n}\ell_{t}(a)$ is the loss achieved by the best action. Such guarantees imply that for sequences where there exists an action achieving low loss, the corresponding regret achieved is also low. Thus, a natural question is whether $\mathsf{SMART}$ can be specialized to yield an algorithm that is constant competitive with respect to $\min\{\mathrm{Reg}(\mathsf{FTL},\ell^{n}),g(L^{*})\}$.

As a starting point, if $L^{*}$ is known a priori, it is easy to achieve an $\frac{e}{e-1}$ approximation by simply using $\mathsf{SMART}$ with (random) threshold $\theta=g(L^{*})\ln(1+(e-1)U)$, $U\sim\mathrm{Unif}[0,1]$; this is an immediate corollary of Theorem 2. When $L^{*}$ is not known, we use a guess-and-double argument to devise an algorithm that achieves the following instance-optimality guarantee.

Theorem 5 (Regret of $\mathsf{SMART}$ for unknown small loss).

Let $\mathsf{ALG}_{\mathsf{WC}}$ have small loss regret guarantees satisfying $\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell^{n})\leq g(L^{*})$ for any $\ell^{n}$, where $L^{*}=\min_{j\in[m]}L_{tj}$, i.e., the loss achieved by the best expert in hindsight. Then, if we play $\mathsf{SMART}$ for Small-Loss as stated in Algorithm 2, we have

\[ \mathrm{Reg}(\mathsf{SMART},\ell^{n}) \leq 2\min\Big(\mathrm{Reg}(\mathsf{FTL},\ell^{n}),\ \textstyle\sum_{z=1}^{\log(1+L^{*}/\log m)+1}g(2^{z}\log m)\Big)+O(\log L^{*}/\log m) \]

In particular, in the prediction with expert advice setting, we know that $\mathsf{Hedge}$ with a time-varying learning rate achieves $g(L^{*})\equiv 2\sqrt{2L^{*}\log m}+\kappa\log m$ (where $\kappa>0$ is an absolute constant) (Auer et al., 2002b; Cesa-Bianchi et al., 2007); this gives Corollary 1 in Section 1.
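For the known-$L^{*}$ case mentioned before Theorem 5, the random threshold $\theta=g(L^{*})\ln(1+(e-1)U)$ is an inverse-CDF transform of a uniform variable. The sketch below illustrates this with the $\mathsf{Hedge}$ small-loss bound above. The value `kappa=1.0` is a placeholder, since the text only says that $\kappa$ is an absolute constant, and all function names are hypothetical.

```python
import math
import random

def hedge_small_loss_bound(L_star, m, kappa=1.0):
    """g(L*) = 2 sqrt(2 L* log m) + kappa log m; kappa=1.0 is a placeholder value."""
    return 2 * math.sqrt(2 * L_star * math.log(m)) + kappa * math.log(m)

def threshold_from_uniform(u, g_value):
    """theta = g ln(1 + (e - 1) u), mapping u in [0, 1] to theta in [0, g]."""
    return g_value * math.log(1 + (math.e - 1) * u)

def threshold_cdf(theta, g_value):
    """CDF implied by the transform: F(theta) = (e^{theta/g} - 1) / (e - 1)."""
    return (math.exp(theta / g_value) - 1) / (math.e - 1)

def sample_threshold(L_star, m, rng=random):
    """Draw one switching threshold for SMART when L* is known."""
    return threshold_from_uniform(rng.random(), hedge_small_loss_bound(L_star, m))

if __name__ == "__main__":
    g_value = hedge_small_loss_bound(L_star=100.0, m=8)
    draws = [sample_threshold(100.0, 8) for _ in range(5)]
    print(g_value, draws)
```

Every draw lies in $[0,g(L^{*})]$, and applying `threshold_cdf` to `threshold_from_uniform(u, g)` recovers `u`, which is what makes the transform a valid sampler for that distribution.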

The intuitive idea behind the algorithm is to guess the value of $L^{*}$, and play $\mathsf{SMART}$ with this guessed value while simultaneously keeping track of the regret incurred. Whenever the regret incurred exceeds the guarantee established by $\mathsf{SMART}$ with known $L^{*}$, we double the guessed value and start again. We use the notation $\mathsf{ALG}_{\mathsf{WC}}(\ell_{t_{1}}^{t_{2}})$ to refer to the worst-case algorithm when the previously observed sequence is $\ell_{t_{1}}^{t_{2}}$; in particular this would be equivalent to the action recommended at time $t_{2}+1$ after throwing away all the observed losses before $t_{1}$.
We let $\Sigma^{\mathsf{FTL}}_{t_{1}:t_{2}}=\sum_{i=t_{1}}^{t_{2}}(L_{i}(a^{*}_{i-1})-L_{i}(a^{*}_{i}))$, which grows as the number of leader changes within $i\in[t_{1},t_{2}]$. The algorithm’s pseudocode is given in Algorithm 2 below, and a proof of Theorem 5 is provided in Appendix A.

Input: Policies $\mathsf{FTL},\mathsf{ALG}_{\mathsf{WC}}$; small-loss bound $g(\cdot)$
for $z=0,1,\dots$ (epochs) do
      Let $t=t_{z} :=$ start time of the $z^{th}$ epoch, $L^{*}_{z} := 2^{z}\log m$ (current guess for $L^{*}$), $\Sigma^{\mathsf{FTL}}_{t_{z}:t_{z}-1}=0$;
      while $\Sigma^{\mathsf{FTL}}_{t_{z}:t-1}\leq g(L^{*}_{z})$ do
            Set $a_{t}=a_{t-1}^{*}$;    // Play $\mathsf{FTL}$
            Observe $\ell_{t}(\cdot)$;
            Update $L_{t}(\cdot)=L_{t-1}(\cdot)+\ell_{t}(\cdot)$ and $\Sigma^{\mathsf{FTL}}_{t_{z}:t}=\Sigma^{\mathsf{FTL}}_{t_{z}:t-1}+(L_{t}(a_{t-1}^{*})-L_{t}(a_{t}^{*}))$ and $t=t+1$;
      end while
      Let $\tau_{z} := \min\{t\geq t_{z} : \Sigma^{\mathsf{FTL}}_{t_{z}:t}>g(L^{*}_{z})\}$ and $t=\tau_{z}+1$;
      // Check whether the loss incurred in this epoch violates the upper bound that would hold if the guess $L_{z}^{*}$ were correct
      while $\sum_{s=t_{z}}^{t}\langle a_{s},\ell_{s}\rangle\leq L_{z}^{*}+2\min\{\Sigma^{\mathsf{FTL}}_{t_{z}:t},g(L_{z}^{*})\}+1$ do
            Set $a_{t}=\mathsf{ALG}_{\mathsf{WC}}(\ell_{\tau_{z}+1}^{t-1})$;    // Play $\mathsf{ALG}_{\mathsf{WC}}$ forgetting losses before $\tau_{z}+1$
            Observe $\ell_{t}(\cdot)$;
            Update $L_{t}(\cdot)=L_{t-1}(\cdot)+\ell_{t}(\cdot)$ and $\Sigma^{\mathsf{FTL}}_{t_{z}:t}=\Sigma^{\mathsf{FTL}}_{t_{z}:t-1}+(L_{t}(a_{t-1}^{*})-L_{t}(a_{t}^{*}))$ and $t=t+1$;
      end while
end for
Algorithm 2 $\mathsf{SMART}$ for Small-Loss
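The epoch structure of Algorithm 2 can be sketched in a few lines of Python. This is an illustration under several assumptions that are not taken from the paper: losses are given up front as a list of vectors in $[0,1]^{m}$, the cumulative losses $L_{t}(\cdot)$ are kept across epochs rather than reset, ties for the leader go to the lowest index, and a fixed-rate Hedge (`eta=0.5`) stands in for $\mathsf{ALG}_{\mathsf{WC}}$ in place of the time-varying-rate version cited in the text. All names are hypothetical.

```python
import math

def leader(cum):
    """Index of the action with the smallest cumulative loss (ties: lowest index)."""
    return min(range(len(cum)), key=cum.__getitem__)

def hedge(cum, eta=0.5):
    """Fixed-rate Hedge weights; a simplified stand-in for ALG_WC."""
    low = min(cum)
    weights = [math.exp(-eta * (c - low)) for c in cum]
    norm = sum(weights)
    return [w / norm for w in weights]

def smart_small_loss(losses, g):
    """Run the guess-and-double scheme; returns (total loss, per-round (phase, epoch) log)."""
    n, m = len(losses), len(losses[0])
    cum = [0.0] * m                      # L_t(.), kept across epochs (assumption)
    total, t, z, log = 0.0, 0, 0, []
    while t < n:
        guess = 2 ** z * math.log(m)     # L_z^* = 2^z log m
        cap = g(guess)                   # g(L_z^*)
        sigma = epoch_loss = 0.0         # Sigma^FTL and loss within this epoch
        # Phase 1: follow the leader while Sigma^FTL <= g(L_z^*).
        while t < n and sigma <= cap:
            a = leader(cum)                              # a*_{t-1}
            epoch_loss += losses[t][a]
            cum = [c + l for c, l in zip(cum, losses[t])]
            sigma += cum[a] - min(cum)                   # L_t(a*_{t-1}) - L_t(a*_t)
            log.append(("FTL", z))
            t += 1
        # Phase 2: restart ALG_WC; stay until the epoch loss refutes the guess.
        fresh = [0.0] * m                # losses seen since the restart
        while t < n and epoch_loss <= guess + 2 * min(sigma, cap) + 1:
            a, p = leader(cum), hedge(fresh)
            epoch_loss += sum(pj * lj for pj, lj in zip(p, losses[t]))
            cum = [c + l for c, l in zip(cum, losses[t])]
            fresh = [c + l for c, l in zip(fresh, losses[t])]
            sigma += cum[a] - min(cum)
            log.append(("WC", z))
            t += 1
        total += epoch_loss
        z += 1                           # double the guess for L^*
    return total, log
```

On a sequence where one expert is always best, the sketch never leaves the first $\mathsf{FTL}$ phase; on an alternating sequence that keeps flipping the leader, it switches to the worst-case policy once $\Sigma^{\mathsf{FTL}}$ exceeds $g(L^{*}_{0})$.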

5 Conclusion

In this paper, we present $\mathsf{SMART}$, a simple and black-box online learning algorithm that adapts to the data and achieves instance optimal regret with respect to $\mathsf{FTL}$ and any given worst-case algorithm. We show that $\mathsf{SMART}$ switches at most once from $\mathsf{FTL}$ to the worst-case algorithm, and attains a regret that is within a factor of $e/(e-1)\approx 1.58$ of the minimum of the regret of $\mathsf{FTL}$ and the minimax regret over all input sequences; we also show that any algorithm must incur an extra factor of at least $1.43$, establishing that our simple approach is surprisingly close to optimal. Furthermore, we extend $\mathsf{SMART}$ to incorporate a small-loss algorithm and obtain instance optimality with respect to the small-loss regret bound. Our approach relies on a novel reduction of instance optimal online learning to the ski-rental problem, and leverages tools from information theory and competitive analysis. Our work suggests several open problems for future research, such as finding instance optimal algorithms for bandit settings, or designing algorithms that can adapt to multiple reference algorithms besides $\mathsf{FTL}$ and minimax algorithms.

Acknowledgements

This work is supported by NSF grants CNS-1955997, CCF-2337796 and ECCS-1847393, and AFOSR grant FA9550-23-1-0301. This work was partially done when the authors were visitors at the Simons Institute for the Theory of Computing, UC Berkeley.

Appendix A Omitted proofs from Section 4

In this section, we prove Theorem 5 and Corollary 1.

Recall Algorithm 2, where $\mathsf{ALG}_{\mathsf{WC}}(\ell_{t_1}^{t_2})$ refers to the worst-case algorithm when the previously observed sequence is $\ell_{t_1}^{t_2}$; in particular this would be equivalent to the action recommended at time $t_2+1$ after throwing away all the observed losses before $t_1$. We let $\Sigma^{\mathsf{FTL}}_{t_1:t_2} = \sum_{i=t_1}^{t_2}\left(L_i(a^*_{i-1}) - L_i(a^*_i)\right)$, which grows as the number of leader changes within $i\in[t_1,t_2]$.
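As an illustration of this quantity, the following minimal Python sketch computes $\Sigma^{\mathsf{FTL}}_{t_1:t_2}$ from a loss matrix. The function names `ftl_leaders` and `sigma_ftl`, the array layout (rows are rounds, columns are actions), and the tie-breaking rule (smallest index) are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

def ftl_leaders(losses):
    """Return (cum, lead) where cum[t] = L_t and lead[t] = a*_t for t = 0..n.

    L_0 is the all-zeros vector. Ties are broken toward the smallest
    action index, which is an assumption of this sketch.
    """
    n, m = losses.shape
    cum = np.vstack([np.zeros(m), np.cumsum(losses, axis=0)])
    lead = np.argmin(cum, axis=1)
    return cum, lead

def sigma_ftl(losses, t1, t2):
    """Sum over i = t1..t2 (1-indexed rounds) of L_i(a*_{i-1}) - L_i(a*_i)."""
    cum, lead = ftl_leaders(losses)
    return sum(cum[i, lead[i - 1]] - cum[i, lead[i]] for i in range(t1, t2 + 1))

# Toy instance with two actions: the leader changes only after round 3.
toy = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
print(sigma_ftl(toy, 1, 3))
```

On the toy instance the only nonzero term comes from round 3, where the leader changes, so the sum evaluates to 1. Each term is nonnegative and vanishes whenever the leader is unchanged, which is the sense in which the quantity tracks leader changes.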

We first have the following decomposition of the regret for any algorithm $\mathsf{ALG}$ that plays the sequence of actions $a^{\mathsf{ALG}}_t$ at time $t$.

Lemma 3.

The regret incurred by any sequence of actions $(a^{\mathsf{ALG}}_t)_{t\in[n+1]}$ can be written as

$$\mathrm{Reg}(\mathsf{ALG},\ell^n) = \sum_{t=1}^{n}\left(L_t(a^{\mathsf{ALG}}_t) - L_t(a^{\mathsf{ALG}}_{t+1})\right), \tag{8}$$

where we let $a_{n+1}^{\mathsf{ALG}} := a_n^*$.

Proof.
\begin{align}
L_n(\mathsf{ALG}) &= \sum_{t=1}^{n}\ell_t(a^{\mathsf{ALG}}_t) \tag{9}\\
&= \sum_{t=1}^{n}\left(L_t(a^{\mathsf{ALG}}_t) - L_{t-1}(a^{\mathsf{ALG}}_t)\right) \tag{10}\\
&= L_n(a^{\mathsf{ALG}}_n) + \sum_{t=1}^{n-1}\left(L_t(a^{\mathsf{ALG}}_t) - L_t(a^{\mathsf{ALG}}_{t+1})\right) - L_0(a^{\mathsf{ALG}}_1). \tag{11}
\end{align}

This implies a regret decomposition of

\begin{align}
\mathrm{Reg}(\mathsf{ALG},\ell^n) &= L_n(\mathsf{ALG}) - L_n(a_n^*) \tag{12}\\
&= L_n(a^{\mathsf{ALG}}_n) - L_n(a_n^*) + \sum_{t=1}^{n-1}\left(L_t(a^{\mathsf{ALG}}_t) - L_t(a^{\mathsf{ALG}}_{t+1})\right) - L_0(a^{\mathsf{ALG}}_1). \tag{13}
\end{align}

As $L_0(a^{\mathsf{ALG}}_1) = 0$, and $a_{n+1}^{\mathsf{ALG}} := a_n^*$, it follows that

$$\mathrm{Reg}(\mathsf{ALG},\ell^n) = \sum_{t=1}^{n}\left(L_t(a^{\mathsf{ALG}}_t) - L_t(a^{\mathsf{ALG}}_{t+1})\right). \tag{14}$$
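The decomposition in Lemma 3 can be sanity-checked numerically. The sketch below compares the regret computed directly with the telescoped form, for an arbitrary action sequence; the helper names `regret_direct` and `regret_decomposed` and the random test instance are illustrative assumptions, not part of the paper.

```python
import numpy as np

def regret_direct(losses, actions):
    """Incurred loss minus the loss of the best fixed action in hindsight."""
    n = len(actions)
    incurred = losses[np.arange(n), actions].sum()
    return incurred - losses.sum(axis=0).min()

def regret_decomposed(losses, actions):
    """Sum over t = 1..n of L_t(a_t) - L_t(a_{t+1}), with a_{n+1} := a*_n."""
    n, m = losses.shape
    cum = np.vstack([np.zeros(m), np.cumsum(losses, axis=0)])  # cum[t] = L_t
    best = int(np.argmin(cum[n]))                               # a*_n
    ext = list(actions) + [best]                                # ext[t-1] = a_t
    return sum(cum[t, ext[t - 1]] - cum[t, ext[t]] for t in range(1, n + 1))

# Hypothetical instance: 50 rounds, 4 actions, arbitrary (random) play.
rng = np.random.default_rng(1)
losses = rng.random((50, 4))
actions = rng.integers(0, 4, size=50)
gap = abs(regret_direct(losses, actions) - regret_decomposed(losses, actions))
print(gap < 1e-9)
```

The two quantities should agree up to floating-point error for any action sequence, since the identity does not depend on how the actions were chosen.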

Next, we use this decomposition to establish the regret of any algorithm $\mathsf{ALG}$ that alternates between playing $\mathsf{FTL}$ and another algorithm $\mathsf{ALG}_{\mathsf{WC}}$.

Lemma 4.

Consider an algorithm $\mathsf{ALG}$ which alternates between playing $\mathsf{FTL}$ and $\mathsf{ALG}_{\mathsf{WC}}$, where $\mathsf{FTL}$ is played in the intervals $\{[t_z,\tau_z]\}_{z\in[z_{\mathrm{last}}]}$, and $\mathsf{ALG}_{\mathsf{WC}}$ is played in intervals $\{[\tau_z+1,t_{z+1}-1]\}_{z\in[z_{\mathrm{last}}]}$. The regret of $\mathsf{ALG}$ is bounded by

$$\mathrm{Reg}(\mathsf{ALG},\ell^n) \leq \sum_z\left(\Sigma^{\mathsf{FTL}}_{t_z:\tau_z-1} + \mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{\tau_z+1}^{t_{z+1}-1}) + 1\right). \tag{15}$$
Proof.

We let $t_{z_{\mathrm{last}}} = n+1$, and $a_{n+1} = a_n^*$. We use Lemma 3 and rearrange the terms by grouping them by the $\mathsf{FTL}$ periods and the $\mathsf{ALG}_{\mathsf{WC}}$ periods.

\begin{align*}
\mathrm{Reg}(\mathsf{ALG},\ell^n)
&= \sum_z\left(\sum_{t=t_z}^{t_{z+1}-1}\left(L_t(a_t) - L_t(a_{t+1})\right)\right)\\
&= \sum_z\left(\sum_{t=t_z}^{\tau_z-1}\left(L_t(a_{t-1}^*) - L_t(a_t^*)\right) + L_{\tau_z}(a_{\tau_z-1}^*) - L_{\tau_z}(a_{\tau_z+1}) + \sum_{t=\tau_z+1}^{t_{z+1}-1}\left(L_t(a_t) - L_t(a_{t+1})\right)\right)\\
&= \sum_z\left(\sum_{t=t_z}^{\tau_z-1}\left(L_t(a_{t-1}^*) - L_t(a_t^*)\right) + L_{\tau_z}(a_{\tau_z-1}^*) + \sum_{t=\tau_z+1}^{t_{z+1}-1}\ell_t(a_t) - L_{t_{z+1}-1}(a_{t_{z+1}})\right)\\
&= \sum_z\sum_{t=t_z}^{\tau_z-1}\left(L_t(a_{t-1}^*) - L_t(a_t^*)\right) + \sum_z\left(\sum_{t=\tau_z+1}^{t_{z+1}-1}\ell_t(a_t) + L_{\tau_z}(a_{\tau_z-1}^*) - L_{t_{z+1}-1}(a_{t_{z+1}-1}^*)\right).
\end{align*}

We bound the first term by the $\mathsf{FTL}$ regret. Recall the notation

$$\Sigma^{\mathsf{FTL}}_{t_z:\tau_z-1} := \sum_{t=t_z}^{\tau_z-1}\left(L_t(a_{t-1}^*) - L_t(a_t^*)\right). \tag{16}$$

Because we are playing $\mathsf{FTL}$ at both time $\tau_z$ and $\tau_z-1$, it holds that

$$L_{\tau_z}(a_{\tau_z-1}^*) = L_{\tau_z-1}(a_{\tau_z-1}^*) + \ell_{\tau_z}(a_{\tau_z-1}^*) \leq L_{\tau_z}(a_{\tau_z}^*) + 1.$$
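For completeness, the inequality can be unpacked into two elementary steps. This is a sketch under the assumption, suggested by the additive constant $1$, that per-round losses take values in $[0,1]$:

\begin{align*}
L_{\tau_z-1}(a_{\tau_z-1}^*) &\leq L_{\tau_z-1}(a_{\tau_z}^*) && \text{since } a_{\tau_z-1}^* \text{ minimizes } L_{\tau_z-1}\\
&\leq L_{\tau_z}(a_{\tau_z}^*) && \text{since } \ell_{\tau_z}(\cdot) \geq 0,\\
\ell_{\tau_z}(a_{\tau_z-1}^*) &\leq 1 && \text{since } \ell_{\tau_z}(\cdot) \leq 1.
\end{align*}

Adding the last inequality to the first chain gives $L_{\tau_z-1}(a_{\tau_z-1}^*) + \ell_{\tau_z}(a_{\tau_z-1}^*) \leq L_{\tau_z}(a_{\tau_z}^*) + 1$.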

To bound the second term, we will show that

\begin{align}
\sum_{t=\tau_z+1}^{t_{z+1}-1}&\ell_t(a_t) + L_{\tau_z}(a_{\tau_z-1}^*) - L_{t_{z+1}-1}(a_{t_{z+1}-1}^*) \notag\\
&\leq \sum_{t=\tau_z+1}^{t_{z+1}-1}\ell_t(a_t) + L_{\tau_z}(a_{\tau_z}^*) - L_{t_{z+1}-1}(a_{t_{z+1}-1}^*) + 1 \tag{17}\\
&= \sum_{t=\tau_z+1}^{t_{z+1}-1}\ell_t(a_t) + \min_a L_{\tau_z}(a) - \min_a L_{t_{z+1}-1}(a) + 1 \tag{18}\\
&\leq \sum_{t=\tau_z+1}^{t_{z+1}-1}\ell_t(a_t) - \min_a\left(L_{t_{z+1}-1}(a) - L_{\tau_z}(a)\right) + 1 \tag{19}\\
&= \mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{\tau_z+1}^{t_{z+1}-1}) + 1. \tag{20}
\end{align}
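The bound of Lemma 4 can also be checked numerically on a small instance. The sketch below is illustrative only: `lemma4_check` and `round_robin` are hypothetical names, the round-robin rule is a deterministic stand-in for $\mathsf{ALG}_{\mathsf{WC}}$ (any policy restarted at the beginning of each interval would do), losses are assumed to lie in $[0,1]$, and the interval endpoints are chosen arbitrarily.

```python
import numpy as np

def round_robin(history, m):
    """Placeholder for ALG_WC: cycles through actions, restarted per interval."""
    return len(history) % m

def lemma4_check(losses, ftl_intervals, wc_policy):
    """Return (regret, bound) for an algorithm alternating FTL and ALG_WC.

    ftl_intervals lists (t_z, tau_z), 1-indexed and inclusive; ALG_WC plays on
    [tau_z + 1, t_{z+1} - 1], with the start after the last epoch set to n + 1.
    """
    n, m = losses.shape
    cum = np.vstack([np.zeros(m), np.cumsum(losses, axis=0)])  # cum[t] = L_t
    lead = np.argmin(cum, axis=1)                               # lead[t] = a*_t
    starts = [t for t, _ in ftl_intervals] + [n + 1]
    total_loss, bound = 0.0, 0.0
    for z, (t_z, tau_z) in enumerate(ftl_intervals):
        # FTL rounds: at time t, play the leader a*_{t-1}.
        for t in range(t_z, tau_z + 1):
            total_loss += losses[t - 1, lead[t - 1]]
        # Sigma^FTL over [t_z, tau_z - 1].
        bound += sum(cum[t, lead[t - 1]] - cum[t, lead[t]]
                     for t in range(t_z, tau_z))
        # ALG_WC rounds: the policy only sees losses from this interval.
        lo, hi = tau_z + 1, starts[z + 1] - 1
        wc_loss = 0.0
        for t in range(lo, hi + 1):
            a = wc_policy(losses[lo - 1:t - 1], m)
            wc_loss += losses[t - 1, a]
        total_loss += wc_loss
        wc_regret = wc_loss - losses[lo - 1:hi].sum(axis=0).min()
        bound += wc_regret + 1.0
    regret = total_loss - cum[n].min()
    return regret, bound

rng = np.random.default_rng(2)
losses = rng.random((12, 3))
regret, bound = lemma4_check(losses, [(1, 3), (7, 8)], round_robin)
print(regret <= bound + 1e-9)
```

In this instance $\mathsf{FTL}$ is played on rounds $[1,3]$ and $[7,8]$, and the stand-in policy on $[4,6]$ and $[9,12]$. The check exercises only the inequality of the lemma; it says nothing about how tight the bound is.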

We now complete the proof of Theorem 3 by showing that the conditions that determine the switching time between $\mathsf{FTL}$ and $\mathsf{ALG}_{\mathsf{WC}}$ are appropriately chosen to upper bound $\Sigma^{\mathsf{FTL}}_{t_z:\tau_z-1}$ and $\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{\tau_z+1}^{t_{z+1}-1})$. Firstly, note that for any $z<z_{\mathrm{last}}$, $\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{\tau_z+1}^{t_{z+1}-1}) > g(L_z^*)$, which implies that $L^* > L_z^*$. In particular, the epoch $z_{\mathrm{last}}-1$ was exited, which implies that [Footnote 4: Here we have implicitly assumed that $L^* > \log m$; if not, then there is only one epoch, $z_{\mathrm{last}}=0$, and the result is readily implied by Corollary 2.]

$$2^{z_{\mathrm{last}}-1}\log m \leq L^* \implies z_{\mathrm{last}} \leq \log\left(\frac{L^*}{\log m}\right) + 1.$$
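As a brief worked step, and assuming the outer logarithm is taken base 2 (the base is not stated at this point) and that $m \geq 2$ so that $\log m > 0$, the implication follows by dividing by $\log m$ and taking logarithms:

$$2^{z_{\mathrm{last}}-1}\log m \leq L^* \iff 2^{z_{\mathrm{last}}-1} \leq \frac{L^*}{\log m} \iff z_{\mathrm{last}}-1 \leq \log_2\left(\frac{L^*}{\log m}\right).$$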

By the stopping condition of the epoch, $\mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{\tau_z+1}^{t_{z+1}-1}) \leq \mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}},\ell_{\tau_z+1}^{t_{z+1}-2}) + 1 \leq g(L_z^*) + 1$, such that substituting into Lemma 4 implies

$$\mathrm{Reg}(\mathsf{SMART},\ell^n) \leq \sum_z\left(\Sigma^{\mathsf{FTL}}_{t_z:\tau_z-1} + g(L_z^*) + 2\right). \tag{21}$$

Also, it always holds that $\Sigma^{\mathsf{FTL}}_{t_{z}:\tau_{z}-1}\leq g(L_{z}^{*})$, and for $z<z_{\mathrm{last}}$, $\Sigma^{\mathsf{FTL}}_{t_{z}:\tau_{z}}>g(L_{z}^{*})$. Therefore

\[
\sum_{z}^{z_{\mathrm{last}}-1}g(L_{z}^{*})<\sum_{z}\Sigma^{\mathsf{FTL}}_{t_{z}:\tau_{z}-1}\leq\mathrm{Reg}(\mathsf{FTL},\ell^{n})
\]

and $\sum_{z}\Sigma^{\mathsf{FTL}}_{t_{z}:\tau_{z}-1}\leq\sum_{z}^{z_{\mathrm{last}}}g(L_{z}^{*})$.

To put it all together, if $\sum_{z}^{z_{\mathrm{last}}}g(L_{z}^{*})\leq\mathrm{Reg}(\mathsf{FTL},\ell^{n})$, then

\[
\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq 2\sum_{z}g(L_{z}^{*})+2z_{\mathrm{last}}. \tag{22}
\]

If $\mathrm{Reg}(\mathsf{FTL},\ell^{n})<\sum_{z}^{z_{\mathrm{last}}}g(L_{z}^{*})$, then it must be that in the last epoch the algorithm never switches to $\mathsf{ALG}_{\mathsf{WC}}$. If it switched to $\mathsf{ALG}_{\mathsf{WC}}$, it would imply that $\mathrm{Reg}(\mathsf{FTL},\ell^{n})\geq\sum_{z}\Sigma^{\mathsf{FTL}}_{t_{z}:\tau_{z}}>\sum_{z}^{z_{\mathrm{last}}}g(L_{z}^{*})$, which would violate the assumption that $\mathrm{Reg}(\mathsf{FTL},\ell^{n})<\sum_{z}^{z_{\mathrm{last}}}g(L_{z}^{*})$. Therefore it must be that

\[
\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq 2\,\mathrm{Reg}(\mathsf{FTL},\ell^{n})+2z_{\mathrm{last}}. \tag{23}
\]

As a result, it follows that (putting the $L^{*}\leq\log m$ and the $L^{*}>\log m$ cases together)

\[
\mathrm{Reg}(\mathsf{SMART},\ell^{n})\leq 2\min\left(\mathrm{Reg}(\mathsf{FTL},\ell^{n}),\sum_{z=0}^{\log\left(1+\frac{L^{*}}{\log m}\right)+1}g(2^{z}\log m)\right)+2\log\left(1+\frac{L^{*}}{\log m}\right)+2. \tag{24}
\]

A.1 Proof of Corollary 1

This follows from Theorem 5 and by calculating

\[
\begin{aligned}
\sum_{z=0}^{\log\left(1+\frac{L^{*}}{\log m}\right)+1}g(2^{z}\log m)
&=2\sqrt{2}\log m\sum_{z=0}^{\log\left(1+\frac{L^{*}}{\log m}\right)+1}2^{z/2}+\kappa\log m\log\left(1+\frac{L^{*}}{\log m}\right)+\kappa\log m\\
&\leq\log m\,\frac{4\sqrt{2}}{\sqrt{2}-1}\sqrt{1+\frac{L^{*}}{\log m}}+\kappa\log m\log\left(1+\frac{L^{*}}{\log m}\right)+\kappa\log m\\
&\leq 10\sqrt{2\log^{2}m+2L^{*}\log m}+\kappa\log m\log\left(1+\frac{L^{*}}{\log m}\right)+\kappa\log m\\
&\stackrel{\text{(a)}}{\leq}10\sqrt{2L^{*}\log m}+\kappa\log\left(1+\frac{L^{*}}{\log m}\right)\log m+10\sqrt{2}\log m+\kappa\log m
\end{aligned}
\]

where (a) follows since, for nonnegative $a,b$, $\sqrt{a+b}\leq\sqrt{a}+\sqrt{b}$.
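The two middle inequalities of this chain concern only the geometric sum, so they can be checked numerically. The sketch below is a rough check rather than part of the proof: it writes $x=L^{*}/\log m$, divides every quantity by $\log m$, drops the $\kappa$ terms common to all lines, and assumes the upper summation limit is read as $\lfloor\log_{2}(1+x)\rfloor+1$. The function names are hypothetical.

```python
import math

def geometric_sum(x):
    """Sum of 2**(z/2) for z = 0, ..., floor(log2(1 + x)) + 1."""
    top = math.floor(math.log2(1 + x)) + 1
    return sum(2 ** (z / 2) for z in range(top + 1))

def chain_terms(x):
    """Leading terms of the first three lines, each divided by log m."""
    first = 2 * math.sqrt(2) * geometric_sum(x)
    second = 4 * math.sqrt(2) / (math.sqrt(2) - 1) * math.sqrt(1 + x)
    third = 10 * math.sqrt(2 + 2 * x)
    return first, second, third

for x in (0.0, 0.5, 1.0, 7.0, 1e3, 1e6):
    first, second, third = chain_terms(x)
    # Geometric-series bound, then 4*sqrt(2)/(sqrt(2)-1) <= 10*sqrt(2).
    assert first <= second <= third
```

The second inequality reduces to the constant comparison $4\sqrt{2}/(\sqrt{2}-1)\leq 10\sqrt{2}$, which is why the bound holds with the same slack for every $x$.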

Appendix B Proof of Theorem 3

Consider a large even $n$. We then have, for horizon size $n+1$, $\mathrm{Reg}(\mathsf{FTL},y^{n+1})=\frac{c(y^{n})}{2}$. Moreover, $\mathrm{Reg}(\mathsf{Cover},y^{n+1})=f_{n+1}$. Let$^{5}$

\[
p_{n,k}:=\mathbb{P}[c(\epsilon^{n})=k+1]. \tag{25}
\]

$^{5}$ Note that $k$ in this definition does not count the origin as a line crossing.

We then have,

\[
\begin{aligned}
\mathbb{E}[\min\{c(\epsilon^{n}),2f_{n+1}\}]
&=\sum_{k=1}^{n}\min\{k,2f_{n+1}\}\,\mathbb{P}[c(\epsilon^{n})=k]\\
&=\sum_{k=1}^{n}\min\{k,2f_{n+1}\}\,p_{n,k-1}\\
&=\sum_{k=0}^{n}\min\{k+1,2f_{n+1}\}\,p_{n,k}\\
&=\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}(k+1)p_{n,k}+2f_{n+1}\mathbb{P}[c(\epsilon^{n})\geq 2f_{n+1}]\\
&=\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}kp_{n,k}+2f_{n+1}\mathbb{P}[c(\epsilon^{n})\geq 2f_{n+1}]+\mathbb{P}[c(\epsilon^{n})\leq\lfloor 2f_{n+1}\rfloor]\\
&=\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}kp_{n,k}+2f_{n+1}\left(1-\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}p_{n,k}\right)+\mathbb{P}[c(\epsilon^{n})\leq\lfloor 2f_{n+1}\rfloor]
\end{aligned}
\tag{26}
\]
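The decomposition in (26) is a purely algebraic identity for any distribution of $c(\epsilon^{n})$ supported on $\{1,\dots,n\}$, so it can be checked on a toy example. The sketch below does not use the actual law of the crossing count; it draws an arbitrary probability mass function, and the threshold `4.6` is a stand-in for a non-integer $2f_{n+1}$. All names are hypothetical.

```python
import math
import random

def expected_min_direct(pmf, threshold):
    """E[min(c, threshold)] computed straight from the pmf of c."""
    return sum(min(value, threshold) * prob for value, prob in pmf.items())

def expected_min_decomposed(pmf, threshold):
    """Right-hand side of (26), with p[k] = P[c = k + 1]."""
    cutoff = math.floor(threshold)
    p = {value - 1: prob for value, prob in pmf.items()}
    head = sum(k * p.get(k, 0.0) for k in range(cutoff))
    mass = sum(p.get(k, 0.0) for k in range(cutoff))
    prob_at_most_cutoff = sum(prob for value, prob in pmf.items() if value <= cutoff)
    return head + threshold * (1.0 - mass) + prob_at_most_cutoff

# Arbitrary toy distribution on {1, ..., n}; not the law of c(eps^n).
random.seed(0)
n = 12
weights = [random.random() for _ in range(n)]
total = sum(weights)
pmf = {value: w / total for value, w in zip(range(1, n + 1), weights)}

threshold = 4.6
direct = expected_min_direct(pmf, threshold)
decomposed = expected_min_decomposed(pmf, threshold)
assert abs(direct - decomposed) < 1e-12
```

The extra probability term appears because $\sum_{k}(k+1)p_{n,k}$ is split into $\sum_{k}kp_{n,k}$ plus the mass of the event $\{c(\epsilon^{n})\leq\lfloor 2f_{n+1}\rfloor\}$.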

Upon dividing (26) by $2f_{n+1}$, we get

\[
\frac{\mathbb{E}[\min\{c(\epsilon^{n}),2f_{n+1}\}]}{2f_{n+1}}=\frac{\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}kp_{n,k}}{2f_{n+1}}+\left(1-\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}p_{n,k}\right)+\frac{\mathbb{P}[c(\epsilon^{n})\leq\lfloor 2f_{n+1}\rfloor]}{2f_{n+1}}. \tag{27}
\]

Note that the third term vanishes since $f_{n+1}\to\infty$ and $0\leq\mathbb{P}[\cdot]\leq 1$. We will now separately evaluate the first two terms in (27). To do this, we require an auxiliary lemma, the proof of which is provided later.

Lemma 5.

If $k\leq C\sqrt{n}$ for an absolute constant $C$, then for large enough $n$ ($n\geq 32C^{2}$ suffices) we have

\[
e^{-\frac{16C^{3}}{\sqrt{n}}}\leq\frac{p_{n,k}}{\sqrt{\frac{2}{n\pi}}\,e^{-k^{2}/2n}}\leq\sqrt{\frac{1-C/\sqrt{n}}{1-2C/\sqrt{n}}}\,e^{\frac{16C^{3}}{\sqrt{n}}}. \tag{28}
\]

That is,

\[
p_{n,k}=\sqrt{\frac{2}{n\pi}}\,e^{-k^{2}/2n}(1+o(1)).
\]

We now evaluate the first term in (27). Since $\lfloor 2f_{n+1}\rfloor-1\leq 2\sqrt{n}$, we invoke Lemma 5 to evaluate

\[
\begin{aligned}
\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}kp_{n,k}
&=(1+o(1))\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}k\sqrt{\frac{2}{n\pi}}\,e^{-\frac{k^{2}}{2n}} &&\text{(29)}\\
&=(1+o(1))\sqrt{\frac{2}{n\pi}}\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}ke^{-\frac{k^{2}}{2n}}. &&\text{(30)}
\end{aligned}
\]

Now, we note that since $x\mapsto xe^{-\frac{x^{2}}{2n}}$ is increasing on $(0,\lfloor 2f_{n+1}\rfloor-1)$, by a Riemann approximation, we have that

\[
\int_{0}^{2f_{n+1}-2}xe^{-\frac{x^{2}}{2n}}\,dx-1\leq\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}ke^{-\frac{k^{2}}{2n}}\leq\int_{0}^{2f_{n+1}}xe^{-\frac{x^{2}}{2n}}\,dx \tag{31}
\]

and evaluating

\[
\begin{aligned}
\int_{0}^{2f_{n+1}}xe^{-\frac{x^{2}}{2n}}\,dx
&=\frac{1}{2}\int_{0}^{4f_{n+1}^{2}}e^{-\frac{t}{2n}}\,dt\\
&=n\left(1-e^{-\frac{4f_{n+1}^{2}}{2n}}\right)\\
&=n\left(1-e^{-\frac{1}{\pi}(1+o(1))}\right).
\end{aligned}
\tag{32}
\]
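Both the sandwich (31) and the closed form (32) can be checked numerically. The sketch below is only a plausibility check under one assumption: it takes $2f_{n+1}=\sqrt{2(n+1)/\pi}$, the value that appears to be used in the surrounding derivation. The helper names and the midpoint-rule step count are hypothetical choices.

```python
import math

def h(x, n):
    """Integrand x * exp(-x^2 / (2n))."""
    return x * math.exp(-x * x / (2 * n))

def integral_closed_form(upper, n):
    """Integral of h over [0, upper], obtained via the substitution t = x^2."""
    return n * (1.0 - math.exp(-upper * upper / (2 * n)))

def integral_midpoint(upper, n, steps=20_000):
    """Midpoint-rule approximation of the same integral."""
    width = upper / steps
    return width * sum(h((i + 0.5) * width, n) for i in range(steps))

def riemann_sum(n):
    """Sum of k * exp(-k^2 / (2n)) for k = 0, ..., floor(2 f_{n+1}) - 1."""
    two_f = math.sqrt(2 * (n + 1) / math.pi)  # assumed value of 2 f_{n+1}
    return sum(h(k, n) for k in range(math.floor(two_f)))

for n in (100, 10_000, 1_000_000):
    two_f = math.sqrt(2 * (n + 1) / math.pi)
    upper = integral_closed_form(two_f, n)
    lower = integral_closed_form(two_f - 2, n) - 1
    # Sandwich (31): the sum lies between the two integrals.
    assert lower <= riemann_sum(n) <= upper
    # Closed form (32) agrees with direct numerical quadrature.
    assert abs(integral_midpoint(two_f, n) - upper) <= 1e-6 * upper
```

Under this assumption $2f_{n+1}<\sqrt{n}$ for all $n\geq 2$, which is exactly the range on which the integrand is increasing, so the Riemann comparison applies.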

Evaluating the lower bound in (31) analogously, we have

\[
\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}ke^{-\frac{k^{2}}{2n}}=(1+o(1))\,n\left(1-e^{-\frac{1}{\pi}(1+o(1))}\right) \tag{33}
\]

and therefore, from (30),

\[
\begin{aligned}
\frac{\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}kp_{n,k}}{2f_{n+1}}
&=(1+o(1))\frac{\left(1-e^{-\frac{1}{\pi}(1+o(1))}\right)n\sqrt{\frac{2}{n\pi}}}{\sqrt{\frac{2(n+1)}{\pi}}}\\
&=(1+o(1))\left(1-e^{-\frac{1}{\pi}(1+o(1))}\right)
\end{aligned}
\tag{34}
\]

and therefore, from (34), we have that

\[
\lim_{n\to\infty}\frac{\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}kp_{n,k}}{2f_{n+1}}=1-e^{-\frac{1}{\pi}}. \tag{35}
\]
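The limit in (35) can be illustrated numerically. The sketch below does not compute the exact $p_{n,k}$; it substitutes the Gaussian approximation $\sqrt{2/(n\pi)}\,e^{-k^{2}/2n}$ from Lemma 5 and assumes $2f_{n+1}=\sqrt{2(n+1)/\pi}$, so it checks only the asymptotic computation, not the lemma itself. The function name is hypothetical.

```python
import math

def first_term_gaussian(n):
    """First term of (27) with p_{n,k} replaced by its Gaussian approximation."""
    two_f = math.sqrt(2 * (n + 1) / math.pi)  # assumed value of 2 f_{n+1}
    scale = math.sqrt(2 / (n * math.pi))
    total = sum(
        k * scale * math.exp(-k * k / (2 * n))
        for k in range(math.floor(two_f))
    )
    return total / two_f

limit = 1 - math.exp(-1 / math.pi)
for n in (10**4, 10**6, 10**8):
    # The printed values should approach the limit as n grows.
    print(n, first_term_gaussian(n), limit)

assert abs(first_term_gaussian(10**6) - limit) < 5e-3
```

The gap to the limit shrinks on the order of $1/\sqrt{n}$, which matches the size of the boundary terms discarded in the Riemann approximation.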

We now address the second term in (27) by invoking Lemma 5 and noting that

\[
\begin{aligned}
\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}p_{n,k}
&=(1+o(1))\sqrt{\frac{2}{n\pi}}\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}e^{-\frac{k^{2}}{2n}}\\
&\stackrel{(a)}{=}(1+o(1))\sqrt{\frac{2}{n\pi}}\int_{0}^{2f_{n+1}}e^{-\frac{x^{2}}{2n}}\,dx\\
&=(1+o(1))\sqrt{\frac{2}{\pi}}\int_{0}^{\sqrt{\frac{2}{\pi}}}e^{-\frac{t^{2}}{2}}\,dt\\
&=(1+o(1))\frac{1}{\sqrt{2\pi}}\int_{-\sqrt{\frac{2}{\pi}}}^{\sqrt{\frac{2}{\pi}}}e^{-\frac{t^{2}}{2}}\,dt\\
&=(1+o(1))\,\mathbb{P}\left(-\sqrt{\frac{2}{\pi}}\leq X\leq\sqrt{\frac{2}{\pi}}\right)
\end{aligned}
\tag{36}
\]

where, in 36, $X\sim\mathcal{N}(0,1)$. Then,

\begin{align*}
1-\sum_{k=0}^{\lfloor 2f_{n+1}\rfloor-1}p_{n,k} &\to 1-\mathbb{P}\left(-\sqrt{\frac{2}{\pi}}\leq X\leq\sqrt{\frac{2}{\pi}}\right) \\
&= 2Q\left(\sqrt{\frac{2}{\pi}}\right). \tag{37}
\end{align*}
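As a numerical sanity check of the limits in 35 and 37, the following Python sketch evaluates both truncated sums exactly for one large even $n$. It uses the closed form for $p_{n,k}$ from Proposition 1. Purely as an illustrative assumption, it replaces $2f_{n+1}$ by $\sqrt{2n/\pi}$, which is the scaling implied by the upper integration limit $\sqrt{2/\pi}$ in 36. The function names are hypothetical and not part of the paper.

```python
import math


def p(n: int, k: int) -> float:
    """p_{n,k} = 2^{-(n-k)} * C(n-k, n/2) for even n, as in Proposition 1."""
    # Exact big-integer arithmetic; only the final quotient is a float.
    return math.comb(n - k, n // 2) / 2 ** (n - k)


def truncated_terms(n: int) -> tuple[float, float]:
    """Return (sum_k k*p_{n,k} / 2f, 1 - sum_k p_{n,k}) over k < floor(2f)."""
    # ASSUMPTION (illustration only): 2*f_{n+1} is replaced by sqrt(2n/pi).
    two_f = math.sqrt(2 * n / math.pi)
    probs = [p(n, k) for k in range(math.floor(two_f))]
    first = sum(k * pk for k, pk in enumerate(probs)) / two_f
    tail = 1.0 - sum(probs)
    return first, tail


def Q(x: float) -> float:
    """Upper tail probability of a standard normal random variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


first, tail = truncated_terms(40_000)
limit_first = 1 - math.exp(-1 / math.pi)    # right-hand side of (35)
limit_tail = 2 * Q(math.sqrt(2 / math.pi))  # right-hand side of (37)
print(f"first term: {first:.4f} vs limit {limit_first:.4f}")
print(f"tail term:  {tail:.4f} vs limit {limit_tail:.4f}")
```

Under the stated assumption, both finite-$n$ quantities should lie close to their limits, with a gap that shrinks as $n$ grows.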

Substituting 35 and 37 in 27 yields that

\begin{equation*}
\frac{1}{\gamma_{n}}=\frac{\mathbb{E}\left[\min\{c(\epsilon^{n}),2f_{n+1}\}\right]}{2f_{n+1}}\to 1-e^{-\frac{1}{\pi}}+2Q\left(\sqrt{\frac{2}{\pi}}\right) \tag{38}
\end{equation*}

and therefore

\begin{equation*}
\gamma_{n}\to\frac{1}{1-e^{-\frac{1}{\pi}}+2Q\left(\sqrt{\frac{2}{\pi}}\right)}. \tag{39}
\end{equation*}
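The limiting constant in 39 can be evaluated directly. The short Python sketch below assumes that $Q$ denotes the standard Gaussian upper tail, $Q(x)=\mathbb{P}(X>x)$ for $X\sim\mathcal{N}(0,1)$, which is consistent with 37. The variable names are illustrative.

```python
import math


def Q(x: float) -> float:
    """Upper tail probability of a standard normal random variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


# Right-hand side of (38): the limit of 1 / gamma_n.
inv_gamma_limit = 1 - math.exp(-1 / math.pi) + 2 * Q(math.sqrt(2 / math.pi))

# Right-hand side of (39): the limit of gamma_n.
gamma_limit = 1 / inv_gamma_limit
print(f"limit of gamma_n is approximately {gamma_limit:.4f}")
```

The value comes out at approximately 1.43, in agreement with the constant quoted for the lower bound in the abstract.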

B.1 Proof of Lemma 5

We first note the following.

Proposition 1 (Feller [47], Chapter 3, Exercise 11).
\begin{equation*}
p_{n,k}=\frac{1}{2^{n-k}}\binom{n-k}{n/2}.
\end{equation*}
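To make the formula in Proposition 1 concrete, the sketch below computes $p_{n,k}$ exactly with rational arithmetic for small even $n$. It assumes that $k$ ranges over $0,\dots,n/2$, the values for which the binomial coefficient is nonzero. The helper names `p_exact` and `pmf` are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_exact(n: int, k: int) -> Fraction:
    """Exact value of p_{n,k} = 2^{-(n-k)} * C(n-k, n/2) for even n."""
    return Fraction(comb(n - k, n // 2), 2 ** (n - k))


def pmf(n: int) -> list[Fraction]:
    """All values p_{n,0}, ..., p_{n,n/2} (assumed support, see lead-in)."""
    return [p_exact(n, k) for k in range(n // 2 + 1)]


for n in (2, 4, 10):
    values = pmf(n)
    print(n, [str(v) for v in values], "sum =", sum(values))
```

For the values of $n$ tried here the entries sum to exactly 1, as one expects from a probability mass function.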

Therefore,

\begin{equation*}
\frac{p_{n,k}2^{n}}{2^{k}}=\binom{n-k}{n/2}=\frac{(n-k)!}{\left(\frac{n}{2}\right)!\left(\frac{n}{2}-k\right)!}.
\end{equation*}

We now use the Stirling approximation:

\begin{equation*}
\sqrt{2\pi m}\left(\frac{m}{e}\right)^{m}e^{\frac{1}{12m+1}}\leq m!\leq\sqrt{2\pi m}\left(\frac{m}{e}\right)^{m}e^{\frac{1}{12m}}. \tag{40}
\end{equation*}
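The two-sided Stirling bound 40 is easy to check numerically. The sketch below works with logarithms to avoid overflow and compares $\ln m!$ with the logarithms of both sides of 40. The function names are illustrative.

```python
import math


def stirling_log_bounds(m: int) -> tuple[float, float]:
    """Logarithms of the lower and upper bounds on m! in (40)."""
    base = 0.5 * math.log(2 * math.pi * m) + m * (math.log(m) - 1)
    return base + 1 / (12 * m + 1), base + 1 / (12 * m)


def satisfies_stirling(m: int) -> bool:
    """Check lower <= ln(m!) <= upper for one value of m."""
    lower, upper = stirling_log_bounds(m)
    log_factorial = math.log(math.factorial(m))
    return lower <= log_factorial <= upper


checks = {m: satisfies_stirling(m) for m in range(1, 51)}
print(all(checks.values()))
```

The gap between the two bounds is of order $1/m^{2}$ on the log scale, which is why the exponential correction factors in 41 are asymptotically negligible.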

Using 40 we have

\begin{align*}
\frac{p_{n,k}2^{n}}{2^{k}} &\leq \frac{\sqrt{2\pi(n-k)}}{\sqrt{2\pi(n/2)}\sqrt{2\pi(n/2-k)}}\cdot\frac{(n-k)^{n-k}}{(n/2)^{n/2}(n/2-k)^{n/2-k}} \\
&\qquad\cdot\exp\left(\frac{1}{12(n-k)}-\frac{1}{6n+1}-\frac{1}{6n-12k+1}\right) \\
&= \sqrt{\frac{2}{n\pi}}\sqrt{\frac{n-k}{n-2k}}\,\frac{(n-k)^{n-k}2^{n-k}}{n^{n/2}(n-2k)^{n/2-k}}\cdot\exp\left(\frac{1}{12n-k}\right) \\
&\stackrel{(a)}{\leq} \sqrt{\frac{2}{n\pi}}\cdot 2^{n-k}\cdot\frac{(n-k)^{n-k}}{n^{n/2}(n-2k)^{n/2-k}}\exp\left(\frac{1}{12n-k}\right)\cdot\sqrt{\frac{1-C/\sqrt{n}}{1-2C/\sqrt{n}}} \\
&= \sqrt{\frac{2}{n\pi}}\cdot 2^{n-k}\underbrace{\frac{(1-k/n)^{n-k}}{(1-2k/n)^{n/2-k}}}_{=:T}\exp\left(\frac{1}{12n-k}\right)\cdot\sqrt{\frac{1-C/\sqrt{n}}{1-2C/\sqrt{n}}}. \tag{41}
\end{align*}

We now analyze the term $T$ in 41 in more detail.

Proposition 2.
\begin{equation*}
-\frac{k^{2}}{2n}-c\frac{k^{3}}{n^{2}}\leq\ln T\leq-\frac{k^{2}}{2n}+c\frac{k^{3}}{n^{2}} \tag{42}
\end{equation*}

where $c\leq 15$.

Proof.

By Taylor's theorem, we have

\begin{equation*}
\ln(1-x)=-x-\frac{x^{2}}{2}-\frac{x^{3}}{3(1-\mu)^{3}}\quad\text{for some }\mu\in(0,x). \tag{43}
\end{equation*}

Therefore, for $k\leq C\sqrt{n}$ and $n\geq 16C^{2}$ we have

\begin{align*}
\ln\left(1-\frac{k}{n}\right) &= -\frac{k}{n}-\frac{k^{2}}{2n^{2}}-\frac{\alpha_{1}k^{3}}{n^{3}} \tag{44} \\
\ln\left(1-\frac{2k}{n}\right) &= -\frac{2k}{n}-\frac{2k^{2}}{n^{2}}-\frac{\alpha_{2}k^{3}}{n^{3}} \tag{45}
\end{align*}

for $\alpha_{1},\alpha_{2}\in\left(\frac{1}{3},\frac{8}{3}\right]$. Evaluating $\ln T$ we have

\begin{align*}
\ln T &= (n-k)\ln\left(1-\frac{k}{n}\right)-\left(\frac{n}{2}-k\right)\ln\left(1-\frac{2k}{n}\right) \\
&\stackrel{(a)}{=} (n-k)\left(-\frac{k}{n}-\frac{k^{2}}{2n^{2}}-\frac{\alpha_{1}k^{3}}{n^{3}}\right)-\left(\frac{n}{2}-k\right)\left(-\frac{2k}{n}-\frac{2k^{2}}{n^{2}}-\frac{\alpha_{2}k^{3}}{n^{3}}\right) \\
&= \left(-\frac{k^{2}}{2n}+\frac{k^{2}}{n}+\frac{k^{2}}{n}-\frac{2k^{2}}{n}\right)+\left(-\alpha_{1}+\frac{1}{2}+\frac{\alpha_{2}}{2}-2\right)\frac{k^{3}}{n^{2}}+\left(\alpha_{1}-\alpha_{2}\right)\frac{k^{4}}{n^{3}} \\
&= -\frac{k^{2}}{2n}+c_{1}\frac{k^{3}}{n^{2}}+c_{2}\frac{k^{4}}{n^{3}} \tag{46}
\end{align*}

where $(a)$ follows by substituting 44 and 45, and we write $c_{1}=-\alpha_{1}+\frac{1}{2}+\frac{\alpha_{2}}{2}-2$ and $c_{2}=\alpha_{1}-\alpha_{2}$. Now, since $\frac{k^{4}}{n^{3}}\leq\frac{k^{3}}{n^{2}}$ (because $k\leq n$), we have

\begin{equation*}
\ln T\leq-\frac{k^{2}}{2n}+(|c_{1}|+|c_{2}|)\frac{k^{3}}{n^{2}} \tag{47}
\end{equation*}

and on the other hand for the same reason

\begin{equation*}
\ln T\geq-\frac{k^{2}}{2n}-(|c_{1}|+|c_{2}|)\frac{k^{3}}{n^{2}}. \tag{48}
\end{equation*}

The proposition follows by noticing that $|c_{1}|+|c_{2}|\leq 15$, using that $\alpha_{1},\alpha_{2}\in\left(\frac{1}{3},\frac{8}{3}\right]$. ∎
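The bound on $\ln T$ can be probed numerically. The sketch below takes $T$ as defined under the brace in 41 and measures how far $\ln T$ is from $-\frac{k^{2}}{2n}$, the leading term used in 48 and 49, in units of $\frac{k^{3}}{n^{2}}$. The parameter choices and function names are illustrative assumptions.

```python
import math


def log_T(n: int, k: int) -> float:
    """ln T for T = (1 - k/n)^(n-k) / (1 - 2k/n)^(n/2 - k), as in (41)."""
    return (n - k) * math.log1p(-k / n) - (n / 2 - k) * math.log1p(-2 * k / n)


def max_scaled_gap(n: int, C: float) -> float:
    """Largest |ln T + k^2/(2n)| / (k^3/n^2) over 1 <= k <= C*sqrt(n)."""
    worst = 0.0
    for k in range(1, int(C * math.sqrt(n)) + 1):
        gap = abs(log_T(n, k) + k * k / (2 * n))
        worst = max(worst, gap / (k ** 3 / n ** 2))
    return worst


# Both choices satisfy n >= 16 * C^2, the regime assumed in the proof.
for n, C in ((64, 2.0), (10_000, 2.0)):
    print(n, C, round(max_scaled_gap(n, C), 3))
```

In these experiments the scaled gap stays well below the constant 15 of Proposition 2, which suggests the constant is not tight but comfortably sufficient for the asymptotic argument.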

Using Proposition 2 in 41 we have the upper bound

\begin{align*}
p_{n,k} &\leq \sqrt{\frac{2}{n\pi}}\cdot\exp\left(-\frac{k^{2}}{2n}+\frac{15k^{3}}{n^{2}}\right)\cdot\exp\left(\frac{1}{12n-k}\right)\cdot\sqrt{\frac{1-C/\sqrt{n}}{1-2C/\sqrt{n}}} \\
&\stackrel{(a)}{\leq} \sqrt{\frac{2}{n\pi}}\cdot\exp\left(-\frac{k^{2}}{2n}\right)\cdot\exp\left(\frac{16k^{3}}{n^{2}}\right)\cdot\sqrt{\frac{1-C/\sqrt{n}}{1-2C/\sqrt{n}}} \\
&\stackrel{(b)}{\leq} \sqrt{\frac{2}{n\pi}}\cdot\exp\left(-\frac{k^{2}}{2n}\right)\cdot\exp\left(\frac{16C^{3}}{\sqrt{n}}\right)\cdot\sqrt{\frac{1-C/\sqrt{n}}{1-2C/\sqrt{n}}} \tag{49}
\end{align*}

where $(a)$ uses $\frac{1}{12n-k}+\frac{15k^{3}}{n^{2}}\leq\frac{16k^{3}}{n^{2}}$, and $(b)$ uses the fact that $k\leq C\sqrt{n}$. This yields the upper bound in 28. The lower bound follows analogously from the Stirling approximation 40 and Proposition 2.
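The final bound 49 can be compared against exact values of $p_{n,k}$. The sketch below computes $p_{n,k}$ from Proposition 1, the Gaussian-shaped term $\sqrt{\frac{2}{n\pi}}e^{-k^{2}/(2n)}$, and the right-hand side of 49, for all $k\leq C\sqrt{n}$. The choice $n=10^{4}$, $C=1$ and all function names are illustrative assumptions.

```python
import math


def p(n: int, k: int) -> float:
    """p_{n,k} = 2^{-(n-k)} * C(n-k, n/2) for even n, as in Proposition 1."""
    return math.comb(n - k, n // 2) / 2 ** (n - k)


def gaussian_term(n: int, k: int) -> float:
    """Leading term sqrt(2 / (n*pi)) * exp(-k^2 / (2n))."""
    return math.sqrt(2 / (n * math.pi)) * math.exp(-k * k / (2 * n))


def upper_bound_49(n: int, k: int, C: float) -> float:
    """Right-hand side of (49); requires 2*C < sqrt(n)."""
    r = math.sqrt(n)
    correction = math.exp(16 * C ** 3 / r) * math.sqrt((1 - C / r) / (1 - 2 * C / r))
    return gaussian_term(n, k) * correction


n, C = 10_000, 1.0
ks = range(int(C * math.sqrt(n)) + 1)
ratios = [p(n, k) / gaussian_term(n, k) for k in ks]
within_bound = all(p(n, k) <= upper_bound_49(n, k, C) for k in ks)
print(f"ratio range: [{min(ratios):.4f}, {max(ratios):.4f}], bound holds: {within_bound}")
```

For this choice of parameters the ratio of $p_{n,k}$ to the Gaussian-shaped term stays within one percent of 1, consistent with the $(1+o(1))$ factor used in 36.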

References

  • Fagin et al. [2001] Ronald Fagin, Amnon Lotem, and Moni Naor. Optimal aggregation algorithms for middleware. In Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 102–113, 2001.
  • Roughgarden [2021] Tim Roughgarden. Beyond the worst-case analysis of algorithms. Cambridge University Press, 2021.
  • Blackwell [1956] D Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6(1):1–8, 1956.
  • Hannan [1957] James Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.
  • Cesa-Bianchi and Lugosi [2006] Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
  • Slivkins [2019] Aleksandrs Slivkins. Introduction to multi-armed bandits. Foundations and Trends® in Machine Learning, 12(1-2):1–286, 2019.
  • Cover [1966] Thomas M. Cover. Behavior of sequential predictors of binary sequences. In Transactions of the Fourth Prague Conference on Information Theory, 1966.
  • Huang et al. [2016] Ruitong Huang, Tor Lattimore, András György, and Csaba Szepesvári. Following the leader and fast rates in linear prediction: Curved constraint sets and other regularities. Advances in Neural Information Processing Systems, 29, 2016.
  • Feder et al. [1992] Meir Feder, Neri Merhav, and Michael Gutman. Universal prediction of individual sequences. IEEE Transactions on Information Theory, 38(4):1258–1270, 1992.
  • Agarwal et al. [2017] Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, and Robert E Schapire. Corralling a band of bandit algorithms. In Conference on Learning Theory, pages 12–38. PMLR, 2017.
  • Pacchiano et al. [2020] Aldo Pacchiano, My Phan, Yasin Abbasi Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, and Csaba Szepesvari. Model selection in contextual stochastic bandit problems. Advances in Neural Information Processing Systems, 33:10328–10337, 2020.
  • Dann et al. [2023] Christoph Dann, Chen-Yu Wei, and Julian Zimmert. Best of both worlds policy optimization. arXiv preprint arXiv:2302.09408, 2023.
  • Auer et al. [2002a] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002a.
  • De Rooij et al. [2014] Steven De Rooij, Tim Van Erven, Peter D Grünwald, and Wouter M Koolen. Follow the leader if you can, hedge if you must. The Journal of Machine Learning Research, 15(1):1281–1316, 2014.
  • Orabona and Pál [2015] Francesco Orabona and Dávid Pál. Scale-free algorithms for online linear optimization. In International Conference on Algorithmic Learning Theory, pages 287–301. Springer, 2015.
  • Mourtada and Gaïffas [2019] Jaouad Mourtada and Stéphane Gaïffas. On the optimality of the hedge algorithm in the stochastic regime. Journal of Machine Learning Research, 20:1–28, 2019.
  • Bilodeau et al. [2023] Blair Bilodeau, Jeffrey Negrea, and Daniel M Roy. Relaxing the iid assumption: Adaptively minimax optimal regret via root-entropic regularization. The Annals of Statistics, 51(4):1850–1876, 2023.
  • Bubeck and Slivkins [2012] Sébastien Bubeck and Aleksandrs Slivkins. The best of both worlds: Stochastic and adversarial bandits. In Conference on Learning Theory, pages 42–1. JMLR Workshop and Conference Proceedings, 2012.
  • Zimmert and Seldin [2019] Julian Zimmert and Yevgeny Seldin. An optimal algorithm for stochastic and adversarial bandits. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 467–475. PMLR, 2019.
  • Lykouris et al. [2018] Thodoris Lykouris, Vahab Mirrokni, and Renato Paes Leme. Stochastic bandits robust to adversarial corruptions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 114–122, 2018.
  • Kotłowski [2018] Wojciech Kotłowski. On minimaxity of follow the leader strategy in the stochastic setting. Theoretical Computer Science, 742:50–65, 2018.
  • Karlin et al. [1994] Anna R. Karlin, Mark S. Manasse, Lyle A. McGeoch, and Susan Owicki. Competitive randomized algorithms for nonuniform problems. Algorithmica, 11(6):542–571, 1994.
  • Borodin and El-Yaniv [2005] Allan Borodin and Ran El-Yaniv. Online computation and competitive analysis. Cambridge University Press, 2005.
  • Cesa-Bianchi et al. [1997] Nicolo Cesa-Bianchi, Yoav Freund, David Haussler, David P Helmbold, Robert E Schapire, and Manfred K Warmuth. How to use expert advice. Journal of the ACM (JACM), 44(3):427–485, 1997.
  • Wei and Luo [2018] Chen-Yu Wei and Haipeng Luo. More adaptive algorithms for adversarial bandits. In Conference On Learning Theory, pages 1263–1291. PMLR, 2018.
  • Bubeck et al. [2019] Sébastien Bubeck, Yuanzhi Li, Haipeng Luo, and Chen-Yu Wei. Improved path-length regret bounds for bandits. In Conference On Learning Theory, pages 508–528. PMLR, 2019.
  • Bhuyan et al. [2023] Neelkamal Bhuyan, Debankur Mukherjee, and Adam Wierman. Best of both worlds: Stochastic and adversarial convex function chasing. arXiv preprint arXiv:2311.00181, 2023.
  • Sabag et al. [2021] Oron Sabag, Gautam Goel, Sahin Lale, and Babak Hassibi. Regret-optimal controller for the full-information problem. In 2021 American Control Conference (ACC), pages 4777–4782. IEEE, 2021.
  • Goel et al. [2023] Gautam Goel, Naman Agarwal, Karan Singh, and Elad Hazan. Best of both worlds in online control: Competitive ratio and policy regret. In Learning for Dynamics and Control Conference, pages 1345–1356. PMLR, 2023.
  • Rakhlin et al. [2011] Alexander Rakhlin, Karthik Sridharan, and Ambuj Tewari. Online learning: Stochastic, constrained, and smoothed adversaries. Advances in Neural Information Processing Systems, 24, 2011.
  • Haghtalab et al. [2022] Nika Haghtalab, Tim Roughgarden, and Abhishek Shetty. Smoothed analysis with adaptive adversaries. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 942–953. IEEE, 2022.
  • Block et al. [2022] Adam Block, Yuval Dagan, Noah Golowich, and Alexander Rakhlin. Smoothed online learning is as easy as statistical learning. In Conference on Learning Theory, pages 1716–1786. PMLR, 2022.
  • Bhatt et al. [2023] Alankrita Bhatt, Nika Haghtalab, and Abhishek Shetty. Smoothed analysis of sequential probability assignment. Neural Information Processing Systems, 2023.
  • Amir et al. [2020] Idan Amir, Idan Attias, Tomer Koren, Yishay Mansour, and Roi Livni. Prediction with corrupted expert advice. Advances in Neural Information Processing Systems, 33:14315–14325, 2020.
  • Auer et al. [2002b] Peter Auer, Nicolo Cesa-Bianchi, and Claudio Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64(1):48–75, 2002b.
  • Cesa-Bianchi et al. [2005] Nicolo Cesa-Bianchi, Gábor Lugosi, and Gilles Stoltz. Minimizing regret with label efficient prediction. IEEE Transactions on Information Theory, 51(6):2152–2162, 2005.
  • Hazan and Kale [2010] Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. Machine Learning, 80:165–188, 2010.
  • Koolen et al. [2014] Wouter M Koolen, Tim Van Erven, and Peter Grünwald. Learning the learning rate for prediction with expert advice. Advances in Neural Information Processing Systems, 27, 2014.
  • Van Erven et al. [2015] Tim Van Erven, Peter Grunwald, Nishant A Mehta, Mark Reid, Robert Williamson, et al. Fast rates in statistical and online learning. 2015.
  • Van Erven and Koolen [2016] Tim Van Erven and Wouter M Koolen. Metagrad: Multiple learning rates in online learning. Advances in Neural Information Processing Systems, 29, 2016.
  • Gaillard et al. [2014] Pierre Gaillard, Gilles Stoltz, and Tim Van Erven. A second-order bound with excess losses. In Conference on Learning Theory, pages 176–196. PMLR, 2014.
  • Bamas et al. [2020] Etienne Bamas, Andreas Maggiori, and Ola Svensson. The primal-dual method for learning augmented algorithms. Advances in Neural Information Processing Systems, 33:20083–20094, 2020.
  • Dinitz et al. [2022] Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Algorithms with prediction portfolios. Advances in Neural Information Processing Systems, 35:20273–20286, 2022.
  • Anand et al. [2022] Keerti Anand, Rong Ge, Amit Kumar, and Debmalya Panigrahi. Online algorithms with multiple predictions. In International Conference on Machine Learning, pages 582–598. PMLR, 2022.
  • Kalai and Vempala [2005] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
  • Cesa-Bianchi et al. [2007] Nicolo Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66:321–352, 2007.
  • [47] William Feller. An introduction to probability theory and its applications, Volume 1, Third Edition. John Wiley & Sons, New York.