Robust Phase Retrieval by Alternating Minimization

Seonho Kim and Kiryung Lee

Seonho Kim and Kiryung Lee are with the Department of ECE, The Ohio State University, Columbus, OH 43210 USA (e-mail: [email protected]). This work was supported in part by NSF CAREER Award CCF-1943201. A preliminary version of this work will be presented at the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) [1].

Abstract

We consider a least absolute deviation (LAD) approach to the robust phase retrieval problem that aims to recover a signal from its absolute measurements corrupted with sparse noise. To solve the resulting non-convex optimization problem, we propose a robust alternating minimization (Robust-AM) derived as an unconstrained Gauss-Newton method. To solve the inner optimization arising in each step of Robust-AM, we adopt two computationally efficient methods for linear programs. We provide a non-asymptotic convergence analysis of these practical algorithms for Robust-AM under the standard Gaussian measurement assumption. These algorithms, when suitably initialized, are guaranteed to converge linearly to the ground truth at an order-optimal sample complexity with high probability while the support of sparse noise is arbitrarily fixed and the sparsity level is no larger than $1/4$. Additionally, through comprehensive numerical experiments on synthetic and image datasets, we show that Robust-AM outperforms existing methods for robust phase retrieval offering comparable theoretical performance guarantees.

Index Terms:
phase retrieval, outliers, least absolute deviation, linear program, convex optimization

I Introduction

Phase retrieval refers to the recovery of an unknown signal $\bm{x}_\star \in \mathbb{R}^d$ (or $\mathbb{C}^d$) from the magnitudes of its linear measurements, which are formulated as

\[
b_i = |\langle \bm{a}_i, \bm{x}_\star \rangle|, \quad i = 1, \ldots, m, \tag{1}
\]

where $\bm{a}_1, \dots, \bm{a}_m \in \mathbb{R}^d$ (or $\mathbb{C}^d$) are known measurement vectors. Solving the set of nonlinear equations in (1) arises in numerous applications including X-ray crystallography, diffraction and array imaging, and optics (e.g., [2, 3, 4, 5]). We consider robust phase retrieval from the noisy amplitude measurements in (1) corrupted with sparse noise, i.e.,

\[
b_i = \begin{cases} \xi_i & \text{if } i \in I_{\mathrm{out}} \\ |\langle \bm{a}_i, \bm{x}_\star \rangle| & \text{if } i \in I_{\mathrm{in}} \end{cases} \tag{2}
\]

where $I_{\mathrm{out}} \subset [m]$ and $I_{\mathrm{in}} = [m] \setminus I_{\mathrm{out}}$ collect the unknown indices of outliers and inliers, respectively, and $\{\xi_i\}_{i \in I_{\mathrm{out}}}$ is an arbitrary sequence in $\mathbb{R}$. Such a scenario arises, for example, in phase retrieval imaging applications [6] for various reasons including detection failures and recording errors.
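The measurement model in (2) can be simulated with a short sketch. The function name `make_measurements`, the Gaussian measurement vectors, and the Cauchy-distributed outlier values are illustrative assumptions (the model itself allows arbitrary $\xi_i$), and NumPy is assumed to be available.

```python
import numpy as np

def make_measurements(m, d, eta, rng):
    """Draw amplitude measurements following model (2) (illustrative sketch).

    Assumptions: Gaussian measurement vectors a_i, a Gaussian ground truth,
    and Cauchy-distributed outlier values xi_i on a random support.
    """
    A = rng.standard_normal((m, d))          # rows are a_i^T
    x_star = rng.standard_normal(d)          # ground-truth signal
    b = np.abs(A @ x_star)                   # clean amplitudes, as in (1)
    n_out = int(round(eta * m))              # |I_out| = eta * m
    out_idx = rng.choice(m, size=n_out, replace=False)
    b[out_idx] = rng.standard_cauchy(n_out)  # overwrite outlier entries with xi_i
    return A, x_star, b, out_idx
```

The returned `out_idx` plays the role of $I_{\mathrm{out}}$ and is only used for evaluation, since the recovery algorithm does not know it.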

A suite of methods designed for the plain phase retrieval [7] has been adapted to address the outliers. These methods provide not only strong empirical performance but also theoretical analyses under random measurement models. For instance, anchored regression [8] and PhaseMax [9] formulate phase retrieval given an initial estimate as a linear program. RobustPhaseMax [10] modifies these methods to offer robust estimation by introducing an auxiliary variable to describe the outliers. In another example, Reshaped Wirtinger Flow (RWF) [11] and Amplitude Flow [12] follow a subgradient descent approach for a least squares estimator (LSE). Median-RWF [13] is a variant of these methods tailored to robust phase retrieval. Specifically, Median-RWF uses a truncation type of regularization that identifies and excludes outliers in each iteration by median-based thresholding on the consistency of the current estimate with the measurements. Median-RWF significantly improves on the empirical performance of RobustPhaseMax by tolerating a higher fraction of outliers. However, the regularization of Median-RWF involves algorithmic parameters that have been tuned specifically for the Gaussian measurement model, and it has not been discussed how to generalize these tuning parameters to other measurement models.

A recent work proposed an approach to robust phase retrieval in the classical robust regression framework in statistics [14]. Instead of the least squares, they adopted the least absolute deviation (LAD) [15] to enforce the consistency to the squared amplitude measurements with outliers. The parameter estimation is then cast as a nonconvex optimization problem. They proposed a prox-linear method that updates the estimate iteratively through local linearization of the forward model. This algorithm can be viewed as a variant of the Gauss-Newton method that regularizes the updates with the proximity to the previous iterate. The prox-linear algorithm iteratively refines the estimate through a sequence of quadratic programs for prox-linear problems and provides comparable performance to Median-RWF. Importantly, the Gauss-Newton method does not involve any tuning parameter. However, for large-scale applications such as those in astronomical or medical imaging, further acceleration of this iterative method is desired. They developed the proximal operator graph splitting (POGS) solver for this purpose.

In this paper, we propose an optimization approach to robust phase retrieval that shares the strong theoretical guarantees of the prox-linear algorithm (high tolerance of outlier ratio and no tuning parameters) and further improves its computational cost. This objective is achieved by a simple unconstrained Gauss-Newton method for LAD. The resulting optimization is equivalent to an alternating minimization algorithm for LAD, as described in [16], which is solved by a sequence of linear programs. Since this alternating minimization approach is robust in the presence of outliers, we refer to it as Robust-AM. Our main theoretical result demonstrates that a suitably initialized Robust-AM converges linearly to the ground-truth signal from $m = \mathcal{O}(d)$ random amplitude-only measurements including up to $25\%$ outliers. The desired initialization can be obtained by the existing robust spectral estimators [13, 14]. We verified through comprehensive numerical simulations that Robust-AM empirically outperforms the existing methods for robust phase retrieval. In particular, it can tolerate a higher fraction of outliers and provide exact recovery with fewer observations. Furthermore, due to its unconstrained optimization formulation with the absolute amplitude measurement model, Robust-AM admits a computationally efficient ADMM algorithm. As shown in Figure 1, ADMM for Robust-AM converges faster than POGS for the prox-linear method. In this experiment, the fraction of outliers is set to $\eta := |I_{\mathrm{out}}| / m = 0.3$, with outlier entries generated following zero and a Cauchy distribution with median $0$ and mean-absolute-deviation $1$. The convergence is measured by the metric $\mathrm{dist}(\mathbf{x}, \mathbf{x}_\star) := \min_{\alpha \in \{\pm 1\}} \|\mathbf{x} - \alpha \mathbf{x}_\star\|_2$ for $\mathbf{x}, \mathbf{x}_\star \in \mathbb{R}^d$. Figure 1 shows that the unconstrained Gauss-Newton method, without any explicit control over the proximity to previous iterates, converges to the ground-truth signal $\mathbf{x}_\star$ without overshooting.
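As a small illustration of the error metric above, the following sketch evaluates $\mathrm{dist}(\mathbf{x}, \mathbf{x}_\star)$ for real-valued vectors. The function name `dist` mirrors the notation in the text; the implementation is only one straightforward way to compute it and assumes NumPy.

```python
import numpy as np

def dist(x, x_star):
    """Sign-invariant distance: min over alpha in {+1, -1} of ||x - alpha * x_star||_2."""
    x = np.asarray(x, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    # A real signal is identifiable only up to a global sign, so compare with both +x_star and -x_star.
    return min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))
```

For example, `dist([1.0, 2.0], [-1.0, -2.0])` evaluates to zero because the two vectors differ only by a global sign.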

Figure 1 (panels (a) and (b)): Convergence of Robust-AM by ADMM [17] (blue) and prox-linear by POGS (red) in run time ($d = 1{,}000$, $m = 10{,}000$, and $\eta = 0.3$).

TABLE I: Comparison of RobustPhaseMax [10], Median-RWF [13], Prox-linear [14], and Robust-AM for robust phase retrieval in terms of the computational cost to obtain an $\epsilon$-accurate solution and the sparse noise assumptions for the performance guarantees.

| Method | Computational cost | Algorithm type | Support model | Tolerable sparsity level |
|---|---|---|---|---|
| RobustPhaseMax | $\mathcal{O}(m^3 + (m+d)^2 \log(1/\epsilon))$ | ADMM for LP [18] | adversarial | unspecified |
| | $\widetilde{\mathcal{O}}((m+d)^{2.38} \log(1/\epsilon))$ | Deterministic LP [19] | | |
| Median-RWF | $\mathcal{O}(md \log(1/\epsilon))$ | truncated gradient descent | arbitrary fixed | unspecified |
| Prox-linear | $\mathcal{O}\left(\log\log(1/\epsilon)\,(md^2 + md \log(1/\epsilon))\right)^{1}$ | regularized Gauss-Newton (POGS) | arbitrary fixed | $1/4$ |
| Robust-AM (Theorem IV.1) | $\mathcal{O}\left(m^3 + (m+d)^2 \log^2(1/\epsilon)\right)$ | unconstrained Gauss-Newton via [18] | arbitrary fixed | $1/4$ |
| | $\widetilde{\mathcal{O}}\left((m+d)^{2.38} \log^2(1/\epsilon)\right)$ | unconstrained Gauss-Newton via [19] | | |

$^{1}$ We establish this computational cost under the assumption that POGS converges linearly to the solution of the inner optimization of prox-linear. However, to the best of our knowledge, the convergence rate of POGS has not been established. Thus, this computational cost is a conjecture.

Notations: Boldface lowercase letters denote column vectors. We use $\|\cdot\|_1$ and $\|\cdot\|_2$ to denote the $\ell_1$ norm and the Euclidean norm, respectively. For brevity, the shorthand notation $[n]$ denotes the set $\{1, \ldots, n\}$ for $n \in \mathbb{N}$. We adopt the big-O notation so that $q \lesssim p$ is alternatively written as $q = \mathcal{O}(p)$. With the notation $\widetilde{\mathcal{O}}$, we ignore logarithmic factors.

II Robust Alternating Minimization

We consider the minimization of the composite function $\ell = h \circ F$ where $h: \mathbb{R}^m \to \mathbb{R}$ is a convex function and $F: \mathbb{R}^d \to \mathbb{R}^m$ is a nonlinear mapping. In the special case when $F$ is differentiable, Burke and Ferris [20] proposed a constrained Gauss-Newton method where the amount of the update is upper-bounded by a threshold. Duchi and Ruan [14] considered a variant where the constraint on the proximity of consecutive iterates is substituted by regularization with an additive penalty. We consider a more challenging case where $F$ is non-differentiable and propose an unconstrained Gauss-Newton method where the variable sequence $(\bm{x}_k)_{k \in \mathbb{N} \cup \{0\}}$ is iteratively updated by

\[
\bm{x}_{k+1} \in \operatorname*{argmin}_{\bm{x} \in \mathbb{R}^d} \, h\left(F(\bm{x}_k) + F'(\bm{x}_k)(\bm{x} - \bm{x}_k)\right) \tag{3}
\]

where $F'(\bm{x}_k) \in \mathbb{R}^{m \times d}$ denotes Clarke's generalized Jacobian matrix at $\bm{x}_k$ [21]. Due to the local linear approximation of $F$ at $\bm{x}_k$ in (3), $\bm{x}_{k+1}$ is obtained as a solution to a convex program. In the special case where $h: \mathbb{R}^m \to \mathbb{R}$ and $F: \mathbb{R}^d \to \mathbb{R}^m$ are respectively given by

\[
h(\bm{z}) = \|\bm{z}\|_1 \tag{4}
\]

and

\[
F(\bm{x}) = \left( |\langle \bm{a}_i, \bm{x} \rangle| - b_i \right)_{i=1}^{m}, \tag{5}
\]

their composition reduces, up to the normalization factor $1/m$, to

\[
\ell(\bm{x}) := \frac{1}{m} \sum_{i=1}^{m} \left| \, |\langle \bm{a}_i, \bm{x} \rangle| - b_i \right|. \tag{6}
\]

Then the minimization of $\ell$ corresponds to the LAD approach to robust phase retrieval with the absolute amplitude measurement model. Furthermore, given $h$ and $F$ as in (4) and (5), the update rule in (3) is explicitly written as

\[
\bm{x}_{k+1} \in \operatorname*{argmin}_{\bm{x} \in \mathbb{R}^d} \sum_{i=1}^{m} \left| \langle \bm{a}_i, \bm{x} \rangle - \mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle) \cdot b_i \right|. \tag{7}
\]

The resulting algorithm (7), derived from an unconstrained Gauss-Newton method of robust phase retrieval, is equivalent to an alternating minimization approach to the LAD formulation of robust phase retrieval when noisy measurements with a negative sign are discarded. An analogous alternating minimization for least-squares phase retrieval has been studied in the literature [16, 22]. Due to the robustness of LAD, we refer to the iterative algorithm by (7) as a robust alternating minimization (Robust-AM).
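To make the link between the LAD objective (6) and the linearized objective in (7) concrete, the sketch below evaluates both. The function names `lad_loss` and `surrogate_loss` are hypothetical, and the surrogate is scaled by $1/m$ here (an assumption made only so that it is directly comparable with (6); this does not change the minimizer of (7)).

```python
import numpy as np

def lad_loss(A, b, x):
    """LAD objective (6): mean of | |<a_i, x>| - b_i |."""
    return np.mean(np.abs(np.abs(A @ x) - b))

def surrogate_loss(A, b, x_k, x):
    """Objective of (7), scaled by 1/m: signs are frozen at the current iterate x_k."""
    signs = np.sign(A @ x_k)          # sign(<a_i, x_k>), fixed during the inner problem
    return np.mean(np.abs(A @ x - signs * b))
```

When no inner product $\langle \bm{a}_i, \bm{x}_k \rangle$ is exactly zero, the two functions agree at $\bm{x} = \bm{x}_k$. For nonnegative measurements $b_i \geq 0$, the reverse triangle inequality also gives `surrogate_loss(A, b, x_k, x) >= lad_loss(A, b, x)` for every `x`, so in that case the inner problem minimizes an upper bound on (6).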

Duchi and Ruan [14] considered a similar robust phase retrieval with the squared amplitude measurement model via their regularized Gauss-Newton method.

III Optimization Algorithms

This section discusses numerical algorithms for Robust-AM. First, we note that the optimization in (7) is equivalent to a linear program

\[
\begin{array}{cl}
\displaystyle \mathop{\mathrm{minimize}}_{\bm{x} \in \mathbb{R}^d,\, (t_i)_{i=1}^{m}} & \displaystyle \langle \bm{t}, \bm{1}_m \rangle \\
\mathrm{subject~to} & \displaystyle t_i \geq \langle \bm{a}_i, \bm{x} \rangle - \mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle) \cdot b_i, \\
& t_i \geq -\langle \bm{a}_i, \bm{x} \rangle + \mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle) \cdot b_i, \quad \forall i \in [m]
\end{array} \tag{8}
\]

where $\bm{1}_m = [1, \dots, 1]^{\mathsf{T}} \in \mathbb{R}^m$. There exist various computationally efficient numerical methods to solve linear programs. For example, the derandomized algorithm by van den Brand [19] finds an exact solution to a linear program with $d$ variables and $m$ constraints at the cost of $\widetilde{\mathcal{O}}\left((m+d)^{c}\right)$ multiplications where $c \approx 2.38$.

To further accelerate the convergence of Robust-AM, we also adopt iterative numerical algorithms that provide an approximate solution to the inner optimization in (7). In particular, we consider two alternating direction method of multipliers (ADMM) algorithms and a subgradient descent algorithm for the inner optimization. We refer to Robust-AM with approximate solutions to the inner optimization by these ADMM algorithms as fast Robust-AM since they provide a significantly lower computational cost for the entire convergence of Robust-AM to an $\epsilon$-accurate estimate.

III-A ADMM for LAD

Given $\bm{x}_k$, the optimization in (7) is viewed as LAD for linear regression and one can use an ADMM algorithm for LAD [17, Chapter 6.1]. To describe the update rule of the ADMM algorithm, we introduce the following shorthand notation. Let $\bm{A} \in \mathbb{R}^{m \times d}$ be the matrix whose $i$-th row is $\bm{a}_i^{\mathsf{T}}$ for $i \in [m]$, $\bm{b} := (b_1, \ldots, b_m) \in \mathbb{R}^m$, and $\bm{\Lambda}_k = \mathrm{diag}(\mathrm{sign}(\langle \bm{a}_1, \bm{x}_k \rangle), \ldots, \mathrm{sign}(\langle \bm{a}_m, \bm{x}_k \rangle))$. By following [17, Chapter 6.1] with an auxiliary variable $\bm{y}^t \in \mathbb{R}^m$ and a dual variable $\bm{\phi}^t \in \mathbb{R}^m$, the update rules are given in closed form as follows:

\begin{align*}
\bm{x}^{t+1} &= \bm{A}^{+} \left( \bm{y}^{t} - \frac{1}{\rho} \bm{\phi}^{t} \right), \tag{9a} \\
\bm{y}^{t+1} &= \bm{\Lambda}_k \bm{b} + \mathrm{sign}\left( \bm{A}\bm{x}^{t+1} + \frac{1}{\rho} \bm{\phi}^{t} - \bm{\Lambda}_k \bm{b} \right) \odot \left[ \left| \bm{A}\bm{x}^{t+1} + \frac{1}{\rho} \bm{\phi}^{t} - \bm{\Lambda}_k \bm{b} \right| - \frac{1}{\rho} \right]_{+}, \tag{9b} \\
\bm{\phi}^{t+1} &= \bm{\phi}^{t} + \rho \left( \bm{A}\bm{x}^{t+1} - \bm{y}^{t+1} \right), \tag{9c}
\end{align*}

where $\odot$ denotes the Hadamard product. The most expensive step in (9) is the least squares problem in (9a). Since it is solved repeatedly with the same $\bm{A}$, the pseudo-inverse $\bm{A}^{+}$ of $\bm{A}$ can be pre-computed as $\bm{A}^{+} = (\bm{A}^{\mathsf{T}} \bm{A})^{-1} \bm{A}^{\mathsf{T}}$ at cost $\mathcal{O}(d^3 + d^2 m)$ and kept in memory across iterations. For faster convergence, we adopt the varying step size strategy for $\rho$ [17, Section 3.4.1]. Importantly, since $\bm{A}$ remains the same over the outer iterations of Robust-AM, the pseudo-inverse is computed only once. The POGS algorithm [23] for the prox-linear method [14, Section 5] involves a similar matrix inversion. However, since its matrix changes over the outer iterations, POGS must repeat the matrix inversion, unlike the fast Robust-AM with ADMM. Recall that we adopt ADMM for the inner iteration of Robust-AM to accelerate the convergence with approximate solutions. Therefore, the convergence rate of the inner optimization is crucial. However, to the best of our knowledge, no convergence rate has been shown for the above ADMM algorithm or for the POGS algorithm. Below we present another ADMM algorithm and a subgradient descent method for (7) with proven linear convergence. Although these alternatives come with theoretical convergence results, the ADMM by (9) empirically outperformed them. In our numerical studies, we found that the fast Robust-AM with ADMM by (9) provides faster empirical convergence than POGS (see Figure 1).
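The updates (9a)-(9c) and the outer Robust-AM loop can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `lad_admm` and `robust_am` are hypothetical, the penalty $\rho$ is held fixed (the varying step size strategy is omitted), and the iteration counts are arbitrary assumptions.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Entrywise soft thresholding: sign(v) * max(|v| - kappa, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lad_admm(A, A_pinv, target, x0, rho=1.0, n_iter=100):
    """Approximately solve min_x ||A x - target||_1 with the updates (9a)-(9c).

    `target` plays the role of Lambda_k b; rho is fixed here for simplicity.
    """
    x = x0
    y = A @ x0                       # auxiliary variable y, one entry per measurement
    phi = np.zeros(A.shape[0])       # dual variable
    for _ in range(n_iter):
        x = A_pinv @ (y - phi / rho)                       # (9a)
        r = A @ x + phi / rho - target
        y = target + soft_threshold(r, 1.0 / rho)          # (9b)
        phi = phi + rho * (A @ x - y)                      # (9c)
    return x

def robust_am(A, b, x0, n_outer=10, rho=1.0, n_iter=100):
    """Outer Robust-AM loop: freeze the signs at x_k, then solve (7) approximately."""
    A_pinv = np.linalg.pinv(A)       # computed once, since A never changes
    x = x0
    for _ in range(n_outer):
        target = np.sign(A @ x) * b  # Lambda_k b
        x = lad_admm(A, A_pinv, target, x, rho=rho, n_iter=n_iter)
    return x
```

Each inner problem is warm-started from the current outer iterate, and the pseudo-inverse is shared across all outer iterations, which reflects the cost argument made above.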

III-B ADMM for linear program with linear convergence

Wang and Shroff [18] proposed an ADMM approach for linear programs and showed that it solves a linear program significantly faster than standard software such as CPLEX [24] and Gurobi [25]. Moreover, they established the linear convergence of their ADMM approach. To apply their approach to our linear program (8), we reformulate it into the standard form of a linear program (with only equality constraints) [18, Equation 1] by introducing $2m$ auxiliary variables $\bm{u}, \bm{s} \in \mathbb{R}^m$ as

\[
\begin{array}{ll}
\displaystyle\mathop{\mathrm{minimize}}_{\bm{w}\in\mathbb{R}^{d+3m}} & \langle \bm{c}, \bm{w} \rangle \\
\mathrm{subject~to} & \bm{B}\bm{w} = \bm{p}_k, \quad \bm{u}, \bm{s} \geq \bm{0}_m,
\end{array}
\tag{10}
\]

where $\bm{0}_m := [0,\ldots,0]^{\mathrm{T}} \in \mathbb{R}^{m}$, $\bm{0}_{m,d} := [\bm{0}_m, \ldots, \bm{0}_m] \in \mathbb{R}^{m\times d}$, and

\begin{align*}
\bm{c} &:= [\bm{0}_d;\, \bm{1}_m;\, \bm{0}_m;\, \bm{0}_m] \in \mathbb{R}^{d+3m}, \\
\bm{w} &:= [\bm{x};\, \bm{t};\, \bm{u};\, \bm{s}] \in \mathbb{R}^{d+3m}, \\
\bm{p}_k &:= [\bm{\Lambda}_k\bm{b};\, \bm{\Lambda}_k\bm{b}] \in \mathbb{R}^{2m}, \\
\bm{B} &:= \begin{bmatrix} \bm{A} & -\bm{I}_m & \bm{0}_{m,m} & \bm{I}_m \\ \bm{A} & \bm{I}_m & -\bm{I}_m & \bm{0}_{m,m} \end{bmatrix} \in \mathbb{R}^{2m\times(d+3m)}.
\end{align*}
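The reformulation (10) can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names `build_standard_form_lp` and `lift_to_feasible_point` are hypothetical, and the tie-breaking rule for zero signs is an assumption. It assembles $\bm{c}$, $\bm{B}$, and $\bm{p}_k$, and shows that any $\bm{x}$ lifts to a feasible $\bm{w}$ whose objective value equals the LAD cost $\|\bm{A}\bm{x} - \bm{\Lambda}_k\bm{b}\|_1$.

```python
import numpy as np

def build_standard_form_lp(A, b, x_k):
    """Assemble (c, B, p_k) of the equality-form LP (10) for the iterate x_k."""
    m, d = A.shape
    # Lambda_k b: measurements carrying the signs estimated from x_k.
    signs = np.sign(A @ x_k)
    signs[signs == 0] = 1.0  # assumption: break ties toward +1
    lam_b = signs * b
    I, Z = np.eye(m), np.zeros((m, m))
    # Variable order is w = [x; t; u; s].
    B = np.block([[A, -I, Z, I],
                  [A, I, -I, Z]])
    c = np.concatenate([np.zeros(d), np.ones(m), np.zeros(2 * m)])
    p_k = np.concatenate([lam_b, lam_b])
    return c, B, p_k

def lift_to_feasible_point(A, lam_b, x):
    """Map x to w = [x; t; u; s] with t = |r| and r = A x - Lambda_k b."""
    r = A @ x - lam_b
    t = np.abs(r)
    u = t + r  # second block row: A x + t - u = Lambda_k b
    s = t - r  # first block row:  A x - t + s = Lambda_k b
    return np.concatenate([x, t, u, s])
```

Since $\bm{u} = \bm{t} + \bm{r}$ and $\bm{s} = \bm{t} - \bm{r}$ are nonnegative exactly when $\bm{t} \geq |\bm{r}|$ entrywise, minimizing $\langle \bm{c}, \bm{w} \rangle = \bm{1}_m^{\mathrm{T}}\bm{t}$ drives $\bm{t}$ down to $|\bm{r}|$.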

Then, following [18, Algorithm 1], the update rule is given in closed form with the auxiliary variable $\bm{y}^{t} = [\bm{y}_1^{t};\, \bm{y}_2^{t}] \in \mathbb{R}^{d+3m}$ and the dual variable $\bm{z}^{t} = [\bm{z}_1^{t};\, \bm{z}_2^{t}] \in \mathbb{R}^{d+5m}$, where $\bm{y}_1 \in \mathbb{R}^{d+m}$, $\bm{y}_2, \bm{z}_1 \in \mathbb{R}^{2m}$, and $\bm{z}_2 \in \mathbb{R}^{d+3m}$, as

\begin{align}
\bm{w}^{t+1} &= \frac{1}{\rho}\left(\bm{I} + \bm{B}^{\mathrm{T}}\bm{B}\right)^{-1}\left(\bm{B}_1^{\mathrm{T}}\left(\bm{z}^{t} + \rho(\bm{B}_2\bm{y}^{t} - \bar{\bm{p}}_k)\right) + \bm{c}\right), \tag{11a} \\
\bm{y}^{t+1} &= \bm{w}^{t+1} + \frac{\bm{z}_2^{t}}{\rho}, \quad \bm{y}_2^{t+1} = [\bm{y}_2^{t+1}]_{+}, \tag{11b} \\
\bm{z}_1^{t+1} &= \bm{z}_1^{t} + \rho\left(\bm{B}\bm{w}^{t+1} - \bm{p}_k\right), \quad \bm{z}_2^{t+1} = \bm{z}_2^{t} + \rho(\bm{w}^{t+1} - \bm{y}^{t+1}), \tag{11c}
\end{align}

where

\[
\bm{B}_1 := \begin{bmatrix} \bm{B} \\ \bm{I}_{d+3m} \end{bmatrix}, \quad
\bm{B}_2 := \begin{bmatrix} \bm{0}_{2m,d+3m} \\ -\bm{I}_{d+3m} \end{bmatrix}, \quad
\bar{\bm{p}}_k := \begin{bmatrix} \bm{p}_k \\ \bm{0}_{d+3m} \end{bmatrix},
\]

and $[\cdot]_{+}$ takes the positive part of each entry of the input vector. The most expensive step is the matrix inversion in (11a). It is calculated via the matrix-inversion lemma as

\[
(\bm{I}_{d+3m} + \bm{B}^{\mathrm{T}}\bm{B})^{-1} = \bm{I}_{d+3m} - \bm{B}^{\mathrm{T}}(\bm{I}_{2m} + \bm{B}\bm{B}^{\mathrm{T}})^{-1}\bm{B}
\]

with cost $\mathcal{O}(m^{3})$. Since this step does not depend on the previous outer iterations, one can keep a pre-computed result in memory and reuse it over the inner and outer iterations. Hence, by the linear convergence result [18, Theorem 1], the cost for an $\epsilon_k$-accurate solution to (10) is $\mathcal{O}\left(m^{3} + (m+d)^{2}\log(1/\epsilon_k)\right)$. However, due to the larger number of auxiliary variables in (10) compared to (7), in our numerical studies the ADMM algorithm by (11) showed slower convergence in run time than the algorithm by (9).
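The matrix-inversion lemma above is easy to sanity-check numerically. The sketch below is illustrative only; the helper name `inverse_via_woodbury` is hypothetical. It forms $(\bm{I} + \bm{B}^{\mathrm{T}}\bm{B})^{-1}$ through a single linear solve of size equal to the number of rows of $\bm{B}$ (here $2m$), which is the quantity one would cache across inner and outer iterations.

```python
import numpy as np

def inverse_via_woodbury(B):
    """Return (I + B^T B)^{-1} using only a (rows x rows) linear solve."""
    rows, cols = B.shape
    small = np.eye(rows) + B @ B.T  # (2m x 2m) system in the setting of (10)
    return np.eye(cols) - B.T @ np.linalg.solve(small, B)
```

In practice one might store a factorization of the small matrix rather than the explicit inverse; the explicit form is used here only to make the identity visible.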

III-C Subgradient descent for LAD

Yang and Lin [26] proposed a restarted subgradient (RSG) method for non-smooth optimization. The specialization of their subgradient descent to the LAD problem in (7) is written as

\[
\bm{x}^{t+1} = \bm{x}^{t} - \frac{\eta_t}{m}\sum_{i=1}^{m} \mathrm{sign}\left(\langle \bm{a}_i, \bm{x}^{t} \rangle - \mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle)\cdot b_i\right)\cdot \bm{a}_i,
\tag{12}
\]

where $\eta_t > 0$ denotes a step size. The step size remains the same for $T$ consecutive iterations and then decreases by half. They showed that the subsequence of iterates sampled at every $T$ indices converges at a linear rate for a sufficiently large $T$. Therefore, the cost for an $\epsilon$-accurate solution to (7) is $\mathcal{O}(mdT\log(1/\epsilon))$. However, in our numerical studies, RSG converged more slowly in run time than the ADMM algorithms.
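The update (12) with the halving schedule can be sketched as follows. This is a minimal illustration rather than the RSG implementation of [26]: the function name `restarted_subgradient_lad` and the default values of `eta0`, `T`, and `stages` are assumptions chosen for a small example, and `target` stands for the vector with entries $\mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle)\, b_i$.

```python
import numpy as np

def restarted_subgradient_lad(A, target, x0, eta0=1.0, T=100, stages=20):
    """Subgradient descent on (1/m) * ||A x - target||_1 with step halving.

    The step size is held fixed for T iterations and then halved,
    for a total of `stages` such rounds.
    """
    m = A.shape[0]
    x = x0.copy()
    eta = eta0
    for _ in range(stages):
        for _ in range(T):
            residual = A @ x - target
            # A subgradient of the averaged absolute residuals, as in (12).
            x = x - (eta / m) * (A.T @ np.sign(residual))
        eta *= 0.5
    return x
```

Each inner iteration costs two matrix-vector products with $\bm{A}$, which matches the $\mathcal{O}(md)$ per-iteration cost used in the complexity estimate above.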

IV Theoretical results

In this section, we present the convergence analysis of the Robust-AM algorithms under the following assumptions. First, we adopt the standard random linear measurements and outliers with arbitrary support and adversarial values [14].

Assumption 1: The measurement vectors $(\bm{a}_i)_{i=1}^{m}$ are independent copies of $\bm{a} \sim \mathrm{Normal}(\bm{0}, \bm{I}_d)$.

Assumption 2: The outliers are supported on an arbitrarily fixed set $I_{\mathrm{out}}$ with $|I_{\mathrm{out}}| = \eta m$ for $\eta \in [0, 1/4]$, and their magnitudes $|\xi_i|$ can be adversarial.

Additionally, to provide the convergence analysis of the fast Robust-AM, we introduce an extra assumption that quantifies the suboptimality of solving (13) by ADMM.

Assumption 3: There exists a bounded sequence $(\epsilon_k)_{k\in\mathbb{N}}$ such that $\bm{x}_k$ is an inexact minimizer up to the sub-optimality level $\epsilon_k$ for all $k \in \mathbb{N}$, i.e.

\begin{align}
&\sum_{i=1}^{m}\left|\mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle)\langle \bm{a}_i, \bm{x}_{k+1} \rangle - b_i\right| \tag{13} \\
&\quad \leq \epsilon_k + \min_{\bm{x}\in\mathbb{R}^{d}}\sum_{i=1}^{m}\left|\mathrm{sign}(\langle \bm{a}_i, \bm{x}_k \rangle)\langle \bm{a}_i, \bm{x} \rangle - b_i\right|. \notag
\end{align}

We denote the highest sub-optimality level as $\epsilon_{\max}$, i.e.

\[
\epsilon_{\max} := \max_{k\in\mathbb{N}} \epsilon_k.
\]
Theorem IV.1.

Suppose that Assumptions 1, 2, and 3 hold. Then there exist absolute constants $C, c > 0$ and constants $\nu_\eta \in (0,1)$, $\lambda_\eta > 0$ depending only on $\eta$, for which the following statement holds for all $\bm{x}_\star \in \mathbb{R}^{d}$ with probability at least $1 - \exp(-cd)$: If $m \geq Cd$ and

\[
\max\left(\mathrm{dist}\left(\bm{x}_0, \bm{x}_\star\right), \lambda_\eta\epsilon_{\max}\right) \leq \sin(2/25)\,\|\bm{x}_\star\|_2,
\tag{14}
\]

then the sequence $\left(\bm{x}_k\right)_{k\in\mathbb{N}\cup\{0\}}$ generated by the fast Robust-AM algorithm satisfies

\[
\mathrm{dist}\left(\bm{x}_k, \bm{x}_\star\right) \leq \nu_\eta^{k}\cdot\mathrm{dist}\left(\bm{x}_0, \bm{x}_\star\right) + \lambda_\eta\epsilon_{\max}
\tag{15}
\]

for all $k \in \mathbb{N}$, where $\mathrm{dist}(\bm{x}, \bm{x}_\star) := \min_{\alpha\in\{\pm 1\}}\|\bm{x} - \alpha\bm{x}_\star\|_2$.
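To make the quantities in Theorem IV.1 concrete, the sketch below evaluates the sign-invariant distance, the initialization condition (14), and the right-hand side of (15). The theorem only asserts the existence of $\nu_\eta$ and $\lambda_\eta$, so the arguments `nu` and `lam` are placeholders to be supplied by the reader, and all function names are hypothetical.

```python
import math
import numpy as np

def dist(x, x_star):
    """Distance up to a global sign: min over alpha in {+1, -1}."""
    return min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star))

def within_basin(x0, x_star, lam, eps_max):
    """Check the initialization condition (14)."""
    radius = math.sin(2.0 / 25.0) * np.linalg.norm(x_star)
    return max(dist(x0, x_star), lam * eps_max) <= radius

def error_bound(k, nu, lam, dist0, eps_max):
    """Right-hand side of (15) after k outer iterations."""
    return nu**k * dist0 + lam * eps_max

def iterations_for_accuracy(nu, dist0, eps):
    """Smallest k with nu**k * dist0 <= eps (the decaying term of (15))."""
    if dist0 <= eps:
        return 0
    return math.ceil(math.log(dist0 / eps) / math.log(1.0 / nu))
```

The last helper reflects the usual consequence of linear convergence: the number of outer iterations grows like $\log(1/\epsilon)/\log(1/\nu_\eta)$, while the floor $\lambda_\eta\epsilon_{\max}$ is set by the accuracy of the inner solver.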

The Robust-AM algorithm updates iterates with an exact solution to (7). Therefore, setting $\epsilon_{\max}$ to $0$ in Theorem IV.1 provides a sufficient condition for the exact recovery of $\bm{x}_\star$ by Robust-AM. We compare the specialization of Theorem IV.1 to this scenario with the analogous results for competing methods: RobustPhaseMax [10], Median-RWF [13], and prox-linear [14]. Theorem IV.1 as well as the previous results achieve exact recovery when the number of observations $m$ exceeds a multiple of the signal dimension $d$. Earlier theoretical results on RobustPhaseMax and Median-RWF showed that there exists an unspecified numerical constant such that the algorithms provide exact recovery if the outlier fraction is below this constant. In contrast, the analyses of prox-linear [14] and Robust-AM (Theorem IV.1) demonstrate that these methods can tolerate outliers up to $1/4$ of the total observations. These theoretical guarantees consider different degrees of adversary in their outlier models. The performance guarantee of RobustPhaseMax by Hand [10] assumed the strongest adversary, in which both the support and the values of sparse noise are adversarial. The performance guarantees of Median-RWF by Zhang et al. [13] considered the same outlier model as in Assumption 2, but they also introduced additive noise of a bounded norm in addition to sparse noise. Duchi and Ruan [14] used the weakest adversary, in which the support of sparse noise is random but the nonzero values of sparse noise can depend on the measurements. Despite its performance guarantee under the strongest adversary, as shown in Section V, RobustPhaseMax showed significantly inferior empirical performance relative to the other methods in terms of the tolerable outlier ratio.

Theorem IV.1 establishes a local linear convergence of the Robust-AM algorithms. As discussed in Section II, Robust-AM has no explicit control over the amount of the update in each iteration unlike the constrained or regularized versions of the Gauss-Newton method [20, 14]. However, despite its simple form, Robust-AM provides the monotone decrease of the estimation error toward zero without any overshooting for robust phase retrieval in the setting of Theorem IV.1. All convergence analyses by Theorem IV.1 and previous work [13, 14] require an initialization within a neighborhood of the ground truth. The size of the basin of convergence was determined with an explicit numerical constant only in [10] and Theorem IV.1. Various initialization methods with theoretical performance guarantees have been developed to obtain the desired initial estimate [13, 14]. The sample complexity for these initialization methods does not exceed those for the subsequent estimators in order.

Next, we discuss the computational costs of the robust estimators. First, RobustPhaseMax is formulated as a linear program, and thus it can be exactly solved with $\widetilde{\mathcal{O}}((m+d)^{2.38}\log(1/\epsilon))$ multiplications by the derandomized algorithm [19]. Furthermore, as discussed in Section III-B, there exists an ADMM algorithm for the linear program that costs $\mathcal{O}(m^{3} + (m+d)^{2}\log(1/\epsilon))$ for an $\epsilon$-accurate solution. Due to the term $\log(1/\epsilon)$, if the desired accuracy decreases in proportion to the size of the problem, it is preferable to use ADMM. Otherwise, the derandomized algorithm will be computationally efficient. The other estimators are given as iterative algorithms with a proven convergence rate. Therefore, we compare their computational costs to obtain an $\epsilon$-accurate solution. Median-RWF is a truncated gradient descent with a per-iteration cost of $\mathcal{O}(md)$. Since the linear convergence of Median-RWF has been established, the total cost is $\mathcal{O}(md\log(1/\epsilon))$. Unlike Median-RWF, the updates in prox-linear and Robust-AM involve a nontrivial inner optimization, cast as a quadratic program and a linear program, respectively. One may use an exact solver for these sub-problems. For example, there exists an interior point method for quadratic programs with the cost $\mathcal{O}((m+d)^{4})$ [27]. Since it has been shown that prox-linear converges quadratically, the total cost with this exact inner solver is $\mathcal{O}((m+d)^{4}\log\log(1/\epsilon))$. The inner optimization in Robust-AM can be exactly solved at the cost $\widetilde{\mathcal{O}}((m+d)^{2.38}\log(1/\epsilon))$ by the derandomized algorithm [19]. Due to its linear convergence, the total cost of Robust-AM is $\widetilde{\mathcal{O}}((m+d)^{2.38}\log(1/\epsilon))$. However, as shown in Theorem IV.1, the linear convergence of Robust-AM remains valid when the inner optimization problems are solved only approximately. The fast Robust-AM with the ADMM solver for linear programs has a per-iteration cost of $\mathcal{O}(m^{3} + (m+d)^{2}\log(1/\epsilon_{\max}))$, as shown in Section III. Due to its linear convergence in Theorem IV.1, the total cost to obtain the accuracy $\epsilon + \lambda_\eta\epsilon_{\max}$ is $\mathcal{O}(m^{3} + (m+d)^{2}\log(1/\epsilon_{\max})\log(1/\epsilon))$. In contrast, the convergence rate of POGS for the inner optimization in prox-linear has not been established. We summarize the comparison of the computational costs of the algorithms in Table I.

Lastly, we elaborate on the dependence of the parameters $\nu_\eta$ and $\lambda_\eta$ in Theorem IV.1 on the outlier ratio $\eta$. The linear convergence parameter $\nu_\eta$ in (15) is explicitly specified as an increasing function of $\eta$ in the proof of Theorem IV.1 and illustrated in Figure 2(a). Therefore, a smaller $\eta$ implies faster convergence. The final error bound given by (15) with $k$ going to infinity is the amplification of the sub-optimality parameter $\epsilon_{\max}$ of the inner optimization by a factor of $\lambda_\eta$. Similar to $\nu_\eta$, the parameter $\lambda_\eta$ is also explicitly given as an increasing function of $\eta$ in the proof (see Figure 2(b)). However, the final estimation error can still be made sufficiently small, as one can set the accuracy parameter to a sufficiently low value (less than $10^{-10}$) using linear program packages in readily available software such as CPLEX and Gurobi. Hence, the assumption on $\{\epsilon_i\}_{i=1}^{k}$ in Theorem IV.1 is easily satisfied.

Figure 2: The dependence of the parameters (a) $\nu_\eta$ and (b) $\lambda_\eta$ in Theorem IV.1 on the outlier fraction $\eta$.

V Numerical Results

This section compares the empirical performances of Robust-AM to its theoretical analysis in Theorem IV.1. Robust-AM is also compared against the competing methods for robust phase retrieval, which are RobustPhaseMax, Median-RWF, and the prox-linear. Recall that all these methods require an initial estimate. For this purpose, we adopt the spectral method by Zhang et al. [13].

V-A Synthetic data experiments

First, through experiments on synthetic data, we show that the numerical results corroborate our theoretical findings in Theorem IV.1 and that Robust-AM outperforms the competing methods. In this experiment, the measurement vectors are generated so that $\{\bm{a}_i\}_{i=1}^{m} \overset{\mathrm{i.i.d.}}{\sim} \mathrm{Normal}(\bm{0}, \bm{I}_d)$, following the assumptions in Theorem IV.1 and the analogous theoretical analyses of the other methods. The ground-truth signal is generated as $\bm{x}_\star \sim \mathrm{Normal}(\bm{0}, \bm{I}_d)$ independently of the measurement vectors. The outlier support is selected uniformly at random among all subsets $I_{\mathrm{out}} \subset [m]$ of size $\eta m$.
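A data-generation routine along these lines might look as follows. It is a sketch under stated assumptions: this section does not restate the measurement model, so the `corruption` switch (additive sparse noise versus replacing the corrupted measurements by the outlier values) is an assumption, as are the function name `generate_instance` and the rounding of $\eta m$ to an integer.

```python
import numpy as np

def generate_instance(m, d, eta, rng, outlier_sampler, corruption="additive"):
    """Draw Gaussian measurements, a Gaussian signal, and sparse outliers.

    outlier_sampler(n) must return n outlier values xi.
    corruption="additive" sets b_i = |<a_i, x>| + xi_i on the support;
    corruption="replace" sets b_i = xi_i on the support (both are assumptions).
    """
    A = rng.standard_normal((m, d))
    x_star = rng.standard_normal(d)
    n_out = int(round(eta * m))
    # Uniformly random support of size eta * m, without repetition.
    support = rng.choice(m, size=n_out, replace=False)
    xi = np.asarray(outlier_sampler(n_out), dtype=float)
    b = np.abs(A @ x_star)
    if corruption == "additive":
        b[support] += xi
    elif corruption == "replace":
        b[support] = xi
    else:
        raise ValueError("corruption must be 'additive' or 'replace'")
    return A, x_star, b, support
```

For instance, passing `lambda n: rng.standard_cauchy(n)` as `outlier_sampler` draws heavy-tailed outlier values, and `lambda n: np.zeros(n)` with `corruption="replace"` zeroes out the corrupted measurements.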

Figure 3: Phase transition of the empirical success rate of Robust-AM per the number of measurements $m$ and the dimension $d$.

Figure 4: Phase transition of the success rate per the measurement ratio $m/d$ and the fraction of outliers $\eta$ for various outlier magnitude models: (a) Cauchy distribution, (b) uniform distribution, (c) zero. Within each panel, subfigures are displayed according to RobustPhaseMax (top-left), Median-RWF (top-right), prox-linear method (bottom-left), and Robust-AM (bottom-right).

Figure 3 shows the phase transition of the empirical success rate by Robust-AM through Monte Carlo simulations, where the outlier values are i.i.d. following the Cauchy distribution with median 00 and mean-absolute-deviation 1111. The fraction of outliers is fixed to η=0.25𝜂0.25\eta=0.25italic_η = 0.25 Recall that the performance guarantee in Theorem IV.1 applies uniformly to all ground-truth signals. To observe the empirical performance in an analogous setting, we design the experiment as follows: 1) Generate 20202020 sets of random measurement vectors {𝒂i}i=1msuperscriptsubscriptsubscript𝒂𝑖𝑖1𝑚\{\bm{a}_{i}\}_{i=1}^{m}{ bold_italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. Generate 30303030 sets of random ground-truth 𝒙subscript𝒙\bm{x}_{\star}bold_italic_x start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT; 2) For each fixed {𝒂i}i=1msuperscriptsubscriptsubscript𝒂𝑖𝑖1𝑚\{\bm{a}_{i}\}_{i=1}^{m}{ bold_italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, success is declared if the estimator recovers all 30303030 ground-truth signals by satisfying dist(𝒙^,𝒙)103dist^𝒙subscript𝒙superscript103\mathrm{dist}(\widehat{\bm{x}},\bm{x}_{\star})\leq 10^{-3}roman_dist ( over^ start_ARG bold_italic_x end_ARG , bold_italic_x start_POSTSUBSCRIPT ⋆ end_POSTSUBSCRIPT ) ≤ 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT where 𝒙^^𝒙\widehat{\bm{x}}over^ start_ARG bold_italic_x end_ARG denotes the estimate; 3) The empirical success rate is calculated on the outcomes from 20202020 distinct sets of measurement vectors. The transition occurs at the boundary where the number of measurements is proportional to the ambient dimension (signal length). This empirical result corroborates our theoretical finding in Theorem IV.1. 
Next, we repeat the same experiment on RobustPhaseMax, Median-RWF, and the prox-linear method. Figure 4(a) compares the empirical performance of Robust-AM against RobustPhaseMax, Median-RWF, and the prox-linear method by displaying the phase transition of these methods over a range of the outlier fraction $\eta$ in this setting. The ambient dimension is set to $d=100$. Figure 4(a) shows that Robust-AM outperforms all the other methods with a significantly lower threshold for the phase transition. We further expand the comparison to other models for outlier values. The second scenario draws $\xi_i$ from the uniform distribution on $(-d\|\bm{x}_\star\|_2^2/2,\ d\|\bm{x}_\star\|_2^2/2)$. The third scenario sets $\xi_i$ to $0$. As observed in Figures 4(b) and 4(c), similar trends appear in the other outlier models. RobustPhaseMax, while providing the strongest theoretical performance guarantee, shows the worst empirical performance in the comparison. There is no consistent dominance between Median-RWF and the prox-linear algorithm: Median-RWF outperforms the prox-linear algorithm in the second scenario, whereas the prox-linear algorithm performs better in the other scenarios.
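The three outlier models compared here can be sketched in a few lines. The function name `draw_outliers` and the scenario labels are illustrative assumptions, not names from the paper; the Cauchy sampler uses the standard inverse-CDF formula with median 0 and unit scale.

```python
import math
import random

def draw_outliers(num, scenario, d=None, x_norm_sq=None, rng=random):
    """Draw `num` outlier values xi_i under one of the three scenarios in the text."""
    if scenario == "cauchy":
        # Inverse CDF of the standard Cauchy distribution (median 0, unit scale).
        return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(num)]
    if scenario == "uniform":
        # Uniform on (-d * ||x_star||_2^2 / 2, d * ||x_star||_2^2 / 2).
        half_width = d * x_norm_sq / 2.0
        return [rng.uniform(-half_width, half_width) for _ in range(num)]
    if scenario == "zero":
        return [0.0] * num
    raise ValueError(f"unknown scenario: {scenario}")

rng = random.Random(0)
xi_cauchy = draw_outliers(5, "cauchy", rng=rng)
xi_uniform = draw_outliers(5, "uniform", d=100, x_norm_sq=1.0, rng=rng)
xi_zero = draw_outliers(5, "zero")
```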

Figure 5: Convergence of Robust-AM (blue) and the prox-linear method (red) in the iteration count, for (a) $\eta=0.1$, (b) $\eta=0.2$, and (c) $\eta=0.3$.

Next, we compare the convergence speed of Robust-AM and the prox-linear algorithm. In this experiment, the dimension parameters are set to $m=1{,}500$ and $d=200$, and the outlier values are set to zero. The outlier ratio varies over $\eta\in\{0.1,0.2,0.3\}$. Figure 5 illustrates how the logarithm of $\mathrm{dist}(\bm{x}_k,\bm{x}_\star)$ decays over the iteration index $k$. The median over $10$ trials is plotted. In theory, the prox-linear algorithm converges at a quadratic rate, which is faster than the linear convergence of Robust-AM established in Theorem IV.1. However, as shown in Figure 5, Robust-AM empirically converges faster than the prox-linear algorithm in the iteration count for all considered $\eta$. Moreover, Figure 5 illustrates that the number of iterations required by Robust-AM increases as $\eta$ increases, which indicates that the per-iteration contraction of Robust-AM weakens as $\eta$ grows. This is consistent with our theoretical finding that the convergence parameter $\nu_\eta$ in Theorem IV.1 is an increasing function of $\eta$, as shown in Figure 2(a).
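To make the link between the contraction factor and the iteration count concrete, consider an idealized linear rate $\mathrm{dist}(\bm{x}_k,\bm{x}_\star)\le\nu\cdot\mathrm{dist}(\bm{x}_{k-1},\bm{x}_\star)$ with no inexactness term. The sketch below counts the iterations needed to reach a target accuracy. The function name `iterations_needed` and the values of $\nu$ are hypothetical and are not read off Figure 2(a).

```python
import math

def iterations_needed(nu, initial_error, target_error):
    """Smallest k with nu**k * initial_error <= target_error, for 0 < nu < 1."""
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie in (0, 1)")
    if initial_error <= target_error:
        return 0
    k = math.ceil(math.log(target_error / initial_error) / math.log(nu))
    # Guard against floating-point rounding at the boundary.
    while nu ** k * initial_error > target_error:
        k += 1
    return k

# Hypothetical contraction factors, chosen only for illustration.
counts = {nu: iterations_needed(nu, initial_error=1.0, target_error=1e-6)
          for nu in (0.3, 0.5, 0.7)}
```

A larger contraction factor needs more iterations for the same accuracy, which matches the qualitative trend of Figure 5 as $\eta$ increases.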

V-B Real image experiments

Figure 6: Example of recovery for image data: (a) ground truth, (b) image recovered by the prox-linear method, (c) image recovered by our method.

We further apply Robust-AM to a set of image data to show that Robust-AM continues to outperform the competing methods under non-Gaussian measurement models. We adopt the structured random measurement model from the experimental setting in [14, Section 6.3], given by

\[
\bm{A}_{\mathrm{H}}=(\bm{I}_{k}\otimes\bm{H}_{n})[\bm{S}_{1},\bm{S}_{2},\cdots,\bm{S}_{k}]^{\mathrm{T}}\in\mathbb{R}^{kn\times n}, \tag{16}
\]

where $\bm{H}_n\in\mathbb{R}^{n\times n}$ denotes the normalized Hadamard matrix and $\bm{S}_1,\ldots,\bm{S}_k\in\mathbb{R}^{n\times n}$ are diagonal matrices whose diagonal entries are drawn independently and uniformly at random from $\{\pm 1\}$. The measurement vector $\bm{a}_i$ is the $i$-th column of $\bm{A}_{\mathrm{H}}^{\mathrm{T}}$ for $i\in[m]$, where $m=kn$. The linear measurement operator in (16) applies to the vectorized version of a 2D input image $\bm{X}_\star\in\mathbb{R}^{n_1\times n_2}$, denoted by $\bm{x}_\star:=\mathrm{Vec}(\bm{X}_\star)\in\mathbb{R}^{n}$ with $n=n_1\times n_2$.
The measurements corresponding to outliers are replaced by zero in the experiment.
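The operator in (16) never needs to be formed explicitly: each block $\bm{H}_n\bm{S}_j$ flips signs and then applies a Hadamard transform. The sketch below applies it with a fast Walsh-Hadamard transform. It assumes the Sylvester (natural-order) Hadamard matrix with $n$ a power of two, which the text does not specify, and the names `fwht` and `hadamard_measurements` are introduced here for illustration only.

```python
import math
import random

def fwht(v):
    """Normalized fast Walsh-Hadamard transform, i.e. H_n v with H_n orthogonal."""
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(v)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [scale * t for t in out]

def hadamard_measurements(x, signs):
    """Apply A_H from (16): stack H_n S_j x over the k sign patterns in `signs`."""
    y = []
    for s in signs:  # s holds the diagonal of S_j, entries in {+1, -1}
        y.extend(fwht([si * xi for si, xi in zip(s, x)]))
    return y

rng = random.Random(0)
n, k = 8, 3
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
linear = hadamard_measurements(x, signs)   # A_H x, length k * n
b = [abs(t) for t in linear]               # outlier-free magnitude measurements
```

Since every block $\bm{H}_n\bm{S}_j$ is orthogonal, $\|\bm{A}_{\mathrm{H}}\bm{x}\|_2^2=k\|\bm{x}\|_2^2$, which gives a quick sanity check on an implementation.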

Figure 7: Phase transition of the success rate over $k$ and the fraction of outliers $\eta$ for the zero outlier magnitude model. Subfigures are displayed according to RobustPhaseMax (top-left), Median-RWF (top-right), prox-linear method (bottom-left), and Robust-AM (bottom-right).

Robust-AM and the competing algorithms are tested on a collection of $50$ images of handwritten digits (available at https://hastie.su.domains/ElemStatLearn/datasets/zip.digits). Figure 7 compares the methods in the empirical success rate over the $50$ images, where the number of random modulations $k$ and the outlier fraction $\eta$ respectively vary over $k\in\{1,\ldots,12\}$ and $\eta\in[0,0.4]$. Similar to the previous experiments on synthetic data, Figure 7 demonstrates that Robust-AM outperforms the competing algorithms by providing recovery with smaller $k$ for each observed $\eta$. Since the algorithmic parameters of Median-RWF are specifically selected for Gaussian measurements in [13], we heuristically tuned the step size to $0.2$ so that Median-RWF performs well under the measurement model (16).

VI Proof of Theorem IV.1

We first prove by induction on the iteration index $j$ that

\[
\mathrm{dist}\left(\bm{x}_{j},\bm{x}_{\star}\right)\leq\nu_{\eta}\cdot\mathrm{dist}\left(\bm{x}_{j-1},\bm{x}_{\star}\right)+\frac{\epsilon_{j-1}}{C_{\eta}} \tag{17}
\]

holds for all $j\in\mathbb{N}$ for some numerical constants $\nu_\eta\in(0,1)$ and $C_\eta>0$ depending only on $\eta$. Let $k\in\mathbb{N}$ be arbitrarily fixed. Suppose that $\bm{x}_j$ satisfies (17) for all $j\leq k$. Note that the distance between $\bm{x}$ and $\bm{x}_\star$ is written as

\[
\mathrm{dist}(\bm{x},\bm{x}_{\star})=\|\bm{x}-\varphi(\bm{x})\bm{x}_{\star}\|_{2}, \tag{18}
\]

where

\[
\varphi(\bm{x}):=\operatorname*{argmin}_{\alpha\in\{\pm 1\}}\left\lVert\bm{x}-\alpha\bm{x}_{\star}\right\rVert_{2}.
\]

Then we have $\mathrm{dist}\left(\bm{x}_{k+1},\bm{x}_{\star}\right)\leq\|\bm{x}_{k+1}-\varphi(\bm{x}_{k})\bm{x}_{\star}\|_{2}$ and $\mathrm{dist}(\bm{x}_{k},\bm{x}_{\star})=\|\bm{x}_{k}-\varphi(\bm{x}_{k})\bm{x}_{\star}\|_{2}$. Therefore, it follows that

\[
\|\bm{x}_{k+1}-\varphi(\bm{x}_{k})\bm{x}_{\star}\|_{2}\leq\nu_{\eta}\|\bm{x}_{k}-\varphi(\bm{x}_{k})\bm{x}_{\star}\|_{2}+\frac{\epsilon_{k}}{C_{\eta}} \tag{19}
\]

implies (17) for $j=k+1$. This completes the induction argument.
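The distance (18) and the sign $\varphi$ are simple to compute, and a small example shows why freezing the sign at $\varphi(\bm{x}_k)$ gives an upper bound on $\mathrm{dist}(\bm{x}_{k+1},\bm{x}_\star)$. This is a minimal sketch: the helper names, the tie-breaking rule in `phi`, and the vectors are assumptions made for illustration.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))

def phi(x, x_star):
    """Sign alpha in {+1, -1} minimizing ||x - alpha * x_star||_2 (ties go to +1)."""
    plus = l2_norm([a - b for a, b in zip(x, x_star)])
    minus = l2_norm([a + b for a, b in zip(x, x_star)])
    return 1 if plus <= minus else -1

def dist(x, x_star):
    """Distance (18): ||x - phi(x) * x_star||_2."""
    alpha = phi(x, x_star)
    return l2_norm([a - alpha * b for a, b in zip(x, x_star)])

x_star = [1.0, -2.0, 0.5]
x_k = [-0.9, 2.1, -0.4]      # close to -x_star
x_next = [-1.0, 2.0, -0.45]  # hypothetical next iterate

# Freezing the sign at phi(x_k) can only overestimate dist(x_next, x_star),
# because dist minimizes over both signs.
alpha_k = phi(x_k, x_star)
frozen = l2_norm([a - alpha_k * b for a, b in zip(x_next, x_star)])
```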

Therefore, it suffices to show that the hypothesis of the theorem implies (19). For the sake of brevity, we denote the objective function of the optimization formulation in (7) by

\[
f_{\bm{x}_{k}}(\bm{x})=\frac{1}{m}\sum_{i=1}^{m}\left|\mathrm{sign}\left(\langle\bm{a}_{i},\bm{x}_{k}\rangle\right)\langle\bm{a}_{i},\bm{x}\rangle-b_{i}\right|.
\]

Then (13) provides

\[
\underbrace{f_{\bm{x}_{k}}(\bm{x}_{k+1})}_{\mathrm{(A)}}\leq\underbrace{f_{\bm{x}_{k}}(\varphi(\bm{x}_{k})\bm{x}_{\star})}_{\mathrm{(B)}}+\epsilon_{k}. \tag{20}
\]
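The objective $f_{\bm{x}_k}$ appearing in (20) can be evaluated directly from its definition. The sketch below uses a tiny outlier-free example in which every sign of $\langle\bm{a}_i,\bm{x}_k\rangle$ agrees with that of $\langle\bm{a}_i,\bm{x}_\star\rangle$, so the objective vanishes at the ground truth. The function names, the convention $\mathrm{sign}(0)=0$, and all numbers are assumptions for illustration.

```python
def sign(t):
    """Sign with the convention sign(0) = 0 (an assumption made for this sketch)."""
    return (t > 0) - (t < 0)

def inner(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def lad_objective(A, b, x_k, x):
    """f_{x_k}(x) = (1/m) * sum_i | sign(<a_i, x_k>) * <a_i, x> - b_i |."""
    m = len(A)
    return sum(abs(sign(inner(a, x_k)) * inner(a, x) - bi)
               for a, bi in zip(A, b)) / m

# Tiny outlier-free example: b_i = |<a_i, x_star>|.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
x_star = [2.0, 1.0]
b = [abs(inner(a, x_star)) for a in A]
x_k = [1.8, 1.1]                     # hypothetical iterate near x_star
value_at_truth = lad_objective(A, b, x_k, x_star)
value_at_iterate = lad_objective(A, b, x_k, x_k)
```

For fixed $\bm{x}_k$ the objective is a sum of absolute values of affine functions of $\bm{x}$, which is why the inner problem can be cast as a linear program.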

Next, we derive a lower bound (resp. an upper bound) on (A) (resp. (B)) of (20). By the definition of $b_i$ in (2), (A) is written as

\[
\begin{aligned}
\mathrm{(A)} &=\frac{1}{m}\sum_{i=1}^{m}\left|\mathrm{sign}\left(\langle\bm{a}_{i},\bm{x}_{k}\rangle\right)\langle\bm{a}_{i},\bm{x}_{k+1}\rangle-b_{i}\right| \\
&=\underbrace{\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\left|\mathrm{sign}(\langle\bm{a}_{i},\bm{x}_{k}\rangle)\langle\bm{a}_{i},\bm{x}_{k+1}\rangle-|\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle|\right|}_{\mathrm{(a)}} \\
&\quad+\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\mathrm{sign}\left(\langle\bm{a}_{i},\bm{x}_{k}\rangle\right)\langle\bm{a}_{i},\bm{x}_{k+1}\rangle-\xi_{i}\right|.
\end{aligned} \tag{21}
\]

To simplify the partial summation over $I_{\mathrm{in}}$, we introduce the spherical wedge [28] defined by

\[
W_{\bm{x},\bm{z}}:=\{\bm{v}\in\mathbb{S}^{d-1}\,|\,\mathrm{sign}(\langle\bm{v},\bm{x}\rangle)\neq\mathrm{sign}(\langle\bm{v},\bm{z}\rangle)\}. \tag{22}
\]

Then it follows that $\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle$ and $\langle\bm{a}_{i},\bm{x}_{k}\rangle$ have the opposite sign if and only if $\bm{a}_{i}\in W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}$. Therefore, (a) is rewritten as

\[
\begin{aligned}
\mathrm{(a)} &=\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\in W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}\}}\left|\langle\bm{a}_{i},\bm{x}_{k+1}+\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle\right| \\
&\quad+\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\notin W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}\}}\left|\langle\bm{a}_{i},\bm{x}_{k+1}-\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle\right|.
\end{aligned}
\]

The second summand on the right-hand side provides a valid lower bound on (a) since the other summand is nonnegative. Combining the above results, we obtain that

\[
\begin{aligned}
\mathrm{(A)} &\geq\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\notin W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}\}}\left|\langle\bm{a}_{i},\bm{x}_{k+1}-\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle\right| \\
&\quad+\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\mathrm{sign}\left(\langle\bm{a}_{i},\bm{x}_{k}\rangle\right)\langle\bm{a}_{i},\bm{x}_{k+1}\rangle-\xi_{i}\right|.
\end{aligned} \tag{23}
\]
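The case split behind (a), on which the bound (23) rests, can be checked numerically. For random vectors, the inlier summand equals $|\langle\bm{a}_i,\bm{x}_{k+1}+\varphi(\bm{x}_k)\bm{x}_\star\rangle|$ on the wedge and $|\langle\bm{a}_i,\bm{x}_{k+1}-\varphi(\bm{x}_k)\bm{x}_\star\rangle|$ off it. The sketch below is only a sanity check under assumed random inputs. The helper `in_wedge` is a name introduced here, and it tests the sign condition on unnormalized vectors, which is equivalent because signs are scale-invariant.

```python
import random

def sign(t):
    return (t > 0) - (t < 0)

def inner(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def in_wedge(a, x, z):
    """True if <a, x> and <a, z> differ in sign, i.e. a/||a|| lies in W_{x,z} of (22)."""
    return sign(inner(a, x)) != sign(inner(a, z))

rng = random.Random(1)
d = 5
x_star = [rng.gauss(0, 1) for _ in range(d)]
x_k = [rng.gauss(0, 1) for _ in range(d)]
x_next = [rng.gauss(0, 1) for _ in range(d)]
alpha = 1  # stands in for phi(x_k); the identity holds for either sign
z = [alpha * t for t in x_star]

max_gap = 0.0
for _ in range(1000):
    a = [rng.gauss(0, 1) for _ in range(d)]
    lhs = abs(sign(inner(a, x_k)) * inner(a, x_next) - abs(inner(a, z)))
    if in_wedge(a, x_k, z):
        rhs = abs(inner(a, [u + v for u, v in zip(x_next, z)]))
    else:
        rhs = abs(inner(a, [u - v for u, v in zip(x_next, z)]))
    max_gap = max(max_gap, abs(lhs - rhs))
```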

Similarly, (B) is written as

\[
\begin{aligned}
\mathrm{(B)} &=\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\underbrace{\left|\mathrm{sign}(\langle\bm{a}_{i},\bm{x}_{k}\rangle)\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle-|\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle|\right|}_{\mathrm{(b)}} \\
&\quad+\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\mathrm{sign}(\langle\bm{a}_{i},\bm{x}_{k}\rangle)\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle-\xi_{i}\right|.
\end{aligned}
\]

If $\bm{a}_{i}\in W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}$, then $\langle\bm{a}_{i},\bm{x}_{k}\rangle$ and $\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle$ have the opposite sign and hence (b) satisfies

\[
\mathrm{(b)}=2\left|\langle\bm{a}_{i},\bm{x}_{\star}\rangle\right|\leq 2\left|\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}-\bm{x}_{k}\rangle\right|.
\]

Otherwise, if $\bm{a}_{i}\notin W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}$, then $\mathrm{(b)}=0$. Therefore, we have

\[
\begin{aligned}
\mathrm{(B)} &\leq\frac{2}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\in W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}\}}\left|\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}-\bm{x}_{k}\rangle\right| \\
&\quad+\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\mathrm{sign}(\langle\bm{a}_{i},\bm{x}_{k}\rangle)\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle-\xi_{i}\right|.
\end{aligned} \tag{24}
\]
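The inequality used for (b) on the wedge relies only on the sign disagreement. A short derivation is written out here for completeness, with the shorthand $s:=\langle\bm{a}_i,\varphi(\bm{x}_k)\bm{x}_\star\rangle$ and $t:=\langle\bm{a}_i,\bm{x}_k\rangle$ introduced for this note only, and using $|\varphi(\bm{x}_k)|=1$:

```latex
\begin{align*}
\mathrm{(b)} &= \bigl|\mathrm{sign}(t)\,s-|s|\bigr| = \bigl|-|s|-|s|\bigr| = 2|s| = 2\left|\langle\bm{a}_i,\bm{x}_\star\rangle\right|, \\
|s-t| &= |s|+|t| \geq |s| \qquad \text{since $s$ and $t$ have opposite signs}, \\
\text{hence}\quad \mathrm{(b)} &\leq 2|s-t| = 2\left|\langle\bm{a}_i,\varphi(\bm{x}_k)\bm{x}_\star-\bm{x}_k\rangle\right|.
\end{align*}
```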

Plugging the bounds (23) and (24) into (20), we obtain that (20) implies

\[
\begin{aligned}
&\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\notin W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}\}}\left|\langle\bm{a}_{i},\bm{x}_{k+1}-\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle\right| \\
&\qquad+\underbrace{\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\mathrm{sign}\left(\langle\bm{a}_{i},\bm{x}_{k}\rangle\right)\langle\bm{a}_{i},\bm{x}_{k+1}\rangle-\xi_{i}\right|}_{(*)} \\
&\qquad-\underbrace{\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\mathrm{sign}(\langle\bm{a}_{i},\bm{x}_{k}\rangle)\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}\rangle-\xi_{i}\right|}_{(**)} \\
&\leq\frac{2}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\in W_{\bm{x}_{k},\varphi(\bm{x}_{k})\bm{x}_{\star}}\}}\left|\langle\bm{a}_{i},\varphi(\bm{x}_{k})\bm{x}_{\star}-\bm{x}_{k}\rangle\right|+\epsilon_{k}.
\end{aligned} \tag{25}
\]

By applying the triangle inequality to the summands in $(*)$ and $(**)$, we obtain a necessary condition for (25) given by

\begin{align*}
&\underbrace{\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_i\notin W_{\bm{x}_k,\varphi(\bm{x}_k)\bm{x}_\star}\}}\left|\langle\bm{a}_i,\bm{x}_{k+1}-\varphi(\bm{x}_k)\bm{x}_\star\rangle\right|}_{\mathrm{(c)}} \tag{26}\\
&\qquad\qquad-\underbrace{\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}\left|\langle\bm{a}_i,\bm{x}_{k+1}-\varphi(\bm{x}_k)\bm{x}_\star\rangle\right|}_{\mathrm{(d)}}\\
&\leq\underbrace{\frac{2}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_i\in W_{\bm{x}_k,\varphi(\bm{x}_k)\bm{x}_\star}\}}\left|\langle\bm{a}_i,\varphi(\bm{x}_k)\bm{x}_\star-\bm{x}_k\rangle\right|}_{\mathrm{(e)}}+\epsilon_k.
\end{align*}

We have shown that (20) implies (26). In the remainder of the proof, we demonstrate that if (26) is satisfied, then (19) holds with high probability. This is achieved by applying a probabilistic lower bound on (c) and probabilistic upper bounds on (d) and (e), using concentration inequalities.
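The triangle-inequality step that leads from (25) to (26) can be spelled out as follows. This is a sketch that uses only the quantities appearing in (25); the shorthand $s_i$, $u_i$, and $v_i$ is introduced here for illustration and is not notation from the paper.

```latex
\begin{align*}
s_i &:= \mathrm{sign}\left(\langle\bm{a}_i,\bm{x}_k\rangle\right), \qquad
u_i := s_i\langle\bm{a}_i,\bm{x}_{k+1}\rangle, \qquad
v_i := s_i\langle\bm{a}_i,\varphi(\bm{x}_k)\bm{x}_\star\rangle, \\
|u_i-\xi_i| - |v_i-\xi_i|
&\geq -|u_i - v_i|
= -|s_i|\,\left|\langle\bm{a}_i,\bm{x}_{k+1}-\varphi(\bm{x}_k)\bm{x}_\star\rangle\right|
\geq -\left|\langle\bm{a}_i,\bm{x}_{k+1}-\varphi(\bm{x}_k)\bm{x}_\star\rangle\right| .
\end{align*}
```

The first inequality is the reverse triangle inequality, in which the outlier value $\xi_i$ cancels, and the last one uses $|s_i|\leq 1$. Averaging over $i\in I_{\mathrm{out}}$ gives $(*)-(**)\geq-\mathrm{(d)}$, so the left-hand side of (25) is at least $\mathrm{(c)}-\mathrm{(d)}$, which is why (26) follows from (25).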

To this end, note that the measurement vectors $\{\bm{a}_i\}_{i=1}^{m}$ are statistically dependent not only on the current iterate $\bm{x}_k$ and the next iterate $\bm{x}_{k+1}$, but also on the indicator functions of the spherical wedge in (c) and (e). Therefore, we use bounds that hold uniformly over all iterates and over the collection of spherical wedges whose angle is at most $\theta\in(0,\pi)$. We introduce the corresponding lemma below.

Lemma VI.1.

Let $\theta\in(0,\pi)$, $\eta\in(0,1/2)$, and $\delta>0$. Suppose that $\{\bm{a}_i\}_{i=1}^{m}$ are independent copies of $\bm{g}\sim\mathrm{Normal}(\bm{0},\bm{I}_d)$. Let

\[
\mathcal{W}_\theta:=\left\{W_{\bm{x},\bm{z}}:\bm{x},\bm{z}\in\mathbb{R}^{d},\ \angle\left(\bm{x},\bm{z}\right)\leq\theta\right\}, \tag{27}
\]

where $W_{\bm{x},\bm{z}}$ is defined in (22). Then there exists an absolute constant $C$ such that

\begin{align*}
\inf_{\substack{W\in\mathcal{W}_\theta\\ \bm{z}\in\mathbb{S}^{d-1}}}\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_i\notin W\}}\left|\langle\bm{a}_i,\bm{z}\rangle\right|
&\geq(1-\eta)\sqrt{\frac{2}{\pi}}\\
&\quad-\frac{2\theta}{\pi}\left(\sqrt{\frac{2}{\pi}}+\sqrt{2\log\left(\frac{e\pi}{2\theta}\right)}\right)-\frac{\theta}{20}\left(\sqrt{\frac{2\theta}{\pi}}+1\right), \tag{28}\\
\sup_{\bm{z}\in\mathbb{S}^{d-1}}\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}|\langle\bm{a}_i,\bm{z}\rangle|
&\leq\eta\sqrt{\frac{\pi}{2}}+\sqrt{\eta}\,\frac{\theta}{20}, \tag{29}\\
\intertext{and}
\sup_{\substack{W\in\mathcal{W}_\theta\\ \bm{z}\in\mathbb{S}^{d-1}}}\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_i\in W\}}\left|\langle\bm{a}_i,\bm{z}\rangle\right|
&\leq\frac{2\theta}{\pi}\left(\sqrt{\frac{2}{\pi}}+\sqrt{2\log\left(\frac{e\pi}{2\theta}\right)}\right)+\sqrt{\frac{2\theta}{\pi}}\cdot\frac{\theta}{20} \tag{30}
\end{align*}

hold with probability at least $1-\delta$ provided that

\[
m\geq C\cdot\theta^{-2}\left(d\log(m/d)\vee\log(1/\delta)\right). \tag{31}
\]
Proof:

See Section VIII. ∎

Now we derive the largest angle for the spherical wedge $W_{\bm{x}_k,\varphi(\bm{x}_k)\bm{x}_\star}$. Since the angle between $\bm{x}_k$ and $\varphi(\bm{x}_k)\bm{x}_\star$ is always acute, we have

\begin{align*}
\sin\left(\angle\left(\bm{x}_k,\varphi(\bm{x}_k)\bm{x}_\star\right)\right)
&=\left\|\left(\bm{I}_d-\frac{\bm{x}_k\bm{x}_k^{\mathsf{T}}}{\|\bm{x}_k\|_2^2}\right)\frac{\varphi(\bm{x}_k)\bm{x}_\star}{\|\bm{x}_\star\|_2}\right\|_2 \tag{32}\\
&\leq\left\|\left(\bm{I}_d-\frac{\bm{x}_k\bm{x}_k^{\mathsf{T}}}{\|\bm{x}_k\|_2^2}\right)\frac{\varphi(\bm{x}_k)\bm{x}_\star-\bm{x}_k}{\|\bm{x}_\star\|_2}\right\|_2\\
&\overset{\mathrm{(i)}}{\leq}\frac{\|\bm{x}_k-\varphi(\bm{x}_k)\bm{x}_\star\|_2}{\|\bm{x}_\star\|_2}=\frac{\mathrm{dist}\left(\bm{x}_k,\bm{x}_\star\right)}{\|\bm{x}_\star\|_2}\\
&\overset{\mathrm{(ii)}}{\leq}\sin\left(\frac{2}{25}\right),
\end{align*}

where (i) holds since the projection operator is non-expansive; (ii) follows since the induction hypothesis implies

\begin{align*}
\mathrm{dist}\left(\bm{x}_k,\bm{x}_\star\right)
&\leq\nu_\eta^{k}\cdot\mathrm{dist}\left(\bm{x}_0,\bm{x}_\star\right)+\frac{\max_{i\in[0:k-1]}\epsilon_i}{C_\eta}\sum_{t=0}^{k-1}\nu_\eta^{t}\\
&\leq\nu_\eta^{k}\cdot\mathrm{dist}\left(\bm{x}_0,\bm{x}_\star\right)+(1-\nu_\eta)\sin\left(\frac{2}{25}\right)\|\bm{x}_\star\|_2\sum_{t=0}^{k-1}\nu_\eta^{t}\\
&\leq\sin\left(\frac{2}{25}\right)\|\bm{x}_\star\|_2,
\end{align*}

where the second and the last inequalities follow from (14).
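The last inequality in the display above is a geometric-sum computation, sketched below. The sketch assumes that (14), which is stated outside this section, supplies the initialization bound $\mathrm{dist}(\bm{x}_0,\bm{x}_\star)\leq\sin(2/25)\|\bm{x}_\star\|_2$ together with $0\leq\nu_\eta<1$; this reading is inferred from how (14) is invoked here and is not a quotation of it.

```latex
\begin{align*}
(1-\nu_\eta)\sum_{t=0}^{k-1}\nu_\eta^{t} &= 1-\nu_\eta^{k}, \\
\nu_\eta^{k}\,\mathrm{dist}(\bm{x}_0,\bm{x}_\star) + \left(1-\nu_\eta^{k}\right)\sin\left(\tfrac{2}{25}\right)\|\bm{x}_\star\|_2
&\leq \nu_\eta^{k}\sin\left(\tfrac{2}{25}\right)\|\bm{x}_\star\|_2 + \left(1-\nu_\eta^{k}\right)\sin\left(\tfrac{2}{25}\right)\|\bm{x}_\star\|_2 \\
&= \sin\left(\tfrac{2}{25}\right)\|\bm{x}_\star\|_2 .
\end{align*}
```

In words, the bound on $\mathrm{dist}(\bm{x}_k,\bm{x}_\star)$ is a convex combination of two quantities that are each at most $\sin(2/25)\|\bm{x}_\star\|_2$, so the iterates never leave the initial neighborhood.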

Hence, we set $\theta=2/25$ in Lemma VI.1. Under the sample complexity assumed in Theorem IV.1, the condition (31) of Lemma VI.1 is satisfied, so (28), (29), and (30) hold simultaneously with probability at least $1-\delta$. The remainder of the proof is conditioned on the event that (28), (29), and (30) hold.

By applying (28) and (29) to (c) and (d) of (26) and (30) to (e) of (26) with the choice of $\theta=2/25$, we obtain

\[
\|\bm{x}_{k+1}-\varphi(\bm{x}_k)\bm{x}_\star\|_2\leq\nu_\eta\|\bm{x}_k-\varphi(\bm{x}_k)\bm{x}_\star\|_2+\frac{\epsilon_k}{C_\eta}
\]

for

\[
\nu_\eta:=\frac{c_0}{C_\eta}\quad\text{and}\quad C_\eta:=(1-2\eta)\sqrt{\frac{2}{\pi}}-c_0-\frac{1}{250}(1+\sqrt{\eta}), \tag{33}
\]

where

\[
c_0:=\frac{4}{25\pi}\left(\sqrt{\frac{2}{\pi}}+\sqrt{2\log\left(\frac{25e\pi}{4}\right)}\right)+\frac{1}{625\sqrt{\pi}}.
\]
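For reference, the constants in $c_0$ are the right-hand side of (30) evaluated at $\theta=2/25$. The substitution is worked out below as a plain arithmetic check; no quantity beyond those in (30) and (33) is used.

```latex
\begin{align*}
\frac{2\theta}{\pi} &= \frac{4}{25\pi}, &
\frac{e\pi}{2\theta} &= \frac{25e\pi}{4}, \\
\frac{\theta}{20} &= \frac{1}{250}, &
\sqrt{\frac{2\theta}{\pi}}\cdot\frac{\theta}{20} &= \frac{2}{5\sqrt{\pi}}\cdot\frac{1}{250} = \frac{1}{625\sqrt{\pi}} .
\end{align*}
```

The factor $1/250$ in $C_\eta$ is likewise the value of $\theta/20$ at $\theta=2/25$.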

Since $\nu_\eta$ satisfies

\[
\frac{d\nu_\eta}{d\eta}=\frac{c_0\left(2\sqrt{\frac{2}{\pi}}+\frac{1}{500\sqrt{\eta}}\right)}{\left((1-2\eta)\sqrt{\frac{2}{\pi}}-c_0-\frac{1}{250}(1+\sqrt{\eta})\right)^{2}}>0
\]

for all $\eta\in[0,1/4]$, it is monotonically increasing in $\eta$ and upper-bounded as $\nu_\eta\leq\nu_{1/4}<9/10$. This implies $\nu_\eta<1$ uniformly over $\eta\in[0,1/4]$. This completes the proof of (19).
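The bound $\nu_{1/4}<9/10$ can be checked numerically. The following is a minimal sketch that evaluates the definitions in (33) directly; the function names `c0`, `C_eta`, and `nu_eta` are chosen here for illustration and are not taken from the paper.

```python
import math

THETA = 2 / 25  # wedge angle plugged into Lemma VI.1 in the proof


def c0(theta: float = THETA) -> float:
    """Right-hand side of (30); at theta = 2/25 this is the constant c_0."""
    lead = (2 * theta / math.pi) * (
        math.sqrt(2 / math.pi)
        + math.sqrt(2 * math.log(math.e * math.pi / (2 * theta)))
    )
    tail = math.sqrt(2 * theta / math.pi) * theta / 20
    return lead + tail


def C_eta(eta: float) -> float:
    """Denominator C_eta from (33), with theta / 20 = 1 / 250."""
    return (1 - 2 * eta) * math.sqrt(2 / math.pi) - c0() - (1 + math.sqrt(eta)) / 250


def nu_eta(eta: float) -> float:
    """Contraction factor nu_eta = c_0 / C_eta from (33)."""
    return c0() / C_eta(eta)


if __name__ == "__main__":
    for eta in (0.0, 0.05, 0.1, 0.15, 0.2, 0.25):
        print(f"eta = {eta:.2f}  C_eta = {C_eta(eta):.4f}  nu_eta = {nu_eta(eta):.4f}")
```

Running the script prints the contraction factor on a grid of outlier fractions, which makes it easy to see how quickly the rate degrades as $\eta$ approaches $1/4$.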

VII Supporting Lemmas

Lemma VII.1.

Let $\bm{g}\sim\mathrm{Normal}(\bm{0},\bm{I}_d)$ and $\theta\in(0,\pi)$. Let $\mathcal{W}_\theta$ be defined as in (27). Then we have

\[
\sup_{W\in\mathcal{W}_\theta}\mathbb{P}(\bm{g}\in W)\leq\frac{\theta}{\pi}.
\]
Proof:

Let $W\in\mathcal{W}_\theta$ be arbitrarily fixed. It follows from the definitions in (27) and (22) that $W$ is a cone. Therefore, $\bm{g}\in W$ if and only if $\bm{g}/\|\bm{g}\|_2\in W$. Furthermore, note that $\bm{g}/\|\bm{g}\|_2$ is uniformly distributed on $\mathbb{S}^{d-1}$. Then we have

\[
\mathbb{P}\left(\bm{g}\in W\right)=\mathbb{P}\left(\frac{\bm{g}}{\|\bm{g}\|_2}\in W\right)\leq\frac{\theta}{\pi}. \tag{34}
\]

The assertion follows since $W$ was arbitrary. ∎
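A quick Monte Carlo experiment illustrates the bound. The definition (22) of $W_{\bm{x},\bm{z}}$ lies outside this section, so the sketch below assumes the wedge is the set of vectors $\bm{w}$ for which $\langle\bm{w},\bm{x}\rangle$ and $\langle\bm{w},\bm{z}\rangle$ have different signs; this is an assumption made for illustration, and the helper name `wedge_probability` is hypothetical.

```python
import numpy as np


def wedge_probability(theta: float, d: int = 5, n: int = 200_000, seed: int = 0) -> float:
    """Estimate P(g in W_{x,z}) for a pair (x, z) at angle exactly theta.

    ASSUMPTION: W_{x,z} = {w : sign(<w, x>) != sign(<w, z>)}; the paper's
    definition (22) is not reproduced in this section.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    x[0] = 1.0
    z = np.zeros(d)
    z[0], z[1] = np.cos(theta), np.sin(theta)  # unit vector at angle theta from x
    g = rng.standard_normal((n, d))            # rows are i.i.d. Normal(0, I_d)
    in_wedge = np.sign(g @ x) != np.sign(g @ z)
    return float(in_wedge.mean())


if __name__ == "__main__":
    theta = 2 / 25
    print("empirical:", wedge_probability(theta), " theta/pi:", theta / np.pi)
```

Under the assumed definition, the empirical frequency should be close to $\theta/\pi$ when the angle equals $\theta$, which is the extreme case of the supremum in the lemma.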

Lemma VII.2 ([29, Lemma 2.1]).

Let $\delta\in(0,1)$ and $\{\bm{a}_i\}_{i=1}^{m}$ be independent copies of $\bm{g}\sim\mathrm{Normal}(\bm{0},\bm{I}_d)$. Then it holds with probability at least $1-\delta$ that

\[
\sup_{\bm{z}\in\mathbb{S}^{d-1}}\left|\frac{1}{m}\sum_{i=1}^{m}|\langle\bm{a}_i,\bm{z}\rangle|-\sqrt{\frac{2}{\pi}}\right|\leq4\sqrt{\frac{d}{m}}+\sqrt{\frac{2\log(2/\delta)}{m}}. \tag{35}
\]
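The concentration bound (35) can be sanity-checked by simulation. The sketch below replaces the supremum over the sphere with a maximum over finitely many random directions, so it can only under-estimate the left-hand side of (35); it is a consistency check rather than a verification of the lemma. The function names and the parameter choices are illustrative assumptions.

```python
import numpy as np


def max_deviation(m: int = 4000, d: int = 10, n_dirs: int = 200, seed: int = 0) -> float:
    """Max over sampled unit directions z of |mean_i |<a_i, z>| - sqrt(2/pi)|."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((m, d))                 # rows a_i ~ Normal(0, I_d)
    z = rng.standard_normal((d, n_dirs))
    z /= np.linalg.norm(z, axis=0, keepdims=True)   # columns are unit vectors
    means = np.abs(a @ z).mean(axis=0)              # empirical mean of |<a_i, z>| per direction
    return float(np.max(np.abs(means - np.sqrt(2 / np.pi))))


def lemma_bound(m: int, d: int, delta: float) -> float:
    """Right-hand side of (35)."""
    return float(4 * np.sqrt(d / m) + np.sqrt(2 * np.log(2 / delta) / m))


if __name__ == "__main__":
    print("sampled deviation:", max_deviation())
    print("bound in (35):   ", lemma_bound(4000, 10, 0.01))
```

Because only a finite set of directions is sampled, a small observed deviation is consistent with (35) but does not by itself certify the uniform bound.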
Lemma VII.3 ([30, Lemma 6.4]).

Let $\delta\in(0,1)$ and $\{\bm{a}_i\}_{i=1}^{m}$ be independent copies of $\bm{g}\sim\mathrm{Normal}(\bm{0},\bm{I}_d)$. Let $s\in\mathbb{N}$ satisfy $s<m$. Then it holds with probability at least $1-\delta$ that

\begin{align*}
&\sup_{\substack{\bm{z}\in\mathbb{S}^{d-1}\\ T:|T|\leq s}}\frac{1}{s}\sum_{i\in T}|\langle\bm{a}_i,\bm{z}\rangle| \tag{36}\\
&\quad\leq\sqrt{\frac{2}{\pi}}+4\sqrt{\frac{d}{s}}+\sqrt{2\log\left(\frac{em}{s}\right)}+\sqrt{\frac{2}{s}\cdot\log\left(\frac{2}{\delta}\right)}.
\end{align*}
Lemma VII.4 ([28, Lemma 5.1]).

Let $\delta\in(0,1)$ and let $\theta>0$ be an acute angle. Suppose that $\{\bm{a}_{i}\}_{i=1}^{m}$ are independent copies of a random vector $\bm{a}\in\mathbb{R}^{d}$, and consider the set $\mathcal{W}_{\theta}$ given by (27). Then, if

\[
m\geq(4\pi/\theta)^{2}\left(2d\log(2em/d)+\log(2/\delta)\right),
\]

the bound

\[
\sup_{W\in\mathcal{W}_{\theta}}\frac{1}{m}\sum_{i=1}^{m}\mathbb{1}_{\{\bm{a}_{i}\in W\}}\leq\frac{2\theta}{\pi}
\tag{37}
\]

holds with probability at least $1-\delta$.
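Because $m$ appears on both sides of the sample-size condition in Lemma VII.4, it can be helpful to see what the condition demands numerically. The following is a minimal sketch, not part of the original analysis: the function name `min_samples_for_wedge_bound` and its arguments are hypothetical, and it simply solves the stated inequality by fixed-point iteration, assuming $d\geq 1$, an acute $\theta$, and $\delta\in(0,1)$.

```python
import math


def min_samples_for_wedge_bound(d, theta, delta):
    """Smallest integer m >= d satisfying
    m >= (4*pi/theta)^2 * (2*d*log(2*e*m/d) + log(2/delta)).

    Hypothetical helper for illustration; assumes d >= 1,
    0 < theta < pi/2 and 0 < delta < 1.
    """
    c = (4.0 * math.pi / theta) ** 2

    def rhs(m):
        # Right-hand side of the sample-size condition as a function of m.
        return c * (2.0 * d * math.log(2.0 * math.e * m / d) + math.log(2.0 / delta))

    # rhs grows only logarithmically in m, so iterating m <- rhs(m) from m = d
    # increases monotonically toward the largest solution of m = rhs(m).
    m = float(d)
    for _ in range(200):
        m_next = rhs(m)
        if abs(m_next - m) <= 1e-9 * m_next:
            break
        m = m_next

    # Round up and correct for any floating-point slack.
    m_int = math.ceil(m)
    while m_int < rhs(m_int):
        m_int += 1
    return m_int


if __name__ == "__main__":
    print(min_samples_for_wedge_bound(d=10, theta=0.5, delta=0.01))
```

The iteration is chosen only for simplicity; any one-dimensional root finder applied to $m-\mathrm{rhs}(m)$ would serve equally well.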

VIII Proof of Lemma VI.1

We proceed with the proof under the following four events, each of which holds with probability at least $1-\delta/4$. The first event is defined as

\[
\begin{aligned}
&\sup_{\bm{z}\in\mathbb{S}^{d-1}}\left|\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}|\langle\bm{a}_{i},\bm{z}\rangle|-(1-\eta)\sqrt{\frac{2}{\pi}}\right| \\
&\qquad\qquad\leq 4\sqrt{\frac{d}{m}}+\sqrt{\frac{2\log(8/\delta)}{m}},
\end{aligned}
\tag{38}
\]

which holds with probability at least $1-\delta/4$. Indeed, by the assumption on outliers, $I_{\mathrm{in}}$ is a set with $|I_{\mathrm{in}}|=(1-\eta)m$, and the outliers are independent of $\{\bm{a}_{i}\}_{i=1}^{m}$. Hence, (38) is a direct consequence of (35) in Lemma VII.2. By the same argument,

\[
\sup_{\bm{z}\in\mathbb{S}^{d-1}}\left|\frac{1}{m}\sum_{i\in I_{\mathrm{out}}}|\langle\bm{a}_{i},\bm{z}\rangle|-\eta\sqrt{\frac{2}{\pi}}\right|\leq 4\sqrt{\frac{\eta d}{m}}+\sqrt{\frac{2\eta\log(8/\delta)}{m}}
\tag{39}
\]

holds with probability at least $1-\delta/4$.
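The concentration statements (38) and (39) can be sanity-checked numerically. The sketch below is illustrative only and makes several assumptions that are not from the paper: the $\bm{a}_{i}$ are drawn as standard Gaussian vectors (the measurement model assumed in the analysis), the first $\eta m$ indices are taken as the outlier support $I_{\mathrm{out}}$, and the supremum over the sphere is replaced by a maximum over a handful of random directions, so the reported deviations only lower-bound the true suprema. The function name and default parameters are hypothetical.

```python
import math
import random


def inlier_outlier_deviations(m=4000, d=8, eta=0.2, delta=0.05, n_dirs=20, seed=0):
    """Compare empirical deviations with the right-hand sides of (38) and (39).

    Returns (dev_in, bound_in, dev_out, bound_out). The deviations are maxima
    over n_dirs random unit directions, not true suprema over the sphere.
    """
    rng = random.Random(seed)
    # Standard Gaussian measurement vectors a_1, ..., a_m in R^d.
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    n_out = round(eta * m)  # illustrative choice: outliers are the first n_out indices
    mu = math.sqrt(2.0 / math.pi)  # E|<a, z>| for a unit vector z

    dev_in = 0.0
    dev_out = 0.0
    for _ in range(n_dirs):
        # Draw a uniformly random direction on the unit sphere.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        z = [v / norm for v in z]
        proj = [abs(sum(aj * zj for aj, zj in zip(a, z))) for a in A]
        s_out = sum(proj[:n_out]) / m
        s_in = sum(proj[n_out:]) / m
        dev_in = max(dev_in, abs(s_in - (1.0 - eta) * mu))
        dev_out = max(dev_out, abs(s_out - eta * mu))

    log_term = 2.0 * math.log(8.0 / delta)
    bound_in = 4.0 * math.sqrt(d / m) + math.sqrt(log_term / m)
    bound_out = 4.0 * math.sqrt(eta * d / m) + math.sqrt(eta * log_term / m)
    return dev_in, bound_in, dev_out, bound_out


if __name__ == "__main__":
    print(inlier_outlier_deviations())
```

Such a check cannot confirm the uniform bounds, but a violation at any sampled direction would indicate an error in the constants.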

Next, we describe the third event: for an arbitrary fixed $\alpha\in(0,1)$, it holds with probability at least $1-\delta/4$ that

\[
\begin{aligned}
&\sup_{\begin{subarray}{l}T:|T|\leq\alpha m\\ \bm{z}\in\mathbb{S}^{d-1}\end{subarray}}\frac{1}{m}\sum_{i\in T\cap I_{\mathrm{in}}}\left|\langle\bm{a}_{i},\bm{z}\rangle\right|\leq \\
&\qquad\alpha\sqrt{\frac{2}{\pi}}+4\sqrt{\frac{\alpha d}{m}}+\alpha\sqrt{2\log\left(\frac{e}{\alpha}\right)}+\sqrt{\frac{2\alpha\log(8/\delta)}{m}}.
\end{aligned}
\tag{40}
\]

Again, by Assumption 1, $I_{\mathrm{in}}$ is a fixed set with $|I_{\mathrm{in}}|=(1-\eta)m$ and the outliers are independent of $\{\bm{a}_{i}\}_{i=1}^{m}$; hence (40) follows from (36) in Lemma VII.3.
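For a single fixed direction $\bm{z}$, the supremum over index sets $T$ in (40) is attained by the $\lfloor\alpha m\rfloor$ largest inlier projections, which makes the bound easy to probe numerically. The sketch below does exactly that under assumptions introduced here for illustration: standard Gaussian $\bm{a}_{i}$, the last $(1-\eta)m$ indices taken as $I_{\mathrm{in}}$, and one random direction instead of the supremum over the sphere. The function name and defaults are hypothetical.

```python
import math
import random


def top_fraction_check(m=5000, d=8, eta=0.2, alpha=0.1, delta=0.05, seed=1):
    """Evaluate the left side of (40) at one fixed direction and its right side.

    Returns (worst, bound), where worst is (1/m) times the sum of the
    floor(alpha*m) largest values of |<a_i, z>| over the inlier indices.
    """
    rng = random.Random(seed)
    # One random unit direction z (the paper's bound is uniform over all z).
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in z))
    z = [v / norm for v in z]

    n_out = round(eta * m)  # illustrative choice: first n_out indices are outliers
    inlier_proj = []
    for i in range(m):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if i >= n_out:
            inlier_proj.append(abs(sum(aj * zj for aj, zj in zip(a, z))))

    # For fixed z, the maximizing T picks the k largest inlier projections.
    k = math.floor(alpha * m)
    worst = sum(sorted(inlier_proj, reverse=True)[:k]) / m

    bound = (
        alpha * math.sqrt(2.0 / math.pi)
        + 4.0 * math.sqrt(alpha * d / m)
        + alpha * math.sqrt(2.0 * math.log(math.e / alpha))
        + math.sqrt(2.0 * alpha * math.log(8.0 / delta) / m)
    )
    return worst, bound


if __name__ == "__main__":
    print(top_fraction_check())
```

Sorting is used for clarity; a partial selection of the top `k` values would be cheaper for large `m`.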

Finally, since the sample-size condition (31) allows us to invoke Lemma VII.4 with failure probability $\delta/4$, it holds with probability at least $1-\delta/4$ that

\[
\sup_{W\in\mathcal{W}_{\theta}}\sum_{i=1}^{m}\mathbb{1}_{\{\bm{a}_{i}\in W\}}\leq\frac{2\theta m}{\pi}.
\tag{41}
\]

By a union bound, (38), (39), (40), and (41) hold simultaneously with probability at least $1-\delta$. We carry out the remainder of the proof assuming that these four events hold.

We first show (28). For arbitrary $W\in\mathcal{W}_{\theta}$ and $\bm{z}\in\mathbb{S}^{d-1}$, it holds deterministically that

\[
\begin{aligned}
\frac{1}{m}&\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\notin W\}}|\langle\bm{a}_{i},\bm{z}\rangle|= \\
&\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}|\langle\bm{a}_{i},\bm{z}\rangle|-\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\in W\}}|\langle\bm{a}_{i},\bm{z}\rangle|.
\end{aligned}
\]

Hence, by taking the infimum of both sides over $W\in\mathcal{W}_{\theta}$ and $\bm{z}\in\mathbb{S}^{d-1}$, we have

\[
\begin{aligned}
&\inf_{\begin{subarray}{l}W\in\mathcal{W}_{\theta}\\ \bm{z}\in\mathbb{S}^{d-1}\end{subarray}}\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\notin W\}}|\langle\bm{a}_{i},\bm{z}\rangle| \\
&\geq\underbrace{\inf_{\bm{z}\in\mathbb{S}^{d-1}}\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}|\langle\bm{a}_{i},\bm{z}\rangle|}_{\mathrm{(A)}}-\underbrace{\sup_{\begin{subarray}{l}W\in\mathcal{W}_{\theta}\\ \bm{z}\in\mathbb{S}^{d-1}\end{subarray}}\frac{1}{m}\sum_{i\in I_{\mathrm{in}}}\mathbb{1}_{\{\bm{a}_{i}\in W\}}|\langle\bm{a}_{i},\bm{z}\rangle|}_{\mathrm{(B)}}.
\end{aligned}
\tag{42}
\]
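The passage from the pointwise identity to (42) uses only the elementary fact that the infimum of a difference is at least the infimum of the first term minus the supremum of the second. A short sketch of that step follows, where $f$ and $g$ are shorthand introduced here for illustration (they are not notation from the paper) for the two sums on the right-hand side of the identity:

```latex
% Shorthand introduced for this sketch only:
%   f(z)    = (1/m) \sum_{i \in I_in} |<a_i, z>|
%   g(W, z) = (1/m) \sum_{i \in I_in} 1{a_i \in W} |<a_i, z>|
\begin{align*}
f(\bm{z}) - g(W,\bm{z})
  &\geq \inf_{\bm{z}'\in\mathbb{S}^{d-1}} f(\bm{z}')
      - \sup_{\substack{W'\in\mathcal{W}_{\theta}\\ \bm{z}'\in\mathbb{S}^{d-1}}} g(W',\bm{z}')
  \quad \text{for every } W\in\mathcal{W}_{\theta},\ \bm{z}\in\mathbb{S}^{d-1}, \\
\inf_{\substack{W\in\mathcal{W}_{\theta}\\ \bm{z}\in\mathbb{S}^{d-1}}}
  \bigl( f(\bm{z}) - g(W,\bm{z}) \bigr)
  &\geq \underbrace{\inf_{\bm{z}\in\mathbb{S}^{d-1}} f(\bm{z})}_{\mathrm{(A)}}
      - \underbrace{\sup_{\substack{W\in\mathcal{W}_{\theta}\\ \bm{z}\in\mathbb{S}^{d-1}}} g(W,\bm{z})}_{\mathrm{(B)}}.
\end{align*}
```

The second line follows because the right-hand side of the first line does not depend on $(W,\bm{z})$, so taking the infimum over $(W,\bm{z})$ on the left preserves the inequality.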

We next derive a lower bound on (A) and an upper bound on (B). The lower bound on (A) follows from (38):

\[
\mathrm{(A)}\geq(1-\eta)\sqrt{\frac{2}{\pi}}-4\sqrt{\frac{d}{m}}-\sqrt{\frac{2\log(8/\delta)}{m}}.
\tag{43}
\]

Taking $m$ according to (31) with a sufficiently large $C>0$ in (43) yields

\[
\mathrm{(A)}\geq(1-\eta)\sqrt{\frac{2}{\pi}}-\frac{\theta}{20}.
\tag{44}
\]

It remains to show an upper bound on (B). Under the event (41), at most $2\theta m/\pi$ of the vectors $\bm{a}_{i}$ fall in any $W\in\mathcal{W}_{\theta}$, so we have

\[
\mathrm{(B)}\leq\sup_{\begin{subarray}{l}T:|T|\leq 2\theta m/\pi\\ \bm{z}\in\mathbb{S}^{d-1}\end{subarray}}\frac{1}{m}\sum_{i\in T\cap I_{\mathrm{in}}}\left|\langle\bm{a}_{i},\bm{z}\rangle\right|.
\]

Therefore, setting $\alpha=2\theta/\pi$ in (40) gives an upper bound on (B):

\[
\mathrm{(B)}\leq\frac{2\theta}{\pi}\sqrt{\frac{2}{\pi}}+4\sqrt{\frac{2\theta d}{\pi m}}+\frac{2\theta}{\pi}\sqrt{2\log\left(\frac{e\pi}{2\theta}\right)}+\sqrt{\frac{4\theta\log(8/\delta)}{\pi m}}.
\tag{45}
\]

Taking $m$ according to (31) yields

\[
\mathrm{(B)}\leq\frac{2\theta}{\pi}\left(\sqrt{\frac{2}{\pi}}+\sqrt{2\log\left(\frac{e\pi}{2\theta}\right)}\right)+\frac{\theta}{20}\sqrt{\frac{2\theta}{\pi}}.
\tag{46}
\]
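The step from (45) to (46) can be made explicit by factoring $\sqrt{2\theta/\pi}$ out of the two $m$-dependent terms. The sketch below assumes that (31) is used here exactly as in the step from (43) to (44), that is, to guarantee $4\sqrt{d/m}+\sqrt{2\log(8/\delta)/m}\leq\theta/20$:

```latex
\begin{align*}
4\sqrt{\frac{2\theta d}{\pi m}}+\sqrt{\frac{4\theta\log(8/\delta)}{\pi m}}
  &= \sqrt{\frac{2\theta}{\pi}}
     \left(4\sqrt{\frac{d}{m}}+\sqrt{\frac{2\log(8/\delta)}{m}}\right) \\
  &\leq \frac{\theta}{20}\sqrt{\frac{2\theta}{\pi}}.
\end{align*}
```

The remaining two terms of (45) do not depend on $m$ and are carried over unchanged into (46).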

Hence, substituting (44) and (46) into (42) completes the proof of (28).
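For reference, the intermediate expression obtained by combining the bounds is written out below. This is only the direct substitution of (44) and (46) into (42); the final form of (28), as stated in Lemma VI.1, may simplify it further.

```latex
\begin{align*}
\inf_{\substack{W\in\mathcal{W}_{\theta}\\ \bm{z}\in\mathbb{S}^{d-1}}}
  \frac{1}{m}\sum_{i\in I_{\mathrm{in}}}
  \mathbb{1}_{\{\bm{a}_{i}\notin W\}}|\langle\bm{a}_{i},\bm{z}\rangle|
  &\geq \mathrm{(A)}-\mathrm{(B)} \\
  &\geq (1-\eta)\sqrt{\frac{2}{\pi}}-\frac{\theta}{20}
     -\frac{2\theta}{\pi}\left(\sqrt{\frac{2}{\pi}}
     +\sqrt{2\log\left(\frac{e\pi}{2\theta}\right)}\right)
     -\frac{\theta}{20}\sqrt{\frac{2\theta}{\pi}}.
\end{align*}
```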

It remains to prove (29) and (30). The upper bound in (29) is a direct consequence of (39) with $m$ chosen according to (31). Lastly, (30) follows from the upper bound on (B) in (46). This completes the proof of (29) and (30).

IX Conclusion

Least absolute deviation (LAD) has been a popular statistical method for regression in the presence of outliers. We consider the LAD approach to robust phase retrieval under the magnitude-only measurement model. To solve the resulting non-convex optimization problem, we derive a robust alternating minimization method (Robust-AM) as an unconstrained Gauss-Newton method. Furthermore, we propose fast variants of Robust-AM that exploit efficient solvers, and show that Robust-AM with ADMM converges faster than the similar prox-linear approach run with its efficient solver POGS [14].

We established a local convergence analysis of Robust-AM under the standard Gaussian measurement model when the support of the sparse noise is arbitrarily fixed but its magnitudes can be adversarial. A suitably initialized Robust-AM converges linearly to the ground truth, uniformly over all ground-truth signals, when the number of measurements $m$ is proportional to the signal length $d$ and the outlier fraction is at most $1/4$. This theoretical guarantee is comparable to the prior art in the literature. Furthermore, the numerical results show that Robust-AM outperforms the existing guaranteed methods for various outlier models on both synthetic data and real image data.

References

  • [1] S. Kim and K. Lee, “Sequence of linear program for robust phase retrieval,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, to appear.
  • [2] A. Walther, “The question of phase retrieval in optics,” Optica Acta: International Journal of Optics, vol. 10, no. 1, pp. 41–49, 1963.
  • [3] O. Bunk, A. Diaz, F. Pfeiffer, C. David, B. Schmitt, D. K. Satapathy, and J. F. Van Der Veen, “Diffractive imaging for periodic samples: retrieving one-dimensional concentration profiles across microfluidic channels,” Acta Crystallographica Section A: Foundations of Crystallography, vol. 63, no. 4, pp. 306–314, 2007.
  • [4] A. Chai, M. Moscoso, and G. Papanicolaou, “Array imaging using intensity-only measurements,” Inverse Problems, vol. 27, no. 1, p. 015005, 2010.
  • [5] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: a contemporary overview,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 87–109, 2015.
  • [6] D. S. Weller, A. Pnueli, G. Divon, O. Radzyner, Y. C. Eldar, and J. A. Fessler, “Undersampled phase retrieval with outliers,” IEEE Transactions on Computational Imaging, vol. 1, no. 4, pp. 247–258, 2015.
  • [7] J. Dong, L. Valzania, A. Maillard, T.-a. Pham, S. Gigan, and M. Unser, “Phase retrieval: From computational imaging to machine learning: A tutorial,” IEEE Signal Processing Magazine, vol. 40, no. 1, pp. 45–57, 2023.
  • [8] S. Bahmani and J. Romberg, “Phase retrieval meets statistical learning theory: A flexible convex relaxation,” in Artificial Intelligence and Statistics.   PMLR, 2017, pp. 252–260.
  • [9] T. Goldstein and C. Studer, “PhaseMax: Convex phase retrieval via basis pursuit,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2675–2689, 2018.
  • [10] P. Hand and V. Voroninski, “Corruption robust phase retrieval via linear programming,” arXiv preprint arXiv:1612.03547, 2016.
  • [11] H. Zhang, Y. Zhou, Y. Liang, and Y. Chi, “A nonconvex approach for phase retrieval: Reshaped Wirtinger flow and incremental algorithms,” Journal of Machine Learning Research, 2017.
  • [12] G. Wang, G. B. Giannakis, and Y. C. Eldar, “Solving systems of random quadratic equations via truncated amplitude flow,” IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 773–794, 2017.
  • [13] H. Zhang, Y. Chi, and Y. Liang, “Median-truncated nonconvex approach for phase retrieval with outliers,” IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 7287–7310, 2018.
  • [14] J. C. Duchi and F. Ruan, “Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval,” Information and Inference: A Journal of the IMA, vol. 8, no. 3, pp. 471–529, 2019.
  • [15] P. Bloomfield and W. L. Steiger, Least absolute deviations: theory, applications, and algorithms.   Springer, 1983.
  • [16] R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik, vol. 35, p. 237, 1972.
  • [17] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011.
  • [18] S. Wang and N. Shroff, “A new alternating direction method for linear programming,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [19] J. van den Brand, “A deterministic linear program solver in current matrix multiplication time,” in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms.   SIAM, 2020, pp. 259–278.
  • [20] J. V. Burke and M. C. Ferris, “A Gauss–Newton method for convex composite optimization,” Mathematical Programming, vol. 71, no. 2, pp. 179–194, 1995.
  • [21] F. Clarke, Optimization and Nonsmooth Analysis, ser. Classics in Applied Mathematics.   Society for Industrial and Applied Mathematics, 1990.
  • [22] P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” Advances in Neural Information Processing Systems, vol. 26, 2013.
  • [23] N. Parikh and S. Boyd, “Block splitting for distributed optimization,” Mathematical Programming Computation, vol. 6, no. 1, pp. 77–102, 2014.
  • [24] K. Holmström, A. O. Göran, and M. M. Edvall, “User’s guide for tomlab/cplex v12. 1,” Tomlab Optim. Retrieved, vol. 1, p. 2017, 2009.
  • [25] L. Gurobi Optimization, “Gurobi optimizer reference manual,” 2021.
  • [26] T. Yang and Q. Lin, “Rsg: Beating subgradient method without smoothness and strong convexity,” The Journal of Machine Learning Research, vol. 19, no. 1, pp. 236–268, 2018.
  • [27] Y. Ye and E. Tse, “An extension of Karmarkar’s projective algorithm for convex quadratic programming,” Mathematical Programming, vol. 44, pp. 157–179, 1989.
  • [28] Y. S. Tan and R. Vershynin, “Phase retrieval via randomized Kaczmarz: theoretical guarantees,” Information and Inference: A Journal of the IMA, vol. 8, no. 1, pp. 97–123, 2019.
  • [29] Y. Plan and R. Vershynin, “Dimension reduction by random hyperplane tessellations,” Discrete & Computational Geometry, vol. 51, no. 2, pp. 438–461, 2014.
  • [30] ——, “Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach,” IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 482–494, 2012.