Robust Phase Retrieval by Alternating Minimization
Seonho Kim and Kiryung Lee
Seonho Kim and Kiryung Lee are with the Department of ECE, The Ohio State University, Columbus, OH 43210 USA (e-mail: [email protected]). This work was supported in part by NSF CAREER Award CCF-1943201. A preliminary version of this work will be presented at the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) [1].
Abstract
We consider a least absolute deviation (LAD) approach to the robust phase retrieval problem that aims to recover a signal from its absolute measurements corrupted with sparse noise. To solve the resulting non-convex optimization problem, we propose a robust alternating minimization (Robust-AM) derived as an unconstrained Gauss-Newton method. To solve the inner optimization arising in each step of Robust-AM, we adopt two computationally efficient methods for linear programs. We provide a non-asymptotic convergence analysis of these practical algorithms for Robust-AM under the standard Gaussian measurement assumption.
These algorithms, when suitably initialized, are guaranteed to converge linearly to the ground truth at an order-optimal sample complexity with high probability while the support of sparse noise is arbitrarily fixed and the sparsity level is no larger than . Additionally, through comprehensive numerical experiments on synthetic and image datasets, we show that Robust-AM outperforms existing methods for robust phase retrieval offering comparable theoretical performance guarantees.
Index Terms:
phase retrieval, outliers, least absolute deviation, linear program, convex optimization
I Introduction
Phase retrieval refers to the recovery of an unknown signal (or ) from the magnitudes of its linear measurements, which are formulated as
(1)
where (or ) and are known measurement vectors.
Solving the set of nonlinear equations in (1) arises in numerous applications including X-ray crystallography, diffraction and array imaging, and optics (e.g. [2, 3, 4, 5]).
We consider the robust phase retrieval from the noisy amplitude measurements in (1) corrupted with sparse noise, i.e.
(2)
where and collect the unknown indices of outliers and inliers respectively, and is an arbitrary sequence in .
For example, such a scenario arises in phase retrieval imaging applications [6] due to various reasons including detection failures and recording errors.
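The measurement model with sparse outliers can be illustrated with a small simulation. The following is a minimal sketch, assuming real-valued Gaussian measurement vectors and outlier values drawn as magnitudes of Cauchy samples; the function name `make_robust_pr_data`, its parameters, and the choice of outlier values are hypothetical and only serve the illustration.

```python
import numpy as np

def make_robust_pr_data(n=20, m=200, outlier_frac=0.1, seed=0):
    """Simulate amplitude measurements b_i = |<a_i, x>| with sparse outliers.

    Assumptions (not taken from the paper): real-valued signal, unit-norm
    ground truth, and outlier values given by |Cauchy| samples.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))       # row i is the measurement vector a_i
    x_true = rng.standard_normal(n)
    x_true /= np.linalg.norm(x_true)
    b = np.abs(A @ x_true)                # clean amplitude measurements
    k = int(round(outlier_frac * m))
    outliers = rng.choice(m, size=k, replace=False)   # outlier index set
    b[outliers] = np.abs(rng.standard_cauchy(k))      # arbitrary corrupted values
    return A, b, x_true, outliers
```

The inlier entries of `b` equal the clean amplitudes exactly, while the entries indexed by `outliers` carry no information about the signal.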
A suite of methods designed for the plain phase retrieval [7] has been adapted to address the outliers. These methods provide not only empirically successful performances but also theoretical analyses under random measurement models. For instance, anchored regression [8] and PhaseMax [9] formulate phase retrieval given an initial estimate as a linear program. RobustPhaseMax [10] modifies these methods to offer robust estimation by introducing an auxiliary variable to describe the outliers. In another example, Reshaped Wirtinger Flow (RWF) [11] and Amplitude Flow [12] follow a subgradient descent approach for a least squares estimator (LSE). Median-RWF [13] is a variant of these methods tailored to robust phase retrieval. Specifically, Median-RWF uses a truncation type of regularization that identifies and excludes outliers in each iteration by median-based thresholding on the consistency of the current estimate to the measurements. Median-RWF significantly improves the empirical performance of RobustPhaseMax by tolerating a higher fraction of outliers. However, the regularization of Median-RWF involves algorithmic parameters that have been tuned specifically for the Gaussian measurement model.
Moreover, it has not been discussed how to generalize these tuning parameters to other measurement models.
A recent work proposed an approach to robust phase retrieval in the classical robust regression framework in statistics [14].
Instead of the least squares, they adopted the least absolute deviation (LAD) [15] to enforce the consistency to the squared amplitude measurements with outliers.
The parameter estimation is then cast as a nonconvex optimization problem. They proposed a prox-linear method that updates the estimate iteratively through local linearization of the forward model.
This algorithm can be viewed as a variant of the Gauss-Newton method that regularizes the updates with the proximity to the previous iterate.
The prox-linear algorithm iteratively refines the estimate through a sequence of quadratic programs for prox-linear problems and provides comparable performance to Median-RWF.
Importantly, the Gauss-Newton method does not involve any tuning parameter.
However, for large-scale applications such as those in astronomical or medical imaging, further acceleration of this iterative method is desired.
They developed the proximal operator graph splitting (POGS) solver for this purpose.
In this paper, we propose an optimization approach to robust phase retrieval that shares strong theoretical guarantees (high tolerance of outlier ratio and no tuning parameters) with the prox-linear algorithm and further improves its computational cost.
The objective is achieved by a simple unconstrained Gauss-Newton method for LAD. The resulting optimization is equivalent to an alternating minimization algorithm for LAD, as described in [16], which is solved by a sequence of linear programs. Since this alternating minimization approach is robust in the presence of outliers, we refer to the optimization as Robust-AM.
Our main theoretical result demonstrates that a suitably initialized Robust-AM converges to the ground-truth signal linearly from random amplitude-only measurements including up to outliers.
The desired initialization can be obtained by the existing robust spectral estimators [13, 14].
We verified through comprehensive numerical simulations that Robust-AM empirically outperforms the existing methods for robust phase retrieval.
Particularly, it can tolerate a higher fraction of outliers and provide exact recovery with fewer observations.
Furthermore, due to its unconstrained optimization formulation with the absolute amplitude measurement model, Robust-AM admits a computationally efficient ADMM algorithm, which runs faster than POGS for the prox-linear method.
As shown in Figure 1, ADMM for Robust-AM converges faster than POGS for the prox-linear method.
In this experiment, the fraction of outliers is set to , with outlier entries generated following zero and a Cauchy distribution with median and mean-absolute-deviation .
The convergence is measured by the metric for .
Figure 1 shows that the unconstrained Gauss-Newton method, without any explicit control over the proximity to previous iterates, converges to the ground truth signal without overshooting.
TABLE I: Comparison of RobustPhaseMax [10], Median-RWF [13], Prox-linear [14] and Robust-AM for robust phase retrieval in terms of computational cost to obtain -accurate solution and sparse noise assumptions for the performance guarantees.
1 We establish this computational cost under the assumption that POGS converges linearly to the solution of the inner optimization of prox-linear. However, to the best of our knowledge, the convergence rate of POGS has not been shown. Thus, this computational cost is a conjecture.
Notation: Boldface lowercase letters denote column vectors.
We use and to denote the norm and the Euclidean norm respectively. For brevity, the shorthand notation denotes the set for . We adopt the big-O notation so that is alternatively written as . With a notation , we ignore logarithmic factors.
II Robust Alternating Minimization
We consider the minimization of the composite function where is a convex function and is a nonlinear mapping.
In the special case when is differentiable, Burke and Ferris [20] proposed a constrained Gauss-Newton method where the amount of the update is upper-bounded by a threshold.
Duchi and Ruan [14] considered a variant where the constraint on the proximity on consecutive iterates is substituted by regularization with an additive penalty.
We consider a more challenging case where is non-differentiable and propose an unconstrained Gauss-Newton method where the variable sequence is iteratively updated by
(3)
where denotes Clarke’s generalized Jacobian matrix at [21].
Due to the local linear approximation of at in (3), is obtained as a solution to a convex program.
In a special case where and are respectively given by
(4)
and
(5)
their composition reduces to
(6)
Then the minimization of corresponds to the LAD approach to robust phase retrieval with the absolute amplitude measurement model.
Furthermore, given and as in (4) and (5), the update rule in (3) is explicitly written as
(7)
The resulting algorithm (7), derived from an unconstrained Gauss-Newton method of robust phase retrieval, is equivalent to an alternating minimization approach to the LAD formulation of robust phase retrieval when noisy measurements with a negative sign are discarded.
An analogous alternating minimization for least-squares phase retrieval has been studied in the literature [16, 22].
Due to the robustness of LAD, we refer to the iterative algorithm by (7) as a robust alternating minimization (Robust-AM).
Duchi and Ruan [14] considered a similar robust phase retrieval with the squared amplitude measurement model via their regularized Gauss-Newton method.
III Optimization Algorithms
This section discusses numerical algorithms for Robust-AM.
First, we note that the optimization in (7) is equivalent to a linear program
(8)
where .
There exist various computationally efficient numerical methods to solve linear programs.
For example, the derandomized algorithm by van den Brand [19] finds an exact solution to a linear program with variables and constraints at the cost of multiplications where .
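To make the linear-program view concrete, the following is a minimal sketch of one way to run Robust-AM with an off-the-shelf LP solver. It assumes the real-valued case and assumes that each update minimizes the sum of absolute residuals after the signs of the current linear measurements are attached to the amplitudes; the names `lad_lp` and `robust_am` are hypothetical, and the general-purpose HiGHS solver from SciPy stands in for the specialized algorithm of [19].

```python
import numpy as np
from scipy.optimize import linprog

def lad_lp(A, c):
    """Solve min_x ||A x - c||_1 as an LP in (x, t): min 1't s.t. -t <= A x - c <= t."""
    m, n = A.shape
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    A_ub = np.block([[A, -I], [-A, -I]])       # A x - t <= c  and  -A x - t <= -c
    b_ub = np.concatenate([c, -c])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

def robust_am(A, b, x0, iters=20):
    """Alternate between sign estimation and an LAD fit (assumed form of the update)."""
    x = x0.copy()
    for _ in range(iters):
        s = np.sign(A @ x)
        s[s == 0] = 1.0                        # break ties so every sign is +/-1
        x = lad_lp(A, s * b)                   # LAD regression onto signed amplitudes
    return x
```

A call such as `robust_am(A, b, x0)` expects `x0` to be an initial estimate near the ground truth, for example one produced by a spectral method.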
To further accelerate the convergence of Robust-AM, we also adopt iterative numerical algorithms that provide an approximate solution to the inner optimization in (7).
In particular, we consider two alternating direction method of multipliers (ADMM) algorithms and a subgradient descent algorithm for the inner optimization.
We refer to the Robust-AM with approximate solutions to the inner optimization by these ADMM algorithms as fast Robust-AM since they provide a significantly lower computational cost for the entire convergence of Robust-AM to an -accurate estimate.
III-A ADMM for LAD
Given , the optimization in (7) is viewed as LAD for linear regression and one can use an ADMM algorithm for LAD [17, Chapter 6.1].
To describe the update rule of the ADMM algorithm, we introduce shorthand notations for the sake of brevity.
Let be a matrix whose -th row is for , , and .
By following [17, Chapter 6.1] with an auxiliary variable and dual variable , the update rules are given in a closed form as follows:
(9a)
(9b)
(9c)
where denotes the Hadamard product.
The most expensive step in (9) is the least squares problem in (9a).
Since it repeats with the same , the pseudo-inverse of can be pre-computed as with cost , stored in memory, and reused over iterations.
For faster convergence, we adopt the varying step size strategy for [17, Section 3.4.1].
Importantly, since the matrix remains the same over the outer iterations of Robust-AM, the pseudo-inverse is computed only once.
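The ADMM iteration for LAD in [17, Chapter 6.1] can be sketched as follows. This is a minimal illustration with a fixed penalty parameter, assuming the target vector `c` already contains the signed amplitudes for the current outer iteration; the name `lad_admm` and the arguments `rho`, `iters`, and `A_pinv` are hypothetical, and the varying step size strategy is omitted.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Entrywise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def lad_admm(A, c, rho=1.0, iters=500, A_pinv=None):
    """ADMM for min_x ||A x - c||_1 via the splitting z = A x - c (scaled dual u)."""
    m, n = A.shape
    if A_pinv is None:
        A_pinv = np.linalg.pinv(A)             # computed once, reusable across calls
    z = np.zeros(m)
    u = np.zeros(m)
    x = np.zeros(n)
    for _ in range(iters):
        x = A_pinv @ (c + z - u)               # least-squares step
        r = A @ x - c
        z = soft_threshold(r + u, 1.0 / rho)   # l1 proximal step
        u = u + r - z                          # scaled dual update
    return x
```

Passing the same `A_pinv` to every call reflects the observation that the matrix does not change over the outer iterations, so the pseudo-inverse need not be recomputed.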
The POGS algorithm [23] for the prox-linear [14, Section 5] involves a similar matrix inversion.
However, since their matrix evolves over the outer iteration, unlike the fast Robust-AM with ADMM, it is necessary for POGS to repeat the matrix inversion.
Recall that we wanted to adopt ADMM for the inner iteration of Robust-AM to accelerate the convergence with approximate solutions.
Therefore, the convergence rate in the inner optimization is crucial.
However, to the best of our knowledge, the convergence rate has not been shown for the above ADMM algorithm and the POGS algorithm.
Below we will present another ADMM algorithm and a subgradient descent method for (7) with proven linear convergence in the next section.
Despite their theoretical convergence results, the ADMM by (9) empirically outperformed the other methods.
In our numerical studies, we found that the fast Robust-AM with ADMM by (9) provides faster empirical convergence than POGS (see Figure 1).
III-B ADMM for linear program with linear convergence
Wang and Shroff [18] proposed the ADMM approach for a linear program and showed that their ADMM approach solves a linear program significantly faster than standard software such as CPLEX [24] and Gurobi [25].
Moreover, they showed the linear convergence result for their ADMM approach.
To apply their approach to our linear program (8), we reformulate it into the standard form of a linear program (only with equality constraints) [18, Equation 1] by introducing auxiliary variables as
(10)
where , , and
Then, by following [18, Algorithm 1], the update rule is given as a closed form with auxiliary variable and dual variable for , and as
(11a)
(11b)
(11c)
where
and takes the positive part of each entry of the input vector.
The most expensive step is the matrix inversion given in (11a).
It is calculated via the matrix-inversion lemma as
with cost .
Since this step does not depend on previous outer iterations, one can store a pre-computed result in memory and reuse it over the inner and outer iterations.
Hence, by the linear convergence result [18, Theorem 1], the cost for an -accurate solution to (10) is .
However, due to more auxiliary variables in (10) compared to (7), in our numerical studies, the ADMM algorithm by (11) showed slower convergence in the run time relative to the algorithm by (9).
III-C Subgradient descent for LAD
Yang and Lin [26] proposed a restarted subgradient (RSG) for non-smooth optimization.
The specification of their subgradient descent to LAD in (7) is written as
(12)
where denotes a step size.
The step size remains the same for consecutive iterations and then decreases by half.
They showed that the subsequence of iterates sampled at every indices converges at a linear rate for a sufficiently large .
Therefore, the cost for an -accurate solution to (7) is .
However, in our numerical studies, RSG did not provide the fastest convergence in run time compared with the ADMM algorithms.
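The stage-wise step size schedule described above can be sketched as follows. This is a simplified illustration rather than the exact procedure of [26]: it assumes that each stage runs a fixed number of subgradient steps with a constant step size, returns the average of the stage iterates as the restart point, and then halves the step size. The name `lad_restarted_subgradient` and the default values of `eta0`, `inner`, and `stages` are hypothetical.

```python
import numpy as np

def lad_restarted_subgradient(A, c, x0, eta0=1e-2, inner=200, stages=20):
    """Restarted subgradient descent for min_x ||A x - c||_1 (simplified sketch)."""
    x = x0.copy()
    eta = eta0
    for _ in range(stages):
        y = x.copy()
        y_sum = np.zeros_like(x)
        for _ in range(inner):
            g = A.T @ np.sign(A @ y - c)   # a subgradient of the l1 objective at y
            y = y - eta * g
            y_sum += y
        x = y_sum / inner                  # restart from the stage average (assumed)
        eta *= 0.5                         # halve the step size after each stage
    return x
```

Each inner step costs two matrix-vector products, which is why the per-iteration cost is low even though more iterations are typically needed than for ADMM.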
IV Theoretical results
In this section, we present the convergence analysis of the Robust-AM algorithms under the following assumptions.
First, we adopt the standard random linear measurements and outliers with arbitrary support and adversarial values [14].
Assumption 1:
The measurement vectors are independent copies of .
Assumption 2:
The outliers are supported on an arbitrarily fixed set with for and their magnitudes can be adversarial.
Additionally, to provide the convergence analysis of the fast Robust-AM, we introduce an extra assumption that quantifies the suboptimality of solving (13) by ADMM.
Assumption 3:
There exists a bounded sequence such that is an inexact minimizer up to the sub-optimality level for all , i.e.
(13)
We denote the highest sub-optimality level as , i.e.
Theorem IV.1.
Suppose that Assumptions 1, 2, and 3 hold. Then there exist absolute constants and constants depending only on , for which the following statement holds for all with probability at least : If and
(14)
then the sequence by the fast Robust-AM algorithm satisfies
(15)
for all , where .
The Robust-AM algorithm updates iterates with an exact solution to (7).
Therefore, setting to in Theorem IV.1 provides a sufficient condition for the exact recovery of by Robust-AM.
We compare the specialization of Theorem IV.1 to this scenario with the analogous results for competing methods: RobustPhaseMax [10], Median-RWF [13], and prox-linear [14].
Theorem IV.1 as well as the previous results achieve the exact recovery when the number of observations exceeds a multiple of the signal dimension .
Earlier theoretical results on RobustPhaseMax and Median-RWF showed that there exists an unspecified numerical constant so that the algorithms provide the exact recovery if the outlier fraction is below this constant.
In contrast, the analyses of the prox-linear [14] and Robust-AM (Theorem IV.1) demonstrate that these methods can tolerate outliers up to of the total observations.
These theoretical guarantees consider different degrees of adversary for their outlier models.
The performance guarantee of RobustPhaseMax by Hand [10] assumed the highest adversary so that both the support and values of sparse noise are adversarial.
The performance guarantees of Median-RWF by Zhang et al. [13] considered the same outlier model as in Assumption 2, but they also introduced additive noise of a bounded norm in addition to sparse noise.
Duchi and Ruan [14] used the lowest adversary so that the support of sparse noise is random but the nonzero values of sparse noise can depend on the measurements.
Despite providing performance guarantees under the highest adversary, as shown in Section V, RobustPhaseMax showed significantly inferior empirical performance relative to the other methods in terms of the tolerable outlier ratio.
Theorem IV.1 establishes a local linear convergence of the Robust-AM algorithms.
As discussed in Section II, Robust-AM has no explicit control over the amount of the update in each iteration unlike the constrained or regularized versions of the Gauss-Newton method [20, 14].
However, despite its simple form, Robust-AM provides the monotone decrease of the estimation error toward zero without any overshooting for robust phase retrieval in the setting of Theorem IV.1.
All convergence analyses by Theorem IV.1 and previous work [13, 14] require an initialization within a neighborhood of the ground truth.
The size of the basin of convergence was determined with an explicit numerical constant only in [10] and Theorem IV.1.
Various initialization methods with theoretical performance guarantees have been developed to obtain the desired initial estimate [13, 14].
The sample complexity for these initialization methods does not exceed those for the subsequent estimators in order.
Next, we discuss the computational costs for the robust estimators.
First, RobustPhaseMax is formulated as a linear program and thus it can be exactly solved with multiplications by derandomized algorithm [19].
Furthermore, as we discussed in Section III-B, there exists an ADMM algorithm for the linear program that costs for an -accurate solution.
Due to the term , if the desired accuracy decreases in proportion to the size of the problem, it is preferable to use ADMM. Otherwise, the derandomized algorithm will be computationally efficient.
The other estimators are given as an iterative algorithm with a proven convergence rate. Therefore, we compare their computational costs to obtain an -accurate solution.
Median-RWF is a truncated gradient descent with the per-iteration cost of .
Since the linear convergence of Median-RWF has been established, the total cost is .
Unlike Median-RWF, the updates in prox-linear and Robust-AM involve a nontrivial inner optimization, respectively cast as a quadratic program and a linear program.
One may use an exact solver for these sub-problems.
For example, there exists an interior point method for quadratic programs with the cost [27].
Since it has been shown that prox-linear converges quadratically, the total cost with this exact inner solver is .
The inner optimization in Robust-AM can be exactly solved at the cost by the derandomized algorithm [19].
Due to its linear convergence, the total cost of Robust-AM is .
However, as shown in Theorem IV.1, the linear convergence of Robust-AM remains valid when the inner optimization problems are solved only approximately.
The fast Robust-AM with the ADMM solver for linear programs has the per-iteration cost of as shown in Section III.
Due to its linear convergence in Theorem IV.1, the total cost to obtain the accuracy is .
In contrast, the convergence rate of POGS for the inner optimization in prox-linear has not been established.
We summarize the comparison of the computational costs of the algorithms in Table I.
Lastly, we elaborate on the dependence of the parameters and in Theorem IV.1 on the outlier ratio .
The linear convergence parameter in (15) is explicitly specified as an increasing function of in the proof of Theorem IV.1 and illustrated in Figure 2(a).
Therefore, smaller implies faster convergence.
The final error bound by (15) with going to infinity is given as the amplification of the sub-optimality parameter in the inner optimization by a factor of .
Similar to , the parameter is also explicitly given as an increasing function of in the proof (see Figure 2(b)).
However, the final estimation error can still be sufficiently small, as one can set the accuracy parameter to a sufficiently low value (less than ) using linear program packages in readily available software such as CPLEX and Gurobi.
Hence, the assumption on in Theorem IV.1 is easily satisfied.
Figure 2: The dependence of parameters and in Theorem IV.1 on the outlier fraction , shown in panels (a) and (b).
V Numerical Results
This section compares the empirical performance of Robust-AM to its theoretical analysis in Theorem IV.1.
Robust-AM is also compared against the competing methods for robust phase retrieval, which are RobustPhaseMax, Median-RWF, and the prox-linear.
Recall that all these methods require an initial estimate. For this purpose, we adopt the spectral method by Zhang et al. [13].
V-A Synthetic data experiments
First, through experiments on synthetic data, we show that the numerical results corroborate our theoretical findings in Theorem IV.1 and that Robust-AM outperforms the competing methods.
In this experiment, the measurement vectors are generated so that by following the assumptions in Theorem IV.1 and analogous theoretical analyses of the other methods.
The ground-truth signal is generated as independently from the measurement vectors.
The outlier support is randomly selected following the uniform distribution on all possible subsets of size .
Figure 3: Phase transition of empirical success rate by Robust-AM per the number of measurements and the dimension .
Figure 4: Phase transition of success rate per the measurement ratio and the fraction of outliers for various outlier magnitude models: (a) Cauchy distribution, (b) uniform distribution, (c) zero. Subfigures are displayed according to RobustPhaseMax (top-left), Median-RWF (top-right), prox-linear method (bottom-left), and Robust-AM (bottom-right).
Figure 3 shows the phase transition of the empirical success rate by Robust-AM through Monte Carlo simulations, where the outlier values are i.i.d. following the Cauchy distribution with median and mean-absolute-deviation .
The fraction of outliers is fixed to
Recall that the performance guarantee in Theorem IV.1 applies uniformly to all ground-truth signals.
To observe the empirical performance in an analogous setting, we design the experiment as follows:
1) Generate sets of random measurement vectors . Generate sets of random ground-truth ;
2) For each fixed , success is declared if the estimator recovers all ground-truth signals by satisfying where denotes the estimate;
3) The empirical success rate is calculated on the outcomes from distinct sets of measurement vectors.
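The three steps above can be sketched in code as follows. This is a minimal illustration under several assumptions that are not specified in the text: the recovery error is measured up to a global sign, success is declared with a hypothetical relative tolerance `tol`, and outlier values are magnitudes of Cauchy samples. The names `dist_up_to_sign`, `empirical_success_rate`, and `estimator` (any callable mapping `(A, b)` to an estimate) are hypothetical.

```python
import numpy as np

def dist_up_to_sign(x_hat, x_true):
    """Distance between two real vectors modulo a global sign flip."""
    return min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true))

def empirical_success_rate(estimator, n, m, outlier_frac=0.1,
                           n_meas_sets=5, n_signals=5, tol=1e-5, seed=0):
    """Fraction of measurement sets on which ALL sampled signals are recovered."""
    rng = np.random.default_rng(seed)
    k = int(round(outlier_frac * m))
    wins = 0
    for _ in range(n_meas_sets):
        A = rng.standard_normal((m, n))            # one set of measurement vectors
        all_ok = True
        for _ in range(n_signals):
            x = rng.standard_normal(n)             # one ground-truth signal
            b = np.abs(A @ x)
            idx = rng.choice(m, size=k, replace=False)
            b[idx] = np.abs(rng.standard_cauchy(k))
            err = dist_up_to_sign(estimator(A, b), x)
            all_ok = all_ok and bool(err <= tol * np.linalg.norm(x))
        wins += int(all_ok)
    return wins / n_meas_sets
```

Requiring success on every signal for a fixed set of measurement vectors mirrors the uniform nature of the guarantee, as opposed to averaging over independent signal and measurement pairs.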
The transition occurs at the boundary where the number of measurements is proportional to the ambient dimension (signal length).
This empirical result corroborates our theoretical finding in Theorem IV.1.
Next, we repeat the same experiment on RobustPhaseMax, Median-RWF, and the prox-linear.
Figure 4(a) compares the empirical performance of Robust-AM against RobustPhaseMax, Median-RWF, and the prox-linear by displaying the phase transition of these methods for a range of the outlier fraction in this setting.
The ambient dimension is set to .
Figure 4(a) shows that Robust-AM outperforms all the other methods with a significantly lower threshold for the phase transition.
We further expand the comparison to other models for outlier values.
The second scenario draws from the uniform distribution on .
The third scenario sets to .
As observed in Figures 4(b) and 4(c), similar trends appear in the other outlier models.
RobustPhaseMax, while providing the strongest theoretical performance guarantee, shows the worst empirical performance in the comparison.
There is no consistent dominance between Median-RWF and the prox-linear algorithm.
Median-RWF outperforms the prox-linear in the second scenario, whereas the prox-linear outperforms Median-RWF in the other scenarios.
Figure 5: Convergence of Robust-AM (blue) and the prox-linear (red) in the iteration count, shown in panels (a), (b), and (c).
Next, we compare the convergence speed of Robust-AM and the prox-linear algorithm.
In this experiment, the dimension parameters are set to and where the values of outliers are zero. The outlier ratio varies over .
Figure 5 illustrates how the log of decays over the iteration index .
The median over trials is plotted.
In theory, the prox-linear algorithm converges at a quadratic rate, which is faster than the linear convergence of Robust-AM in Theorem IV.1.
However, as shown in Figure 5, Robust-AM empirically converges faster than the prox-linear algorithm in the iteration count for all considered .
Moreover, Figure 5 illustrates that the number of iterations for Robust-AM increases as increases.
This implies that for each iteration, the convergence rate of Robust-AM is proportional to .
This supports our theoretical finding that the convergence parameter in Theorem IV.1 is an increasing function of as shown in Figure 2(a).
V-B Real image experiments
Figure 6: Example of recovery for image data: (a) ground truth, (b) image recovered by the prox-linear method, (c) image recovered by our method.
We further apply Robust-AM to a set of image data to show that Robust-AM continues outperforming the other competing methods for non-Gaussian measurement models.
We adopt the structured random measurement model in the experimental setting in [14, Section 6.3] given by
(16)
where denotes the normalized Hadamard matrix and are diagonal matrices whose diagonal entries are independently drawn uniformly random from .
The measurement vector is the -th column of for , where .
The linear measurement operator in (16) applies to the vectorized version of a 2D input image denoted by with .
The measurements corresponding to outliers are substituted by zero in the experiment.
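A structured measurement operator of this kind can be sketched as follows. This is a minimal illustration, assuming that the diagonal entries are random signs in {-1, +1}, that the blocks formed by the normalized Hadamard matrix times each diagonal matrix are stacked vertically, and that the signal length is a power of two; the name `structured_measurements` and its parameters are hypothetical, and the exact scaling in (16) may differ.

```python
import numpy as np
from scipy.linalg import hadamard

def structured_measurements(n, k, seed=0):
    """Stack k blocks H @ diag(s_l), with H the normalized n x n Hadamard matrix.

    Assumptions (not taken from the paper): s_l has i.i.d. +/-1 entries and
    n is a power of two, as required by scipy.linalg.hadamard.
    """
    rng = np.random.default_rng(seed)
    H = hadamard(n) / np.sqrt(n)               # orthogonal Hadamard matrix
    blocks = []
    for _ in range(k):
        s = rng.choice([-1.0, 1.0], size=n)    # random sign modulation
        blocks.append(H * s)                   # same as H @ np.diag(s)
    return np.vstack(blocks)                   # rows are the measurement vectors
```

For large images, the dense matrix would normally be replaced by a fast Walsh-Hadamard transform applied to the sign-modulated signal; the dense form is used here only for clarity.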
Figure 7: Phase transition of success rate per and the fraction of outliers for zero outlier magnitude models.
Subfigures are displayed according to RobustPhaseMax (top-left), Median-RWF (top-right), prox-linear method (bottom-left), and Robust-AM (bottom-right).
Robust-AM and the competing algorithms are tested on the collection of images of handwritten digits (https://hastie.su.domains/ElemStatLearn/datasets/zip.digits). Figure 7 compares the methods in the empirical success rate over images, where the number of random modulations and the outlier fraction respectively vary over and .
Similar to the previous experiments on synthetic data, Figure 7 demonstrates that Robust-AM outperforms the competing algorithms by providing recovery with smaller for each observed .
Since the algorithmic parameters of Median-RWF are specifically selected for Gaussian measurements in [13], we heuristically tuned the step size to so that Median-RWF performs well for the measurement setting (16).
We first prove by the induction on the iteration index that
(17)
holds for all for some numerical constant and depending only on .
Let be arbitrarily fixed. Suppose that satisfies (17) for all .
Note that the distance between and is written as
(18)
where
Then we have and .
Therefore, it follows that
(19)
implies (17) for .
This completes the induction argument.
Therefore, it suffices to show that the hypothesis of the theorem implies (19).
For the sake of brevity, we denote the objective function of the optimization formulation in (7) by
Next, we derive a lower bound (resp. an upper bound) on (A) (resp. (B)) of (20).
By from the definition of in (2), (A) is written as
(21)
To simplify the partial summation over , we introduce the spherical wedge [28] defined by
(22)
Then it follows that and have the opposite sign if and only if .
Therefore, the summand in (a) is rewritten as
The second summand on the right-hand side provides a valid lower bound on (a) since the other summand is nonnegative.
Combining the above results, we obtain that
(23)
Similarly, (B) is written as
If , then and have the opposite sign and hence (b) satisfies
Otherwise, if , then .
Therefore, we have
(24)
By plugging in the bounds of (23) and (24) into (20), we obtain that (20) implies
(25)
By applying the triangle inequality to the summands in and , we obtain a necessary condition of (25) given by
(26)
We have shown that (20) implies (26).
In the remainder of the proof, we demonstrate that if (26) is satisfied, then (19) holds with high probability. This is achieved by applying a probabilistic lower bound on (c) and probabilistic upper bounds on (d) and (e), using concentration inequalities.
To this end, note that the measurement vectors depend not only on the current iterate and the next iterate , but also on the indicator functions within the spherical wedge in (c) and (e). Therefore, we consider the uniform bounds for all iterates and the collection of spherical wedges with the largest angle less than . We introduce the corresponding lemmas below.
Lemma VI.1.
Let and . Suppose that are independent copies of . Let
(27)
where is defined in (22).
Then there exists an absolute constant such that
Now we derive the largest angle for the spherical wedge . Since the angle between and is always acute, we have
(32)
where (i) holds since the projection operator is non-expansive; (ii) follows since the induction hypothesis implies
where the second and the last inequalities follow from (14).
Hence, in Lemma VI.1, we plug in . Then, since the sample complexity in Theorem IV.1 satisfies the requirement of Lemma VI.1, (28), (29), and (30) hold simultaneously with probability at least . The remainder of the proof is conditioned on the events that (28), (29), and (30) hold.
By applying (28) and (29) to (c) and (d) of (26) and (30) to (e) of (26) with the choice of , we obtain
for
(33)
where
Since satisfies
for all , it is monotonically increasing in and upper-bounded as . This implies uniformly over . This completes the proof of (19).
Let be arbitrarily fixed.
It follows from the definitions in (27) and (22) that is a cone.
Therefore, if and only if .
Furthermore, note that is uniformly distributed in .
Then we have
We proceed with the proof under the following four events, each of which holds with probability at least . The first event is defined as
(38)
which holds with probability at least .
By the assumption on outliers, we have a set with and the outliers are independent of . Hence, (38) is a direct result of (35) in Lemma VII.2. By following the same argument, we also have that
(39)
holds with probability at least .
Next, we describe the following event: for an arbitrary fixed , it holds with probability at least that
(40)
Again, by Assumption 1, we have a fixed set with and the outliers are independent of , so (40) holds by (36) in Lemma VII.3.
Since (31) invokes Lemma VII.4 with probability at least , it holds with probability at least that
(41)
Since we have shown that (38), (39), (40), and (41) hold with probability at least , we proceed with the remainder of the proof assuming that these conditions are satisfied.
We first show (28). We observe that for an arbitrary and , it holds deterministically that
Hence, by taking infimum on both sides over sets and , we have
(42)
We first obtain a lower bound on (A) and an upper bound on (B). We have a lower bound on (A) by (38):
(43)
By taking (31) in (43) for a sufficiently large , we have
(44)
It remains to show an upper bound on (B). Under the event (41), we have
Therefore, by letting in (40), (40) gives an upper bound on (B):
Hence, putting the results (44) and (46) into (42) completes the proof of the statement (28).
As for the remaining statements (29) and (30), the upper bound in (29) is a direct consequence of (39) with chosen according to (31). Lastly, (30) follows from the upper bound on (B) in (46). This completes the proof of (29) and (30).
IX Conclusion
Least absolute deviation (LAD) has long been a popular statistical method for regression in the presence of outliers. We considered the LAD approach to robust phase retrieval under the magnitude-only measurement model. To solve the resulting non-convex optimization problem, we derived a robust alternating minimization method (Robust-AM) as an unconstrained Gauss-Newton method.
Furthermore, we proposed fast implementations of Robust-AM that exploit efficient solvers, and showed that Robust-AM implemented with ADMM converges faster than the closely related prox-linear method implemented with its efficient solver POGS [14].
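As a rough illustration of the kind of ADMM inner solver mentioned above, the following sketch applies the standard ADMM splitting for LAD regression described in [17] to a small synthetic problem. It is a textbook iteration under assumed settings, not necessarily the variant used in this paper; the function names `soft_threshold` and `lad_admm`, the penalty parameter `rho`, the iteration count, and the problem sizes are all hypothetical choices made for the example.

```python
import numpy as np


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of kappa * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)


def lad_admm(A, y, rho=1.0, num_iters=300):
    """Approximately solve min_x ||A x - y||_1 by ADMM on the splitting z = A x - y.

    Assumption: A has full column rank, so A^T A is positive definite.
    """
    m, n = A.shape
    # Factor A^T A once; every x-update reuses the same Cholesky factor.
    L = np.linalg.cholesky(A.T @ A)
    x = np.zeros(n)
    z = np.zeros(m)
    u = np.zeros(m)  # scaled dual variable
    for _ in range(num_iters):
        # x-update: least-squares fit of A x to (y + z - u).
        rhs = A.T @ (y + z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        Ax = A @ x
        # z-update: soft-thresholding of the shifted residual.
        z = soft_threshold(Ax - y + u, 1.0 / rho)
        # Dual update on the constraint A x - z = y.
        u = u + Ax - z - y
    return x


# Small synthetic check: linear measurements with 10% sparse outliers.
rng_admm = np.random.default_rng(1)
m_admm, n_admm, k_admm = 100, 5, 10
A_admm = rng_admm.standard_normal((m_admm, n_admm))
x_true_admm = rng_admm.standard_normal(n_admm)
y_admm = A_admm @ x_true_admm
y_admm[:k_admm] += 10.0 * rng_admm.standard_normal(k_admm)  # outliers on a fixed support

x_hat_admm = lad_admm(A_admm, y_admm)
rel_err_admm = np.linalg.norm(x_hat_admm - x_true_admm) / np.linalg.norm(x_true_admm)
print("relative error of ADMM-based LAD:", rel_err_admm)
```

Each iteration costs two matrix-vector products and two triangular solves after the one-time factorization, which is the main reason such a solver can be attractive as an inner routine.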
We established a local convergence analysis of Robust-AM under the standard Gaussian measurement model when the support of the sparse noise is arbitrarily fixed but its magnitudes can be adversarial.
A suitably initialized Robust-AM converges linearly to the ground truth uniformly over all ground-truth signals when the number of measurements is proportional to the signal length and the outlier fraction is up to .
This theoretical guarantee is comparable to the prior art.
Furthermore, the numerical results show that Robust-AM outperforms existing methods with theoretical guarantees under various outlier models on both synthetic and real image data.
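To make the summary above concrete, here is a minimal sketch of a Robust-AM-style iteration in the real-valued case. It assumes that each step fixes the signs of the current linear measurements and then solves an LAD regression against the signed magnitudes, which is what linearizing the magnitude of each linear measurement at the current iterate yields. The LAD step is written as a linear program and passed to SciPy's `linprog`. The helper names `lad_regression` and `robust_am`, the problem sizes, the outlier model, and the initialization placed near the ground truth are assumptions made for illustration only; they are not the paper's experimental setup or initialization scheme.

```python
import numpy as np
from scipy.optimize import linprog


def lad_regression(A, y):
    """Solve min_x ||y - A x||_1 as a linear program.

    Variables are [x, t] with constraints -t <= y - A x <= t and t >= 0,
    and the objective is the sum of the slack variables t.
    """
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])
    I = np.eye(m)
    A_ub = np.block([[A, -I], [-A, -I]])   #  A x - t <= y  and  -A x - t <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]


def robust_am(A, b, x0, num_iters=10):
    """Alternate between sign estimation and an LAD fit to the signed magnitudes."""
    x = x0.copy()
    for _ in range(num_iters):
        s = np.sign(A @ x)
        s[s == 0] = 1.0                    # break ties so that every sign is +1 or -1
        x = lad_regression(A, s * b)       # |b_i - s_i <a_i, x>| = |s_i b_i - <a_i, x>|
    return x


# Synthetic demo: Gaussian measurements, 10% outliers on a fixed support.
rng = np.random.default_rng(0)
n, m, k = 8, 240, 24
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = np.abs(A @ x_true)
b[:k] += 10.0 * rng.random(k)              # sparse corruption of the magnitudes

# Assumed initialization: a small perturbation of the ground truth (local regime).
direction = rng.standard_normal(n)
direction /= np.linalg.norm(direction)
x0 = x_true + 0.1 * np.linalg.norm(x_true) * direction

x_hat = robust_am(A, b, x0)
# Magnitude-only measurements determine the signal only up to a global sign.
rel_err = min(np.linalg.norm(x_hat - x_true),
              np.linalg.norm(x_hat + x_true)) / np.linalg.norm(x_true)
print("relative error up to global sign:", rel_err)
```

The inner solver is interchangeable in this sketch: any routine that returns an LAD solution could replace `lad_regression` without changing the outer loop.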
References
[1]
S. Kim and K. Lee, “Sequence of linear program for robust phase retrieval,” 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, to appear.
[2]
A. Walther, “The question of phase retrieval in optics,” Optica Acta: International Journal of Optics, vol. 10, no. 1, pp. 41–49, 1963.
[3]
O. Bunk, A. Diaz, F. Pfeiffer, C. David, B. Schmitt, D. K. Satapathy, and J. F. Van Der Veen, “Diffractive imaging for periodic samples: retrieving one-dimensional concentration profiles across microfluidic channels,” Acta Crystallographica Section A: Foundations of Crystallography, vol. 63, no. 4, pp. 306–314, 2007.
[4]
A. Chai, M. Moscoso, and G. Papanicolaou, “Array imaging using intensity-only measurements,” Inverse Problems, vol. 27, no. 1, p. 015005, 2010.
[5]
Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: a contemporary overview,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 87–109, 2015.
[6]
D. S. Weller, A. Pnueli, G. Divon, O. Radzyner, Y. C. Eldar, and J. A. Fessler, “Undersampled phase retrieval with outliers,” IEEE Transactions on Computational Imaging, vol. 1, no. 4, pp. 247–258, 2015.
[7]
J. Dong, L. Valzania, A. Maillard, T.-a. Pham, S. Gigan, and M. Unser, “Phase retrieval: From computational imaging to machine learning: A tutorial,” IEEE Signal Processing Magazine, vol. 40, no. 1, pp. 45–57, 2023.
[8]
S. Bahmani and J. Romberg, “Phase retrieval meets statistical learning theory: A flexible convex relaxation,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 252–260.
[9]
T. Goldstein and C. Studer, “PhaseMax: Convex phase retrieval via basis pursuit,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2675–2689, 2018.
[10]
P. Hand and V. Voroninski, “Corruption robust phase retrieval via linear programming,” arXiv preprint arXiv:1612.03547, 2016.
[11]
H. Zhang, Y. Zhou, Y. Liang, and Y. Chi, “A nonconvex approach for phase retrieval: Reshaped Wirtinger flow and incremental algorithms,” Journal of Machine Learning Research, 2017.
[12]
G. Wang, G. B. Giannakis, and Y. C. Eldar, “Solving systems of random quadratic equations via truncated amplitude flow,” IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 773–794, 2017.
[13]
H. Zhang, Y. Chi, and Y. Liang, “Median-truncated nonconvex approach for phase retrieval with outliers,” IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 7287–7310, 2018.
[14]
J. C. Duchi and F. Ruan, “Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval,” Information and Inference: A Journal of the IMA, vol. 8, no. 3, pp. 471–529, 2019.
[15]
P. Bloomfield and W. L. Steiger, Least absolute deviations: theory, applications, and algorithms. Springer, 1983.
[16]
R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik, vol. 35, p. 237, 1972.
[17]
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[18]
S. Wang and N. Shroff, “A new alternating direction method for linear programming,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[19]
J. van den Brand, “A deterministic linear program solver in current matrix multiplication time,” in Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2020, pp. 259–278.
[20]
J. V. Burke and M. C. Ferris, “A Gauss–Newton method for convex composite optimization,” Mathematical Programming, vol. 71, no. 2, pp. 179–194, 1995.
[21]
F. Clarke, Optimization and Nonsmooth Analysis, ser. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 1990.
[22]
P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” Advances in Neural Information Processing Systems, vol. 26, 2013.
[23]
N. Parikh and S. Boyd, “Block splitting for distributed optimization,” Mathematical Programming Computation, vol. 6, no. 1, pp. 77–102, 2014.
[24]
K. Holmström, A. O. Göran, and M. M. Edvall, “User’s guide for TOMLAB/CPLEX v12.1,” Tomlab Optim. Retrieved, vol. 1, p. 2017, 2009.
[25]
Gurobi Optimization, LLC, “Gurobi optimizer reference manual,” 2021.
[26]
T. Yang and Q. Lin, “RSG: Beating subgradient method without smoothness and strong convexity,” The Journal of Machine Learning Research, vol. 19, no. 1, pp. 236–268, 2018.
[27]
Y. Ye and E. Tse, “An extension of Karmarkar’s projective algorithm for convex quadratic programming,” Mathematical Programming, vol. 44, pp. 157–179, 1989.
[28]
Y. S. Tan and R. Vershynin, “Phase retrieval via randomized Kaczmarz: theoretical guarantees,” Information and Inference: A Journal of the IMA, vol. 8, no. 1, pp. 97–123, 2019.
[29]
Y. Plan and R. Vershynin, “Dimension reduction by random hyperplane tessellations,” Discrete & Computational Geometry, vol. 51, no. 2, pp. 438–461, 2014.
[30]
——, “Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach,” IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 482–494, 2012.