-
On the backward stability of s-step GMRES
Authors:
Erin Carson,
Yuxin Ma
Abstract:
Communication, i.e., data movement, is a critical bottleneck for the performance of classical Krylov subspace method solvers on modern computer architectures. Variants of these methods which avoid communication have been introduced, which, while equivalent in exact arithmetic, can be unstable in finite precision. In this work, we address the backward stability of s-step GMRES, also known as communication-avoiding GMRES. We present a framework for simplifying the analysis of s-step GMRES, which includes standard GMRES (s=1) as a special case, by isolating the effects of rounding errors in the QR factorization and the solution of the least squares problem. Using this framework, we analyze s-step GMRES with popular block orthogonalization methods: block modified Gram--Schmidt and reorthogonalized block classical Gram--Schmidt algorithms. An example illustrates the resulting instability of s-step GMRES when paired with the classical s-step Arnoldi process and shows the limitations of popular strategies for resolving this instability. To address this issue, we propose a modified Arnoldi process that allows for much larger block size s while maintaining satisfactory accuracy, as confirmed by our numerical experiments.
Submitted 4 September, 2024;
originally announced September 2024.
-
On the loss of orthogonality in low-synchronization variants of reorthogonalized block classical Gram-Schmidt
Authors:
Erin Carson,
Kathryn Lund,
Yuxin Ma,
Eda Oktay
Abstract:
Interest in communication-avoiding orthogonalization schemes for high-performance computing has been growing recently. This manuscript addresses open questions about the numerical stability of various block classical Gram-Schmidt variants that have been proposed in the past few years. An abstract framework is employed, the flexibility of which allows for new rigorous bounds on the loss of orthogonality in these variants. We first analyze a generalization of (reorthogonalized) block classical Gram-Schmidt and show that a "strong" intrablock orthogonalization routine is only needed for the very first block in order to maintain orthogonality on the level of the unit roundoff.
Then, using this variant, which has four synchronization points per block column, we remove the synchronization points one at a time and analyze how each alteration affects the stability of the resulting method. Our analysis shows that the variant requiring only one synchronization per block column cannot be guaranteed to be stable in practice, as stability begins to degrade with the first reduction of synchronization points.
Our analysis of block methods also provides new theoretical results for the single-column case. In particular, it is proven that DCGS2 from [Bielich, D. et al. Par. Comput. 112 (2022)] and CGS-2 from [Świrydowicz, K. et al, Num. Lin. Alg. Appl. 28 (2021)] are as stable as Householder QR. Numerical examples from the BlockStab toolbox are included throughout, to help compare variants and illustrate the effects of different choices of intraorthogonalization subroutines.
Submitted 19 August, 2024;
originally announced August 2024.
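The single-column reorthogonalization idea behind the abstract above can be illustrated with a minimal sketch. It is textbook classical Gram-Schmidt applied twice per column, written in plain Python; it is not the DCGS2 or low-synchronization block variants analyzed in the paper, and it is unrelated to the BlockStab toolbox. The function names `cgs_reorth` and `loss_of_orthogonality` are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cgs_reorth(columns, passes=2):
    """Orthonormalize `columns` (a list of vectors) with classical Gram-Schmidt.

    Each new column is projected against all previous basis vectors `passes`
    times; passes=2 is the usual "twice is enough" reorthogonalization.
    """
    basis = []
    for x in columns:
        w = list(x)
        for _ in range(passes):
            # Classical variant: all coefficients come from the same vector w.
            coeffs = [dot(q, w) for q in basis]
            for c, q in zip(coeffs, basis):
                w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def loss_of_orthogonality(basis):
    """Largest entry of |Q^T Q - I| in absolute value."""
    k = len(basis)
    return max(
        abs(dot(basis[i], basis[j]) - (1.0 if i == j else 0.0))
        for i in range(k)
        for j in range(k)
    )
```

Calling `cgs_reorth` with `passes=1` on nearly dependent columns, then again with `passes=2`, and comparing `loss_of_orthogonality` for the two results gives a quick feel for why the second pass matters.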
-
Mixed precision HODLR matrices
Authors:
Erin Carson,
Xinye Chen,
Xiaobo Liu
Abstract:
Hierarchical matrix computations have attracted significant attention in the science and engineering community as exploiting data-sparse structures can significantly reduce the computational complexity of many important kernels. One particularly popular option within this class is the Hierarchical Off-Diagonal Low-Rank (HODLR) format. In this paper, we show that the off-diagonal blocks of HODLR matrices that are approximated by low-rank matrices can be represented in low precision without degrading the quality of the overall approximation (with the error growth bounded by a factor of $2$). We also present an adaptive-precision scheme for constructing and storing HODLR matrices, and we prove that the use of mixed precision does not compromise the numerical stability of the resulting HODLR matrix--vector product and LU factorization. That is, the resulting error in these computations is not significantly greater than in the case where we use one precision (say, double) for constructing and storing the HODLR matrix. Our analyses further give insight on how one must choose the working precision in HODLR matrix computations relative to the approximation error in order to not observe the effects of finite precision. Intuitively, when a HODLR matrix is subject to a high degree of approximation error, subsequent computations can be performed in a lower precision without detriment. We demonstrate the validity of our theoretical results through a range of numerical experiments.
Submitted 10 August, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Computing $k$-means in mixed precision
Authors:
Erin Carson,
Xinye Chen,
Xiaobo Liu
Abstract:
The $k$-means algorithm is one of the most popular and critical techniques in data mining and machine learning, and it has achieved significant success in numerous science and engineering domains. Computing $k$-means to a global optimum is NP-hard in Euclidean space, yet there are a variety of efficient heuristic algorithms, such as Lloyd's algorithm, that converge to a local optimum with superpolynomial complexity in the worst case. Motivated by the emergence and prominence of mixed precision capabilities in hardware, a current trend is to develop low and mixed precision variants of algorithms in order to improve the runtime and energy consumption. In this paper we study the numerical stability of Lloyd's $k$-means algorithm, and, in particular, we confirm the stability of the widely used distance computation formula. We propose a mixed-precision framework for $k$-means computation and investigate the effects of low-precision distance computation within the framework. Through extensive simulations on various data clustering and image segmentation tasks, we verify the applicability and robustness of the mixed precision $k$-means method. We find that, in $k$-means computation, normalized data is more tolerant to the reduction of precision in the distance computation, while for nonnormalized data more care is needed in the use of reduced precision, mainly to avoid overflow. Our study demonstrates the potential for the use of mixed precision to accelerate the $k$-means computation and offers some insights into other distance-based machine learning methods.
Submitted 16 July, 2024;
originally announced July 2024.
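The abstract above refers to a "widely used distance computation formula" without stating it. A minimal sketch follows, assuming the formula meant is the expanded form $\|x-c\|^2 = \|x\|^2 - 2x^Tc + \|c\|^2$. All function names are hypothetical, and the code runs in ordinary double precision rather than the mixed-precision framework of the paper.

```python
def sq_dist_direct(x, c):
    """Squared Euclidean distance by summing squared differences."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def sq_dist_expanded(x, c):
    """Squared distance via ||x||^2 - 2 x.c + ||c||^2 (assumed formula)."""
    xx = sum(xi * xi for xi in x)
    cc = sum(ci * ci for ci in c)
    xc = sum(xi * ci for xi, ci in zip(x, c))
    # Cancellation can leave a tiny negative value; clamp it at zero.
    return max(xx - 2.0 * xc + cc, 0.0)


def assign_clusters(points, centers, dist=sq_dist_expanded):
    """Assignment step of Lloyd's algorithm: nearest-center index per point."""
    return [
        min(range(len(centers)), key=lambda j: dist(p, centers[j]))
        for p in points
    ]
```

The expanded form is attractive because the squared norms of the centers can be computed once per iteration and reused for every point. The subtraction it introduces is also where reduced precision and large, unnormalized data are most likely to interact badly.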
-
A comparison of mixed precision iterative refinement approaches for least-squares problems
Authors:
Erin Carson,
Ieva Daužickaitė
Abstract:
Various approaches to iterative refinement (IR) for least-squares problems have been proposed in the literature and it may not be clear which approach is suitable for a given problem. We consider three approaches to IR for least-squares problems when two precisions are used and review their theoretical guarantees, known shortcomings and when the method can be expected to recognize that the correct solution has been found, and extend uniform precision analysis for an IR approach based on the semi-normal equations to the two-precision case. We focus on the situation where it is desired to refine the solution to the working precision level. It is shown that the IR methods exhibit different sensitivities to the conditioning of the problem and the size of the least-squares residual, which should be taken into account when choosing the IR approach. We also discuss a new approach that is based on solving multiple least-squares problems.
Submitted 28 May, 2024;
originally announced May 2024.
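The two-precision refinement loop that the abstract above builds on can be sketched for a square 2x2 linear system, which is simpler than the least-squares setting compared in the paper. This is an illustration under assumptions: single precision is emulated by rounding through `struct`, the low-precision solver is plain Gaussian elimination without pivoting, and the names `fl32`, `solve2x2_low`, and `refine` are hypothetical.

```python
import struct


def fl32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve2x2_low(a, b):
    """Solve a 2x2 system by elimination, rounding every result to single."""
    a11, a12 = fl32(a[0][0]), fl32(a[0][1])
    a21, a22 = fl32(a[1][0]), fl32(a[1][1])
    b1, b2 = fl32(b[0]), fl32(b[1])
    m = fl32(a21 / a11)                       # elimination multiplier
    u22 = fl32(a22 - fl32(m * a12))           # pivot of the second row
    y2 = fl32(b2 - fl32(m * b1))
    x2 = fl32(y2 / u22)
    x1 = fl32(fl32(b1 - fl32(a12 * x2)) / a11)
    return [x1, x2]


def refine(a, b, steps=5):
    """Iterative refinement: low-precision solves, double-precision residuals."""
    x = solve2x2_low(a, b)
    for _ in range(steps):
        # Residual r = b - A x is computed in double (the working precision).
        r = [b[i] - (a[i][0] * x[0] + a[i][1] * x[1]) for i in range(2)]
        d = solve2x2_low(a, r)                # correction from the cheap solver
        x = [x[0] + d[0], x[1] + d[1]]
    return x
```

For a well-conditioned matrix such as `[[4.0, 1.0], [1.0, 3.0]]`, a few steps are enough to take the single-precision starting solution to roughly double-precision accuracy.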
-
Reorthogonalized Pythagorean variants of block classical Gram-Schmidt
Authors:
Erin Carson,
Kathryn Lund,
Yuxin Ma,
Eda Oktay
Abstract:
Block classical Gram-Schmidt (BCGS) is commonly used for orthogonalizing a set of vectors $X$ in distributed computing environments due to its favorable communication properties relative to other orthogonalization approaches, such as modified Gram-Schmidt or Householder. However, it is known that BCGS (as well as recently developed low-synchronization variants of BCGS) can suffer from a significant loss of orthogonality in finite-precision arithmetic, which can contribute to instability and inaccurate solutions in downstream applications such as $s$-step Krylov subspace methods. A common solution to improve the orthogonality among the vectors is reorthogonalization. Focusing on the "Pythagorean" variant of BCGS, introduced in [E. Carson, K. Lund, & M. Rozložník. SIAM J. Matrix Anal. Appl. 42(3), pp. 1365--1380, 2021], which guarantees an $O(\varepsilon)κ^2(X)$ bound on the loss of orthogonality as long as $O(\varepsilon)κ^2(X)<1$, where $\varepsilon$ denotes the unit roundoff, we introduce and analyze two reorthogonalized Pythagorean BCGS variants. These variants feature favorable communication properties, with asymptotically two synchronization points per block column, as well as an improved $O(\varepsilon)$ bound on the loss of orthogonality. Our bounds are derived in a general fashion to additionally allow for the analysis of mixed-precision variants. We verify our theoretical results with a panel of test matrices and experiments from a new version of the \texttt{BlockStab} toolbox.
Submitted 8 September, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Mixed Precision FGMRES-Based Iterative Refinement for Weighted Least Squares
Authors:
Erin Carson,
Eda Oktay
Abstract:
With the recent emergence of mixed precision hardware, there has been a renewed interest in its use for solving numerical linear algebra problems fast and accurately. The solution of least squares (LS) problems $\min_x\|b-Ax\|_2$, where $A \in \mathbb{R}^{m\times n}$, arises in numerous application areas. Overdetermined standard least squares problems can be solved by using mixed precision within the iterative refinement method of Björck, which transforms the least squares problem into an $(m+n)\times(m+n)$ "augmented" system. It has recently been shown that mixed precision GMRES-based iterative refinement can also be used, in an approach termed GMRES-LSIR. In practice, we often encounter types of least squares problems beyond standard least squares, including weighted least squares (WLS), $\min_x\|D^{1/2}(b-Ax)\|_2$, where $D^{1/2}$ is a diagonal matrix of weights. In this paper, we discuss a mixed precision FGMRES-WLSIR algorithm for solving WLS problems using two different preconditioners.
Submitted 26 January, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Mixed Precision Iterative Refinement with Adaptive Precision Sparse Approximate Inverse Preconditioning
Authors:
Noaman Khan,
Erin Carson
Abstract:
Hardware trends have motivated the development of mixed precision algorithms in numerical linear algebra, which aim to decrease runtime while maintaining acceptable accuracy. One recent advance is the development of an adaptive precision sparse matrix-vector product routine, which may be used to accelerate the solution of sparse linear systems by iterative methods. This approach is also applicable to the application of inexact preconditioners, such as sparse approximate inverse preconditioners used in Krylov subspace methods. In this work, we develop an adaptive precision sparse approximate inverse preconditioner and demonstrate its use within a five-precision GMRES-based iterative refinement method. We call this algorithm variant BSPAI-GMRES-IR. We then analyze the conditions for the convergence of BSPAI-GMRES-IR, and determine settings under which BSPAI-GMRES-IR will produce similar backward and forward errors as the existing SPAI-GMRES-IR method, the latter of which does not use adaptive precision in preconditioning. Our numerical experiments show that this approach can potentially lead to a reduction in the cost of storing and applying sparse approximate inverse preconditioners, although a significant reduction in cost may come at the expense of increasing the number of GMRES iterations required for convergence.
Submitted 8 July, 2023;
originally announced July 2023.
-
The effect of approximate coarsest-level solves on the convergence of multigrid V-cycle methods
Authors:
Petr Vacek,
Erin Carson,
Kirk M. Soodhalter
Abstract:
The multigrid V-cycle method is a popular method for solving systems of linear equations. It computes an approximate solution by using smoothing on fine levels and solving a system of linear equations on the coarsest level. Solving on the coarsest level depends on the size and difficulty of the problem. If the size permits, it is typical to use a direct method based on LU or Cholesky decomposition. In settings with large coarsest-level problems, approximate solvers such as iterative Krylov subspace methods, or direct methods based on low-rank approximation, are often used. The accuracy of the coarsest-level solver is typically determined based on the experience of the users with the concrete problems and methods.
In this paper we present an approach to analyzing the effects of approximate coarsest-level solves on the convergence of the V-cycle method for symmetric positive definite problems. Using these results, we derive a coarsest-level stopping criterion through which we may control the difference between the approximation computed by a V-cycle method with an approximate coarsest-level solver and the approximation which would be computed if the coarsest-level problems were solved exactly. The coarsest-level stopping criterion may thus be set up such that the V-cycle method converges to a chosen finest-level accuracy in (nearly) the same number of V-cycle iterations as the V-cycle method with an exact coarsest-level solver. We also utilize the theoretical results to discuss how the convergence of the V-cycle method may be affected by the choice of a tolerance in a coarsest-level stopping criterion based on the relative residual norm.
Submitted 7 May, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Mixed Precision Rayleigh Quotient Iteration for Total Least Squares Problems
Authors:
Eda Oktay,
Erin Carson
Abstract:
With the recent emergence of mixed precision hardware, there has been a renewed interest in its use for solving numerical linear algebra problems fast and accurately. The solution of total least squares problems, i.e., solving $\min_{E,r} \| [E, r]\|_F$ subject to $(A+E)x=b+r$, arises in numerous applications. Solving this problem requires finding the smallest singular value and corresponding right singular vector of $[A,b]$, which is challenging when $A$ is large and sparse. An efficient algorithm for this case due to Björck et al. [SIAM J. Matrix Anal. Appl. 22(2), 2000], called RQI-PCGTLS, is based on Rayleigh quotient iteration coupled with the preconditioned conjugate gradient method.
We develop a mixed precision variant of this algorithm, RQI-PCGTLS-MP, in which up to three different precisions can be used. We assume that the lowest precision is used in the computation of the preconditioner, and give theoretical constraints on how this precision must be chosen to ensure stability. In contrast to standard least squares, for total least squares, the resulting constraint depends not only on the matrix $A$, but also on the right-hand side $b$. We perform a number of numerical experiments on model total least squares problems used in the literature, which demonstrate that our algorithm can attain the same accuracy as RQI-PCGTLS albeit with a potential convergence delay due to the use of low precision. Performance modeling shows that the mixed precision approach can achieve up to a $4\times$ speedup depending on the size of the matrix and the number of Rayleigh quotient iterations performed.
Submitted 13 September, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
The stability of split-preconditioned FGMRES in four precisions
Authors:
Erin Carson,
Ieva Daužickaitė
Abstract:
We consider the split-preconditioned FGMRES method in a mixed precision framework, in which four potentially different precisions can be used for computations with the coefficient matrix, application of the left preconditioner, application of the right preconditioner, and the working precision. Our analysis is applicable to general preconditioners. We obtain bounds on the backward and forward errors in split-preconditioned FGMRES. Our analysis further provides insight into how the various precisions should be chosen; under certain assumptions, a suitable selection guarantees a backward error on the order of the working precision.
Submitted 12 September, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Towards understanding CG and GMRES through examples
Authors:
Erin Carson,
Jörg Liesen,
Zdeněk Strakoš
Abstract:
When the CG method for solving linear algebraic systems was formulated about 70 years ago by Lanczos, Hestenes, and Stiefel, it was considered an iterative process possessing a mathematical finite termination property. CG was placed into a rich mathematical context, including links with Gauss quadrature and continued fractions. The optimality property of CG was described via a normalized weighted polynomial least squares approximation to zero. This highly nonlinear problem explains the adaptation of CG iterates to the given data. Karush and Hayes immediately considered CG in infinite dimensional Hilbert spaces and investigated its superlinear convergence. Since then, the view of CG and other Krylov subspace methods has changed. Today these methods are primarily used as computational tools, and their behavior is typically characterized using linear upper bounds or heuristics based on clustering of eigenvalues. Such simplifications limit the mathematical understanding and also negatively affect their practical application.
This paper offers a different perspective. Focusing on CG and GMRES, it presents mathematically important and practically relevant phenomena that uncover their behavior through a discussion of computed examples. These examples provide an easily accessible approach that enables understanding of the methods, while pointers to more detailed analyses in the literature are given. This approach allows readers to choose the level of depth and thoroughness appropriate for their intentions. Some of the points made in this paper illustrate well known facts. Others challenge mainstream views and explain existing misunderstandings. Several points refer to recent results leading to open problems. We consider CG and GMRES crucially important for the mathematical understanding, further development, and practical applications also of other Krylov subspace methods.
Submitted 1 February, 2024; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Using Mixed Precision in Low-Synchronization Reorthogonalized Block Classical Gram-Schmidt
Authors:
Eda Oktay,
Erin Carson
Abstract:
Using lower precision in algorithms can be beneficial in terms of reducing both computation and communication costs. Motivated by this, we aim to further the state-of-the-art in developing and analyzing mixed precision variants of iterative methods. In this work, we focus on the block variant of low-synchronization classical Gram-Schmidt with reorthogonalization, which we call BCGSI+LS. We demonstrate that the loss of orthogonality produced by this orthogonalization scheme can exceed $O(u)κ(\mathcal{X})$, where $u$ is the unit roundoff and $κ(\mathcal{X})$ is the condition number of the matrix to be orthogonalized, and thus we cannot in general expect this to result in a backward stable block GMRES implementation. We then develop a mixed precision variant of this algorithm, called BCGSI+LS-MP, which uses higher precision in certain parts of the computation. We demonstrate experimentally that for a number of challenging test problems, our mixed precision variant successfully maintains a loss of orthogonality below $O(u)κ(\mathcal{X})$. This indicates that we can achieve a backward stable block GMRES algorithm that requires only one synchronization per iteration.
Submitted 17 October, 2022;
originally announced October 2022.
-
Single-pass Nyström approximation in mixed precision
Authors:
Erin Carson,
Ieva Daužickaitė
Abstract:
Low rank matrix approximations appear in a number of scientific computing applications. We consider the Nyström method for approximating a positive semidefinite matrix $A$. In the case that $A$ is very large or its entries can only be accessed once, a single-pass version may be necessary. In this work, we perform a complete rounding error analysis of the single-pass Nyström method in two precisions, where the computation of the expensive matrix product with $A$ is assumed to be performed in the lower of the two precisions. Our analysis gives insight into how the sketching matrix and shift should be chosen to ensure stability, implementation aspects which have been commented on in the literature but not yet rigorously justified.
We further develop a heuristic to determine how to pick the lower precision, which confirms the general intuition that the lower the desired rank of the approximation, the lower the precision we can use without detriment. We also demonstrate that our mixed precision Nyström method can be used to inexpensively construct limited memory preconditioners for the conjugate gradient method and derive a bound on the condition number of the resulting preconditioned coefficient matrix. We present numerical experiments on a set of matrices with various spectral decays and demonstrate the utility of our mixed precision approach.
Submitted 21 July, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
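For orientation, the Nyström approximation mentioned in the abstract above is usually written as follows. The symbols $\Omega$ (an $n \times k$ sketching matrix), $Y$, and $k$ (the target rank) are notation assumed here rather than taken from the listing, and the stabilizing shift the abstract refers to is omitted.

```latex
% Standard Nystrom approximation of a positive semidefinite A (n x n),
% with an assumed n x k sketching matrix \Omega:
\[
  Y = A\,\Omega, \qquad
  \hat{A}_{\mathrm{nys}} = Y \,(\Omega^{T} Y)^{\dagger}\, Y^{T}
                         = (A\Omega)\,(\Omega^{T} A\, \Omega)^{\dagger}\,(A\Omega)^{T}.
\]
% "Single-pass": A is touched only once, in the product Y = A\Omega;
% the core matrix \Omega^T Y is then formed from Y without accessing A again.
```

In the two-precision setting described in the abstract, the product $Y = A\Omega$ is the step assumed to run in the lower precision.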
-
Iterated Gauss-Seidel GMRES
Authors:
Stephen Thomas,
Erin Carson,
Miro Rozložník,
Arielle Carr,
Kasia Świrydowicz
Abstract:
The GMRES algorithm of Saad and Schultz (1986) is an iterative method for approximately solving linear systems $A{\bf x}={\bf b}$, with initial guess ${\bf x}_0$ and residual ${\bf r}_0 = {\bf b} - A{\bf x}_0$. The algorithm employs the Arnoldi process to generate the Krylov basis vectors (the columns of $V_k$). It is well known that this process can be viewed as a $QR$ factorization of the matrix $B_k = [\: {\bf r}_0, AV_k\:]$ at each iteration. Despite an ${O}(ε)κ(B_k)$ loss of orthogonality, for unit roundoff $ε$ and condition number $κ$, the modified Gram-Schmidt formulation was shown to be backward stable in the seminal paper by Paige et al. (2006). We present an iterated Gauss-Seidel formulation of the GMRES algorithm (IGS-GMRES) based on the ideas of Ruhe (1983) and Świrydowicz et al. (2020). IGS-GMRES maintains orthogonality to the level ${O}(ε)κ(B_k)$ or ${O}(ε)$, depending on the choice of one or two iterations; for two Gauss-Seidel iterations, the computed Krylov basis vectors remain orthogonal to working precision and the smallest singular value of $V_k$ remains close to one. The resulting GMRES method is thus backward stable. We show that IGS-GMRES can be implemented with only a single synchronization point per iteration, making it relevant to large-scale parallel computing environments. We also demonstrate that, unlike MGS-GMRES, in IGS-GMRES the relative Arnoldi residual corresponding to the computed approximate solution no longer stagnates above machine precision even for highly non-normal systems.
Submitted 20 March, 2023; v1 submitted 16 May, 2022;
originally announced May 2022.
-
Multiple Domain Causal Networks
Authors:
Tianhui Zhou,
William E. Carson IV,
Michael Hunter Klein,
David Carlson
Abstract:
Observational studies are regarded as economic alternatives to randomized trials, often used in their stead to investigate and determine treatment efficacy. Due to lack of sample size, observational studies commonly combine data from multiple sources or different sites/centers. Despite the benefits of an increased sample size, a naive combination of multicenter data may result in incongruities stemming from center-specific protocols for generating cohorts or reactions towards treatments distinct to a given center, among other things. These issues arise in a variety of other contexts, including capturing a treatment effect related to an individual's unique biological characteristics. Existing methods for estimating heterogeneous treatment effects have not adequately addressed the multicenter context, but rather treat it simply as a means to obtain sufficient sample size. Additionally, previous approaches to estimating treatment effects do not straightforwardly generalize to the multicenter design, especially when required to provide treatment insights for patients from a new, unobserved center. To address these shortcomings, we propose Multiple Domain Causal Networks (MDCN), an approach that simultaneously strengthens the information sharing between similar centers while addressing the selection bias in treatment assignment through learning of a new feature embedding. In empirical evaluations, MDCN is consistently more accurate when estimating the heterogeneous treatment effect in new centers compared to benchmarks that adjust solely based on treatment imbalance or general center differences. Finally, we justify our approach by providing theoretical analyses that demonstrate that MDCN improves on the generalization bound of the new, unobserved target center.
Submitted 13 May, 2022;
originally announced May 2022.
-
Mixed Precision Iterative Refinement with Sparse Approximate Inverse Preconditioning
Authors:
Erin Carson,
Noaman Khan
Abstract:
With the commercial availability of mixed precision hardware, mixed precision GMRES-based iterative refinement schemes have emerged as popular approaches for solving sparse linear systems. Existing analyses of these approaches, however, are based on using full LU factorizations to construct preconditioners for use within GMRES in each refinement step. In practical applications, inexact preconditioning techniques, such as incomplete LU or sparse approximate inverses, are often used for performance reasons.
In this work, we investigate the use of sparse approximate inverse preconditioners based on Frobenius norm minimization within GMRES-based iterative refinement. We analyze the computation of sparse approximate inverses in finite precision and derive constraints under which user-specified stopping criteria will be satisfied. We then analyze the behavior of and convergence constraints for a five-precision GMRES-based iterative refinement scheme that uses sparse approximate inverse preconditioning, which we call SPAI-GMRES-IR. Our numerical experiments confirm the theoretical analysis and illustrate the resulting tradeoffs between preconditioner sparsity and GMRES-IR convergence rate.
Submitted 31 August, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Mixed Precision GMRES-based Iterative Refinement with Recycling
Authors:
Eda Oktay,
Erin Carson
Abstract:
With the emergence of mixed precision capabilities in hardware, iterative refinement schemes for solving linear systems $Ax=b$ have recently been revisited and reanalyzed in the context of three or more precisions. These new analyses show that under certain constraints on condition number, the LU factorization of the matrix can be computed in low precision without affecting the final accuracy. Another promising technique is GMRES-based iterative refinement, which, in contrast to the standard approach, uses GMRES preconditioned by the low-precision triangular factors to solve for the approximate solution update in each refinement step. This more accurate solution method extends the range of problems which can be solved with a given combination of precisions. However, in certain settings, GMRES may require too many iterations per refinement step, making it potentially more expensive than simply recomputing the LU factors in a higher precision.
Krylov subspace recycling is a well-known technique for reusing information across sequential invocations of a Krylov subspace method on systems with the same or a slowly changing coefficient matrix. In this work, we incorporate the idea of Krylov subspace recycling into a mixed precision GMRES-based iterative refinement solver. The insight is that in each refinement step, we call preconditioned GMRES on a linear system with the same coefficient matrix $A$, with only the right-hand side changing. In this way, the GMRES solves in subsequent refinement steps can be accelerated by recycling information obtained from the first step. We perform extensive numerical experiments on various random dense problems, Toeplitz problems (prolate matrices), and problems from real applications, which confirm the benefits of the recycling approach.
Submitted 16 February, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
AugmentedPCA: A Python Package of Supervised and Adversarial Linear Factor Models
Authors:
William E. Carson IV,
Austin Talbot,
David Carlson
Abstract:
Deep autoencoders are often extended with a supervised or adversarial loss to learn latent representations with desirable properties, such as greater predictivity of labels and outcomes or fairness with respect to a sensitive variable. Despite the ubiquity of supervised and adversarial deep latent factor models, these methods should demonstrate improvement over simpler linear approaches to be preferred in practice. This necessitates a reproducible linear analog that still adheres to an augmenting supervised or adversarial objective. We address this methodological gap by presenting methods that augment the principal component analysis (PCA) objective with either a supervised or an adversarial objective and provide analytic and reproducible solutions. We implement these methods in an open-source Python package, AugmentedPCA, that can produce excellent real-world baselines. We demonstrate the utility of these factor models on an open-source, RNA-seq cancer gene expression dataset, showing that augmenting with a supervised objective results in improved downstream classification performance, produces principal components with greater class fidelity, and facilitates identification of genes aligned with the principal axes of data variance, with implications for the development of specific types of cancer.
Submitted 7 January, 2022;
originally announced January 2022.
-
Adversarial Factor Models for the Generation of Improved Autism Diagnostic Biomarkers
Authors:
William E. Carson IV,
Dmitry Isaev,
Samatha Major,
Guillermo Sapiro,
Geraldine Dawson,
David Carlson
Abstract:
Discovering reliable measures that inform on autism spectrum disorder (ASD) diagnosis is critical for providing appropriate and timely treatment for this neurodevelopmental disorder. In this work we present applications of adversarial linear factor models in the creation of improved biomarkers for ASD diagnosis. First, we demonstrate that an adversarial linear factor model can be used to remove confounding information from our biomarkers, ensuring that they contain only pertinent information on ASD. Second, we show this same model can be used to learn a disentangled representation of multimodal biomarkers that results in an increase in predictive performance. These results demonstrate that adversarial methods can address both biomarker confounds and improve biomarker predictive performance.
Submitted 24 September, 2021;
originally announced November 2021.
-
Estimating Potential Outcome Distributions with Collaborating Causal Networks
Authors:
Tianhui Zhou,
William E Carson IV,
David Carlson
Abstract:
Traditional causal inference approaches leverage observational study data to estimate the difference in observed and unobserved outcomes for a potential treatment, known as the Conditional Average Treatment Effect (CATE). However, CATE corresponds to the comparison on the first moment alone, and as such may be insufficient in reflecting the full picture of treatment effects. As an alternative, estimating the full potential outcome distributions could provide greater insights. However, existing methods for estimating treatment effect potential outcome distributions often impose restrictive or simplistic assumptions about these distributions. Here, we propose Collaborating Causal Networks (CCN), a novel methodology which goes beyond the estimation of CATE alone by learning the full potential outcome distributions. Estimation of outcome distributions via the CCN framework does not require restrictive assumptions of the underlying data generating process. Additionally, CCN facilitates estimation of the utility of each possible treatment and permits individual-specific variation through utility functions. CCN not only extends outcome estimation beyond traditional risk difference, but also enables a more comprehensive decision-making process through definition of flexible comparisons. Under assumptions commonly made in the causal literature, we show that CCN learns distributions that asymptotically capture the true potential outcome distributions. Furthermore, we propose an adjustment approach that is empirically effective in alleviating sample imbalance between treatment groups in observational data. Finally, we evaluate the performance of CCN in multiple synthetic and semi-synthetic experiments. We demonstrate that CCN learns improved distribution estimates compared to existing Bayesian and deep generative methods as well as improved decisions with respect to a variety of utility functions.
Submitted 20 September, 2022; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Multistage Mixed Precision Iterative Refinement
Authors:
Eda Oktay,
Erin Carson
Abstract:
Low precision arithmetic, in particular half precision floating point arithmetic, is now available in commercial hardware. Using lower precision can offer significant savings in computation and communication costs with proportional savings in energy. Motivated by this, there has been a renewed interest in mixed precision iterative refinement for solving linear systems $Ax=b$, and new variants of GMRES-based iterative refinement have been developed. Each particular variant with a given combination of precisions leads to different condition number-based constraints for convergence of the backward and forward errors, and each has different performance costs. The constraints for convergence given in the literature are, as an artifact of the analyses, often overly strict in practice, and thus could lead a user to select a more expensive variant when a less expensive one would have sufficed.
In this work, we develop a multistage mixed precision iterative refinement solver which aims to combine existing mixed precision approaches to balance performance and accuracy and improve usability. For an initial combination of precisions, the algorithm begins with the least expensive approach and convergence is monitored via inexpensive computations with quantities produced during the iteration. If slow convergence or divergence is detected using particular stopping criteria, the algorithm switches to use a more expensive, but more reliable variant. A novel aspect of our approach is that, unlike existing implementations, our algorithm first attempts to use "stronger" solvers for the solution update before resorting to increasing the precision(s). In some scenarios, this can avoid the need to refactorize the matrix in higher precision. We perform extensive numerical experiments on random dense problems and problems from real applications which confirm the benefits of the multistage approach.
Submitted 15 November, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Synthesizing Multi-Tracer PET Images for Alzheimer's Disease Patients using a 3D Unified Anatomy-aware Cyclic Adversarial Network
Authors:
Bo Zhou,
Rui Wang,
Ming-Kai Chen,
Adam P. Mecca,
Ryan S. O'Dell,
Christopher H. Van Dyck,
Richard E. Carson,
James S. Duncan,
Chi Liu
Abstract:
Positron Emission Tomography (PET) is an important tool for studying Alzheimer's disease (AD). PET scans can be used as diagnostic tools, and to provide molecular characterization of patients with cognitive disorders. However, multiple tracers are needed to measure glucose metabolism (18F-FDG), synaptic vesicle protein (11C-UCB-J), and $β$-amyloid (11C-PiB). Administering multiple tracers to patients leads to a high radiation dose and cost. In addition, access to PET scans using new or less-available tracers with sophisticated production methods and short half-life isotopes may be very limited. Thus, it is desirable to develop an efficient multi-tracer PET synthesis model that can generate multi-tracer PET from single-tracer PET. Previous works on medical image synthesis focus on one-to-one fixed domain translations, and cannot simultaneously learn the features from multi-tracer domains. Given 3 or more tracers, relying on previous methods will also create a heavy burden on the number of models to be trained. To tackle these issues, we propose a 3D unified anatomy-aware cyclic adversarial network (UCAN) for translating multi-tracer PET volumes with one unified generative model, where MR with anatomical information is incorporated. Evaluations on a multi-tracer PET dataset demonstrate that our UCAN can generate high-quality multi-tracer PET volumes, with NMSE less than 15% for all PET tracers.
Submitted 12 July, 2021;
originally announced July 2021.
-
Mixed Precision $s$-step Lanczos and Conjugate Gradient Algorithms
Authors:
Erin Carson,
Tomáš Gergelits
Abstract:
Compared to the classical Lanczos algorithm, the $s$-step Lanczos variant has the potential to improve performance by asymptotically decreasing the synchronization cost per iteration. However, this comes at a cost. Despite being mathematically equivalent, the $s$-step variant is known to behave quite differently in finite precision, with potential for greater loss of accuracy and a decrease in the convergence rate relative to the classical algorithm. It has previously been shown that the errors that occur in the $s$-step version follow the same structure as the errors in the classical algorithm, but with the addition of an amplification factor that depends on the square of the condition number of the $O(s)$-dimensional Krylov bases computed in each outer loop. As the condition number of these $s$-step bases grows (in some cases very quickly) with $s$, this limits the parameter $s$ that can be chosen and thus limits the performance that can be achieved. In this work we show that if a select few computations in $s$-step Lanczos are performed in double the working precision, the error terms then depend only linearly on the conditioning of the $s$-step bases. This has the potential for drastically improving the numerical behavior of the algorithm with little impact on per-iteration performance. Our numerical experiments demonstrate the improved numerical behavior possible with the mixed precision approach, and also show that this improved behavior extends to the $s$-step CG algorithm in mixed precision.
Submitted 30 August, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Multiparametric Cardiac 18F-FDG PET: Pilot Comparison of FDG Delivery Rate with 82Rb Myocardial Blood Flow
Authors:
Yang Zuo,
Javier E. Lopez,
Thomas W. Smith,
Cameron C. Foster,
Richard E. Carson,
Ramsey D. Badawi,
Guobao Wang
Abstract:
Myocardial blood flow (MBF) and flow reserve are usually quantified in the clinic with positron emission tomography (PET) using a perfusion-specific radiotracer (e.g. 82Rb-chloride). However, the clinical accessibility of existing perfusion tracers remains limited. Meanwhile, 18F-fluorodeoxyglucose (FDG) is a commonly used radiotracer for PET metabolic imaging without similar limitations. In this paper, we explore the potential of 18F-FDG for myocardial perfusion imaging by comparing the myocardial FDG delivery rate K1 with MBF as determined by dynamic 82Rb PET in fourteen human subjects with heart disease. Two sets of FDG K1 were derived from one-hour dynamic FDG scans. One was the original FDG K1 estimates and the other was the corresponding K1 values that were linearly normalized for blood glucose levels. A generalized Renkin-Crone model was used to fit FDG K1 with Rb MBF, which then allowed for a nonlinear extraction fraction correction for converting FDG K1 to MBF. The linear correlation between FDG-derived MBF and Rb MBF was moderate (r=0.79) before the glucose normalization and improved substantially (r>0.9) after glucose normalization. The extraction fraction of FDG was also similar to that of Rb-chloride in the myocardium. The results from this pilot study suggest that dynamic cardiac FDG-PET with tracer kinetic modeling has the potential to provide MBF in addition to its conventional use for metabolic imaging.
Submitted 12 July, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
An overview of block Gram-Schmidt methods and their stability properties
Authors:
Erin Carson,
Kathryn Lund,
Miroslav Rozložník,
Stephen Thomas
Abstract:
Block Gram-Schmidt algorithms serve as essential kernels in many scientific computing applications, but for many commonly used variants, a rigorous treatment of their stability properties remains open. This work provides a comprehensive categorization of block Gram-Schmidt algorithms, particularly those used in Krylov subspace methods to build orthonormal bases one block vector at a time. Known stability results are assembled, and new results are summarized or conjectured for important communication-reducing variants. Additionally, new block versions of low-synchronization variants are derived, and their efficacy and stability are demonstrated for a wide range of challenging examples. Numerical examples are computed with a versatile MATLAB package hosted at https://github.com/katlund/BlockStab, and scripts for reproducing all results in the paper are provided. Block Gram-Schmidt implementations in popular software packages are discussed, along with a number of open problems. An appendix containing all algorithms typeset in a uniform fashion is provided.
Submitted 21 August, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic
Authors:
Ahmad Abdelfattah,
Hartwig Anzt,
Erik G. Boman,
Erin Carson,
Terry Cojean,
Jack Dongarra,
Mark Gates,
Thomas Grützmacher,
Nicholas J. Higham,
Sherry Li,
Neil Lindquist,
Yang Liu,
Jennifer Loe,
Piotr Luszczek,
Pratik Nayak,
Sri Pranesh,
Siva Rajamanickam,
Tobias Ribizel,
Barry Smith,
Kasia Swirydowicz,
Stephen Thomas,
Stanimire Tomov,
Yaohung M. Tsai,
Ichitaro Yamazaki,
Urike Meier Yang
Abstract:
In recent years, hardware vendors have started designing low precision special function units in response to the Machine Learning community's demand for high compute power in low precision formats. Server-line products are also increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer, which provide more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between compute power on the one hand and memory bandwidth on the other keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered "mature technology," but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves but focus on how mixed- and multiprecision technology can help improve the performance of these methods, and present highlights of applications significantly outperforming the traditional fixed precision methods.
Submitted 13 July, 2020;
originally announced July 2020.
-
A Novel Loss Function Incorporating Imaging Acquisition Physics for PET Attenuation Map Generation using Deep Learning
Authors:
Luyao Shi,
John A. Onofrey,
Enette Mae Revilla,
Takuya Toyonaga,
David Menard,
Joseph Ankrah,
Richard E. Carson,
Chi Liu,
Yihuan Lu
Abstract:
In PET/CT imaging, CT is used for PET attenuation correction (AC). Mismatch between CT and PET due to patient body motion results in AC artifacts. In addition, artifacts caused by metal, beam-hardening and count-starving in CT itself also introduce inaccurate AC for PET. Maximum likelihood reconstruction of activity and attenuation (MLAA) was proposed to solve those issues by simultaneously reconstructing tracer activity ($λ$-MLAA) and attenuation map ($μ$-MLAA) based on the PET raw data only. However, $μ$-MLAA suffers from high noise and $λ$-MLAA suffers from large bias as compared to the reconstruction using the CT-based attenuation map ($μ$-CT). Recently, a convolutional neural network (CNN) was applied to predict the CT attenuation map ($μ$-CNN) from $λ$-MLAA and $μ$-MLAA, in which an image-domain loss (IM-loss) function between the $μ$-CNN and the ground truth $μ$-CT was used. However, IM-loss does not directly measure the AC errors according to the PET attenuation physics, where the line-integral projection of the attenuation map ($μ$) along the path of the two annihilation events, instead of the $μ$ itself, is used for AC. Therefore, a network trained with the IM-loss may yield suboptimal performance in the $μ$ generation. Here, we propose a novel line-integral projection loss (LIP-loss) function that incorporates the PET attenuation physics for $μ$ generation. Eighty training and twenty testing datasets of whole-body 18F-FDG PET and paired ground truth $μ$-CT were used. Quantitative evaluations showed that the model trained with the additional LIP-loss was able to significantly outperform the model trained solely based on the IM-loss function.
Submitted 3 September, 2019;
originally announced September 2019.
-
An Adaptive $s$-step Conjugate Gradient Algorithm with Dynamic Basis Updating
Authors:
Erin C. Carson
Abstract:
The adaptive $s$-step CG algorithm is a solver for sparse, symmetric positive definite linear systems designed to reduce the synchronization cost per iteration while still achieving a user-specified accuracy requirement. In this work, we improve the adaptive $s$-step conjugate gradient algorithm by use of iteratively updated estimates of the largest and smallest Ritz values, which give approximations of the largest and smallest eigenvalues of $A$, using a technique due to Meurant and Tichý [G. Meurant and P. Tichý, Numer. Algs. (2018), pp. 1–32]. The Ritz value estimates are used to dynamically update parameters for constructing Newton or Chebyshev polynomials so that the conditioning of the $s$-step bases can be continuously improved throughout the iterations. These estimates are also used to automatically set a variable related to the ratio of the sizes of the error and residual, which was previously treated as an input parameter. We show through numerical experiments that in many cases the new algorithm improves upon the previous adaptive $s$-step approach both in terms of numerical behavior and reduction in number of synchronizations.
Submitted 12 August, 2019;
originally announced August 2019.
-
Predict-and-recompute conjugate gradient variants
Authors:
Tyler Chen,
Erin C. Carson
Abstract:
The standard implementation of the conjugate gradient algorithm suffers from communication bottlenecks on parallel architectures, due primarily to the two global reductions required every iteration. In this paper, we study conjugate gradient variants which decrease the runtime per iteration by overlapping global synchronizations, and in the case of pipelined variants, matrix-vector products. Through the use of a predict-and-recompute scheme, whereby recursively-updated quantities are first used as a predictor for their true values and then recomputed exactly at a later point in the iteration, these variants are observed to have convergence behavior nearly as good as the standard conjugate gradient implementation on a variety of test problems. We provide a rounding error analysis which provides insight into this observation. It is also verified experimentally that the variants studied do indeed reduce the runtime per iteration in practice and that they scale similarly to previously-studied communication-hiding variants. Finally, because these variants achieve good convergence without the use of any additional input parameters, they have the potential to be used in place of the standard conjugate gradient implementation in a range of applications.
Submitted 20 March, 2021; v1 submitted 4 May, 2019;
originally announced May 2019.
-
The Adaptive $s$-step Conjugate Gradient Method
Authors:
Erin Carson
Abstract:
On modern large-scale parallel computers, the performance of Krylov subspace iterative methods is limited by global synchronization. This has inspired the development of $s$-step Krylov subspace method variants, in which iterations are computed in blocks of $s$, which can reduce the number of global synchronizations per iteration by a factor of $O(s)$.
Although the $s$-step variants are mathematically equivalent to their classical counterparts, they can behave quite differently in finite precision depending on the parameter $s$. If $s$ is chosen too large, the $s$-step method can suffer a convergence delay and a decrease in attainable accuracy relative to the classical method. This makes such methods difficult for a potential user: the $s$ value that minimizes the time per iteration may not be the best $s$ for minimizing the overall time-to-solution, and may further cause an unacceptable decrease in accuracy.
Towards improving the reliability and usability of $s$-step Krylov subspace methods, in this work we derive the adaptive $s$-step CG method, a variable $s$-step CG method where in block $k$, the parameter $s_k$ is determined automatically such that a user-specified accuracy is attainable. The method for determining $s_k$ is based on a bound on growth of the residual gap within block $k$, from which we derive a constraint on the condition numbers of the computed $O(s_k)$-dimensional Krylov subspace bases. The computations required for determining the block size $s_k$ can be performed without increasing the number of global synchronizations per block. Our numerical experiments demonstrate that the adaptive $s$-step CG method is able to attain up to the same accuracy as classical CG while still significantly reducing the total number of global synchronizations.
Submitted 14 January, 2017;
originally announced January 2017.
-
OSETI with STACEE: A Search for Nanosecond Optical Transients from Nearby Stars
Authors:
D. S. Hanna,
J. Ball,
C. E. Covault,
J. E. Carson,
D. D. Driscoll,
P. Fortin,
D. M. Gingrich,
A. Jarvis,
J. Kildea,
T. Lindner,
C. Mueller,
R. Mukherjee,
R. A. Ong,
K. Ragan,
D. A. Williams,
J. Zweerink
Abstract:
We have used the STACEE high-energy gamma-ray detector to look for fast blue-green laser pulses from the vicinity of 187 stars. The STACEE detector offers unprecedented light-collecting capability for the detection of nanosecond pulses from such lasers. We estimate STACEE's sensitivity to be approximately 10 photons per square meter at a wavelength of 420 nm. The stars have been chosen because their characteristics are such that they may harbor habitable planets and they are relatively close to Earth. Each star was observed for 10 minutes and we found no evidence for laser pulses in any of the data sets.
Submitted 14 April, 2009;
originally announced April 2009.
-
Observations of the Pulsar PSR B1951+32 with the Solar Tower Atmospheric Cherenkov Effect Experiment
Authors:
J. Kildea,
J. Zweerink,
J. Ball,
J. E. Carson,
C. E. Covault,
D. D. Driscoll,
P. Fortin,
D. M. Gingrich,
D. S. Hanna,
A. Jarvis,
T. Lindner,
C. Mueller,
R. Mukherjee,
R. A. Ong,
K. Ragan,
D. A. Williams
Abstract:
We present the analysis and results of 12.5 hours of high-energy gamma-ray observations of the EGRET-detected pulsar PSR B1951+32 using the Solar Tower Atmospheric Cherenkov Effect Experiment (STACEE). STACEE is an atmospheric Cherenkov detector, in Albuquerque, New Mexico, that detects cosmic gamma rays using the shower-front-sampling technique. STACEE's sensitivity to astrophysical sources at energies around 100 GeV allows it to investigate emission from gamma-ray pulsars with expected pulsed emission cutoffs below 100 GeV. We discuss the observations and analysis of STACEE's PSR B1951+32 data, accumulated during the 2005 and 2006 observing seasons.
Submitted 25 October, 2007;
originally announced October 2007.
-
STACEE Observations of 1ES 1218+304
Authors:
STACEE Collaboration,
R. Mukherjee,
N. Akhter,
J. Ball,
J. E. Carson,
C. E. Covault,
D. D. Driscoll,
P. Fortin,
D. M. Gingrich,
D. S. Hanna,
A. Jarvis,
J. Kildea,
T. Lindner,
C. Mueller,
R. A. Ong,
K. Ragan,
D. A. Williams,
J. Zweerink
Abstract:
We present the analysis and results of recent high-energy gamma-ray observations of the high energy-peaked BL Lac (HBL) object 1ES 1218+304 with the Solar Tower Atmospheric Cherenkov Effect Experiment (STACEE). 1ES 1218+304 is an X-ray bright HBL at a redshift z=0.182. It has been predicted to be a gamma-ray emitter above 100 GeV, detectable by ground-based Cherenkov telescopes. Recently this source has been detected by MAGIC and VERITAS, confirming these predictions. STACEE's sensitivity to astrophysical sources at energies above 100 GeV allows it to explore high energy sources such as X-ray bright active galaxies and gamma-ray bursts. We present results from STACEE observations of 1ES 1218+304 in the 2006 and 2007 observing seasons.
Submitted 22 October, 2007;
originally announced October 2007.
-
Gamma-Ray Burst Follow-up Observations with STACEE During 2003-2007
Authors:
STACEE Collaboration,
A. Jarvis,
J. Ball,
J. E. Carson,
C. E. Covault,
D. D. Driscoll,
P. Fortin,
D. M. Gingrich,
D. S. Hanna,
J. Kildea,
T. Lindner,
R. Mukherjee,
C. Mueller,
R. A. Ong,
K. Ragan,
D. A. Williams,
J. Zweerink
Abstract:
The Solar Tower Atmospheric Cherenkov Effect Experiment (STACEE) is an atmospheric Cherenkov telescope (ACT) that uses a large mirror array to achieve a relatively low energy threshold. For sources with Crab-like spectra, at high elevations, the detector response peaks near 100 GeV. Gamma-ray burst (GRB) observations have been a high priority for the STACEE collaboration since the inception of the experiment. We present the results of 20 GRB follow-up observations at times ranging from 3 minutes to 15 hours after the burst triggers. Where redshift measurements are available, we place constraints on the intrinsic high-energy spectra of the bursts.
Submitted 22 October, 2007;
originally announced October 2007.
-
Search for Dark Matter Annihilation in Draco with STACEE
Authors:
STACEE Collaboration,
D. D. Driscoll,
J. Ball,
J. E. Carson,
C. E. Covault,
P. Fortin,
D. M. Gingrich,
D. S. Hanna,
A. Jarvis,
J. Kildea,
T. Lindner,
C. Mueller,
R. Mukherjee,
R. A. Ong,
K. Ragan,
D. A. Williams,
J. Zweerink
Abstract:
For some time, the Draco dwarf spheroidal galaxy has garnered interest as a possible source for the indirect detection of dark matter. Its large mass-to-light ratio and relative proximity to the Earth provide favorable conditions for the production of detectable gamma rays from dark matter self-annihilation in its core. The Solar Tower Atmospheric Cherenkov Effect Experiment (STACEE) is an air-shower Cherenkov telescope located in Albuquerque, NM, capable of detecting gamma rays at energies above 100 GeV. We present the results of the STACEE observations of Draco during the 2005-2006 observing season, totaling 10 hours of livetime after cuts.
Submitted 18 October, 2007;
originally announced October 2007.
-
The Energy Spectrum of the Blazar Markarian 421 Above 130 GeV
Authors:
J. E. Carson,
J. Kildea,
R. A. Ong,
J. Ball,
D. A. Bramel,
C. E. Covault,
D. Driscoll,
P. Fortin,
D. M. Gingrich,
D. S. Hanna,
T. Lindner,
C. Mueller,
A. Jarvis,
R. Mukherjee,
K. Ragan,
R. A. Scalzo,
D. A. Williams,
J. Zweerink
Abstract:
Markarian 421 (Mrk 421) was the first blazar detected at gamma-ray energies above 300 GeV, and it remains one of only twelve TeV blazars detected to date. TeV gamma-ray measurements of its flaring activity and spectral variability have placed constraints on models of the high-energy emission from blazars. However, observations between 50 and 300 GeV are rare, and the high-energy peak of the spectral energy distribution (SED), predicted to be in this range, has never been directly detected. We present a detection of Mrk 421 above 100 GeV as made by the Solar Tower Atmospheric Cherenkov Effect Experiment (STACEE) during a multiwavelength campaign in early 2004. STACEE is a ground-based atmospheric Cherenkov telescope using the wavefront sampling technique to detect gamma rays at lower energies than achieved by most imaging Cherenkov telescopes. We also outline a method for reconstructing gamma-ray energies using a solar heliostat telescope. This technique was applied to the 2004 data, and we present the differential energy spectrum of Mrk 421 above 130 GeV. Assuming a differential photon flux dN/dE proportional to E^-a, we measure a spectral index a = 2.1 +/- 0.2 (statistical) +0.2/-0.1 (systematic). Finally, we discuss the STACEE spectrum in the context of the multiwavelength results from the same epoch.
Submitted 19 December, 2006;
originally announced December 2006.
-
GLAST: physics goals and instrument status
Authors:
Jennifer E. Carson
Abstract:
The Gamma-ray Large Area Space Telescope (GLAST) is a space-based observatory scheduled to launch in October 2007 with two instruments: (1) the GLAST Burst Monitor (GBM), sensitive to photon energies between 8 keV and 25 MeV and optimized to detect gamma-ray bursts, and (2) the Large Area Telescope (LAT), sensitive to gamma rays between ~20 MeV and 300 GeV and designed to survey the gamma-ray sky with unprecedented sensitivity. We describe the LAT and the GBM. We then focus on the LAT's capabilities for studying active galactic nuclei.
Submitted 31 October, 2006;
originally announced October 2006.