Search | arXiv e-print repository

A Scalable Algorithm for Active Learning

Authors: Youguang Chen, Zheyu Wen, George Biros

Abstract: FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state-of-the-art in terms of accuracy and robustness and comes with theoretical performance guarantees. However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to… ▽ More FIRAL is a recently proposed deterministic active learning algorithm for multiclass classification using logistic regression. It was shown to outperform the state-of-the-art in terms of accuracy and robustness and comes with theoretical performance guarantees. However, its scalability suffers when dealing with datasets featuring a large number of points $n$, dimensions $d$, and classes $c$, due to its $\mathcal{O}(c^2d^2+nc^2d)$ storage and $\mathcal{O}(c^3(nd^2 + bd^3 + bn))$ computational complexity where $b$ is the number of points to select in active learning. To address these challenges, we propose an approximate algorithm with storage requirements reduced to $\mathcal{O}(n(d+c) + cd^2)$ and a computational complexity of $\mathcal{O}(bncd^2)$. Additionally, we present a parallel implementation on GPUs. We demonstrate the accuracy and scalability of our approach using MNIST, CIFAR-10, Caltech101, and ImageNet. The accuracy tests reveal no deterioration in accuracy compared to FIRAL. We report strong and weak scaling tests on up to 12 GPUs, for three million point synthetic dataset. △ Less

Submitted 11 September, 2024; originally announced September 2024.

Comments: To be appeared at SC'24. Link: https://sc24.conference-program.com/presentation/?id=pap624&sess=sess397

arXiv:2409.07379 [pdf, other]

FIRAL: An Active Learning Algorithm for Multinomial Logistic Regression

Authors: Youguang Chen, George Biros

Abstract: We investigate theory and algorithms for pool-based active learning for multiclass classification using multinomial logistic regression. Using finite sample analysis, we prove that the Fisher Information Ratio (FIR) lower and upper bounds the excess risk. Based on our theoretical analysis, we propose an active learning algorithm that employs regret minimization to minimize the FIR. To verify our d… ▽ More We investigate theory and algorithms for pool-based active learning for multiclass classification using multinomial logistic regression. Using finite sample analysis, we prove that the Fisher Information Ratio (FIR) lower and upper bounds the excess risk. Based on our theoretical analysis, we propose an active learning algorithm that employs regret minimization to minimize the FIR. To verify our derived excess risk bounds, we conduct experiments on synthetic datasets. Furthermore, we compare FIRAL with five other methods and found that our scheme outperforms them: it consistently produces the smallest classification error in the multiclass logistic regression setting, as demonstrated through experiments on MNIST, CIFAR-10, and 50-class ImageNet. △ Less

Submitted 11 September, 2024; originally announced September 2024.

Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2409.00207 [pdf, other]

A fast solver for the spatially homogeneous electron Boltzmann equation

Authors: Milinda Fernando, Daniil Bochkov, James Almgren-Bell, Todd Oliver, Robert Moser, Philip Varghese, Laxminarayan Raja, George Biros

Abstract: We present a numerical method for the velocity-space, spatially homogeneous, collisional Boltzmann equation for electron transport in low-temperature plasma (LTP) conditions. Modeling LTP plasmas is useful in many applications, including advanced manufacturing, material processing, semiconductor processing, and hypersonics, to name a few. Most state-of-the-art methods for electron kinetics are bas… ▽ More We present a numerical method for the velocity-space, spatially homogeneous, collisional Boltzmann equation for electron transport in low-temperature plasma (LTP) conditions. Modeling LTP plasmas is useful in many applications, including advanced manufacturing, material processing, semiconductor processing, and hypersonics, to name a few. Most state-of-the-art methods for electron kinetics are based on Monte-Carlo sampling for collisions combined with Lagrangian particle-in-cell methods. We discuss an Eulerian solver that approximates the electron velocity distribution function using spherical harmonics (angular components) and B-splines (energy component). Our solver supports electron-heavy elastic and inelastic binary collisions, electron-electron Coulomb interactions, steady-state and transient dynamics, and an arbitrary nmber of angular terms in the electron distribution function. We report convergence results and compare our solver to two other codes: an in-house particle Monte-Carlo ethod; and Bolsig+, a state-of-the-art Eulerian solver for electron transport in LTPs. Furthermore, we use our solver to study the relaxation time scales of the higher-order anisotropic correction terms. Our code is open-source and provides an interface that allows coupling to multiphysics simulations of low-temperature plasmas. △ Less

Submitted 30 August, 2024; originally announced September 2024.

arXiv:2401.03661 [pdf, other]

GrainGNN: A dynamic graph neural network for predicting 3D grain microstructure

Authors: Yigong Qin, Stephen DeWitt, Balasubramaniam Radhakrishnan, George Biros

Abstract: We propose GrainGNN, a surrogate model for the evolution of polycrystalline grain structure under rapid solidification conditions in metal additive manufacturing. High fidelity simulations of solidification microstructures are typically performed using multicomponent partial differential equations (PDEs) with moving interfaces. The inherent randomness of the PDE initial conditions (grain seeds) ne… ▽ More We propose GrainGNN, a surrogate model for the evolution of polycrystalline grain structure under rapid solidification conditions in metal additive manufacturing. High fidelity simulations of solidification microstructures are typically performed using multicomponent partial differential equations (PDEs) with moving interfaces. The inherent randomness of the PDE initial conditions (grain seeds) necessitates ensemble simulations to predict microstructure statistics, e.g., grain size, aspect ratio, and crystallographic orientation. Currently such ensemble simulations are prohibitively expensive and surrogates are necessary. In GrainGNN, we use a dynamic graph to represent interface motion and topological changes due to grain coarsening. We use a reduced representation of the microstructure using hand-crafted features; we combine pattern finding and altering graph algorithms with two neural networks, a classifier (for topological changes) and a regressor (for interface motion). Both networks have an encoder-decoder architecture; the encoder has a multi-layer transformer long-short-term-memory architecture; the decoder is a single layer perceptron. We evaluate GrainGNN by comparing it to high-fidelity phase field simulations for in-distribution and out-of-distribution grain configurations for solidification under laser power bed fusion conditions. GrainGNN results in 80\%--90\% pointwise accuracy; and nearly identical distributions of scalar quantities of interest (QoI) between phase field and GrainGNN simulations compared using Kolmogorov-Smirnov test. GrainGNN's inference speedup (PyTorch on single x86 CPU) over a high-fidelity phase field simulation (CUDA on a single NVIDIA A100 GPU) is 150$\times$--2000$\times$ for 100-initial grain problem. Further, using GrainGNN, we model the formation of 11,600 grains in 220 seconds on a single CPU core. △ Less

Submitted 31 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2209.08189 [pdf, other]

CLAIRE -- Parallelized Diffeomorphic Image Registration for Large-Scale Biomedical Imaging Applications

Authors: Naveen Himthani, Malte Brunn, Jae-Youn Kim, Miriam Schulte, Andreas Mang, George Biros

Abstract: We study the performance of CLAIRE -- a diffeomorphic multi-node, multi-GPU image-registration algorithm, and software -- in large-scale biomedical imaging applications with billions of voxels. At such resolutions, most existing software packages for diffeomorphic image registration are prohibitively expensive. As a result, practitioners first significantly downsample the original images and then… ▽ More We study the performance of CLAIRE -- a diffeomorphic multi-node, multi-GPU image-registration algorithm, and software -- in large-scale biomedical imaging applications with billions of voxels. At such resolutions, most existing software packages for diffeomorphic image registration are prohibitively expensive. As a result, practitioners first significantly downsample the original images and then register them using existing tools. Our main contribution is an extensive analysis of the impact of downsampling on registration performance. We study this impact by comparing full-resolution registrations obtained with CLAIRE to lower-resolution registrations for synthetic and real-world imaging datasets. Our results suggest that registration at full resolution can yield a superior registration quality -- but not always. For example, downsampling a synthetic image from $1024^3$ to $256^3$ decreases the Dice coefficient from 92% to 79%. However, the differences are less pronounced for noisy or low-contrast high-resolution images. CLAIRE allows us not only to register images of clinically relevant size in a few seconds but also to register images at unprecedented resolution in a reasonable time. The highest resolution considered is CLARITY images of size $2816\times3016\times1162$. To the best of our knowledge, this is the first study on image registration quality at such resolutions. △ Less

Submitted 16 September, 2022; originally announced September 2022.

Comments: 32 pages, 9 tables, 8 figures

arXiv:2106.06016 [pdf, other]

Ensemble inversion for brain tumor growth models with mass effect

Authors: Shashank Subramanian, Klaudius Scheufele, Naveen Himthani, Christos Davatzikos, George Biros

Abstract: We propose a method for extracting physics-based biomarkers from a single multiparametric Magnetic Resonance Imaging (mpMRI) scan bearing a glioma tumor. We account for mass effect, the deformation of brain parenchyma due to the growing tumor, which on its own is an important radiographic feature but its automatic quantification remains an open problem. In particular, we calibrate a partial differ… ▽ More We propose a method for extracting physics-based biomarkers from a single multiparametric Magnetic Resonance Imaging (mpMRI) scan bearing a glioma tumor. We account for mass effect, the deformation of brain parenchyma due to the growing tumor, which on its own is an important radiographic feature but its automatic quantification remains an open problem. In particular, we calibrate a partial differential equation (PDE) tumor growth model that captures mass effect, parameterized by a single scalar parameter, tumor proliferation, migration, while localizing the tumor initiation site. The single-scan calibration problem is severely ill-posed because the precancerous, healthy, brain anatomy is unknown. To address the ill-posedness, we introduce an ensemble inversion scheme that uses a number of normal subject brain templates as proxies for the healthy precancer subject anatomy. We verify our solver on a synthetic dataset and perform a retrospective analysis on a clinical dataset of 216 glioblastoma (GBM) patients. We analyze the reconstructions using our calibrated biophysical model and demonstrate that our solver provides both global and local quantitative measures of tumor biophysics and mass effect. We further highlight the improved performance in model calibration through the inclusion of mass effect in tumor growth models -- including mass effect in the model leads to 10% increase in average dice coefficients for patients with significant mass effect. We further evaluate our model by introducing novel biophysics-based features and using them for survival analysis. Our preliminary analysis suggests that including such features can improve patient stratification and survival prediction. △ Less

Submitted 10 June, 2021; originally announced June 2021.

Comments: 10 pages, 3 supplementary pages

arXiv:2102.12602 [pdf, other]

Quantitative in vivo imaging to enable tumor forecasting and treatment optimization

Authors: Guillermo Lorenzo, David A. Hormuth II, Angela M. Jarrett, Ernesto A. B. F. Lima, Shashank Subramanian, George Biros, J. Tinsley Oden, Thomas J. R. Hughes, Thomas E. Yankeelov

Abstract: Current clinical decision-making in oncology relies on averages of large patient populations to both assess tumor status and treatment outcomes. However, cancers exhibit an inherent evolving heterogeneity that requires an individual approach based on rigorous and precise predictions of cancer growth and treatment response. To this end, we advocate the use of quantitative in vivo imaging data to ca… ▽ More Current clinical decision-making in oncology relies on averages of large patient populations to both assess tumor status and treatment outcomes. However, cancers exhibit an inherent evolving heterogeneity that requires an individual approach based on rigorous and precise predictions of cancer growth and treatment response. To this end, we advocate the use of quantitative in vivo imaging data to calibrate mathematical models for the personalized forecasting of tumor development. In this chapter, we summarize the main data types available from both common and emerging in vivo medical imaging technologies, and how these data can be used to obtain patient-specific parameters for common mathematical models of cancer. We then outline computational methods designed to solve these models, thereby enabling their use for producing personalized tumor forecasts in silico, which, ultimately, can be used to not only predict response, but also optimize treatment. Finally, we discuss the main barriers to making the above paradigm a clinical reality. △ Less

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2011.07769 [pdf, other]

RCHOL: Randomized Cholesky Factorization for Solving SDD Linear Systems

Authors: Chao Chen, Tianyu Liang, George Biros

Abstract: We introduce a randomized algorithm, namely RCHOL, to construct an approximate Cholesky factorization for a given Laplacian matrix (a.k.a., graph Laplacian). From a graph perspective, the exact Cholesky factorization introduces a clique in the underlying graph after eliminating a row/column. By randomization, RCHOL only retains a sparse subset of the edges in the clique using a random sampling dev… ▽ More We introduce a randomized algorithm, namely RCHOL, to construct an approximate Cholesky factorization for a given Laplacian matrix (a.k.a., graph Laplacian). From a graph perspective, the exact Cholesky factorization introduces a clique in the underlying graph after eliminating a row/column. By randomization, RCHOL only retains a sparse subset of the edges in the clique using a random sampling developed by Spielman and Kyng. We prove RCHOL is breakdown-free and apply it to solving large sparse linear systems with symmetric diagonally dominant matrices. In addition, we parallelize RCHOL based on the nested dissection ordering for shared-memory machines. We report numerical experiments that demonstrate the robustness and the scalability of RCHOL. For example, our parallel code scaled up to 64 threads on a single node for solving the 3D Poisson equation, discretized with the 7-point stencil on a $1024\times 1024 \times 1024$ grid, a problem that has one billion unknowns. △ Less

Submitted 2 September, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

arXiv:2009.04552 [pdf, other]

KNN-DBSCAN: a DBSCAN in high dimensions

Authors: Youguang Chen, William Ruys, George Biros

Abstract: Clustering is a fundamental task in machine learning. One of the most successful and broadly used algorithms is DBSCAN, a density-based clustering algorithm. DBSCAN requires $ε$-nearest neighbor graphs of the input dataset, which are computed with range-search algorithms and spatial data structures like KD-trees. Despite many efforts to design scalable implementations for DBSCAN, existing work is… ▽ More Clustering is a fundamental task in machine learning. One of the most successful and broadly used algorithms is DBSCAN, a density-based clustering algorithm. DBSCAN requires $ε$-nearest neighbor graphs of the input dataset, which are computed with range-search algorithms and spatial data structures like KD-trees. Despite many efforts to design scalable implementations for DBSCAN, existing work is limited to low-dimensional datasets, as constructing $ε$-nearest neighbor graphs is expensive in high-dimensions. In this paper, we modify DBSCAN to enable use of $κ$-nearest neighbor graphs of the input dataset. The $κ$-nearest neighbor graphs are constructed using approximate algorithms based on randomized projections. Although these algorithms can become inaccurate or expensive in high-dimensions, they possess a much lower memory overhead than constructing $ε$-nearest neighbor graphs. We delineate the conditions under which $k$NN-DBSCAN produces the same clustering as DBSCAN. We also present an efficient parallel implementation of the overall algorithm using OpenMP for shared memory and MPI for distributed memory parallelism. We present results on up to 16 billion points in 20 dimensions, and perform weak and strong scaling studies using synthetic data. Our code is efficient in both low and high dimensions. We can cluster one billion points in 3D in less than one second on 28K cores on the Frontera system at the Texas Advanced Computing Center (TACC). In our largest run, we cluster 65 billion points in 20 dimensions in less than 40 seconds using 114,688 x86 cores on TACC's Frontera system. Also, we compare with a state of the art parallel DBSCAN code; on 20d/4M point dataset, our code is up to 37$\times$ faster. △ Less

Submitted 11 September, 2024; v1 submitted 9 September, 2020; originally announced September 2020.

Comments: 26 pages, 10 figures

arXiv:2008.12820 [pdf, other]

doi 10.1109/SC41405.2020.00042

Multi-Node Multi-GPU Diffeomorphic Image Registration for Large-Scale Imaging Problems

Authors: Malte Brunn, Naveen Himthani, George Biros, Miriam Mehl, Andreas Mang

Abstract: We present a Gauss-Newton-Krylov solver for large deformation diffeomorphic image registration. We extend the publicly available CLAIRE library to multi-node multi-graphics processing unit (GPUs) systems and introduce novel algorithmic modifications that significantly improve performance. Our contributions comprise ($i$) a new preconditioner for the reduced-space Gauss-Newton Hessian system, (… ▽ More We present a Gauss-Newton-Krylov solver for large deformation diffeomorphic image registration. We extend the publicly available CLAIRE library to multi-node multi-graphics processing unit (GPUs) systems and introduce novel algorithmic modifications that significantly improve performance. Our contributions comprise ($i$) a new preconditioner for the reduced-space Gauss-Newton Hessian system, ($ii$) a highly-optimized multi-node multi-GPU implementation exploiting device direct communication for the main computational kernels (interpolation, high-order finite difference operators and Fast-Fourier-Transform), and ($iii$) a comparison with state-of-the-art CPU and GPU implementations. We solve a $256^3$-resolution image registration problem in five seconds on a single NVIDIA Tesla V100, with a performance speedup of 70% compared to the state-of-the-art. In our largest run, we register $2048^3$ resolution images (25 B unknowns; approximately 152$\times$ larger than the largest problem solved in state-of-the-art GPU implementations) on 64 nodes with 256 GPUs on TACC's Longhorn system. △ Less

Submitted 28 August, 2020; originally announced August 2020.

Comments: Proc ACM/IEEE Conference on Supercomputing 2020 (accepted for publication)

MSC Class: 68U10; 49J20; 35Q93; 65K10; 65F08; 76D55

arXiv:2006.09932 [pdf, other]

Multiatlas Calibration of Biophysical Brain Tumor Growth Models with Mass Effect

Authors: Shashank Subramanian, Klaudius Scheufele, Naveen Himthani, George Biros

Abstract: We present a 3D fully-automatic method for the calibration of partial differential equation (PDE) models of glioblastoma (GBM) growth with mass effect, the deformation of brain tissue due to the tumor. We quantify the mass effect, tumor proliferation, tumor migration, and the localized tumor initial condition from a single multiparameteric Magnetic Resonance Imaging (mpMRI) patient scan. The PDE i… ▽ More We present a 3D fully-automatic method for the calibration of partial differential equation (PDE) models of glioblastoma (GBM) growth with mass effect, the deformation of brain tissue due to the tumor. We quantify the mass effect, tumor proliferation, tumor migration, and the localized tumor initial condition from a single multiparameteric Magnetic Resonance Imaging (mpMRI) patient scan. The PDE is a reaction-advection-diffusion partial differential equation coupled with linear elasticity equations to capture mass effect. The single-scan calibration model is notoriously difficult because the precancerous (healthy) brain anatomy is unknown. To solve this inherently ill-posed and ill-conditioned optimization problem, we introduce a novel inversion scheme that uses multiple brain atlases as proxies for the healthy precancer patient brain resulting in robust and reliable parameter estimation. We apply our method on both synthetic and clinical datasets representative of the heterogeneous spatial landscape typically observed in glioblastomas to demonstrate the validity and performance of our methods. In the synthetic data, we report calibration errors (due to the ill-posedness and our solution scheme) in the 10\%-20\% range. In the clinical data, we report good quantitative agreement with the observed tumor and qualitative agreement with the mass effect (for which we do not have a ground truth). Our method uses a minimal set of parameters and provides both global and local quantitative measures of tumor infiltration and mass effect. △ Less

Submitted 17 June, 2020; originally announced June 2020.

Comments: Provisionally accepted to MICCAI 2020

arXiv:2004.08893 [pdf, other]

doi 10.1016/j.jpdc.2020.11.006

Fast GPU 3D Diffeomorphic Image Registration

Authors: Malte Brunn, Naveen Himthani, George Biros, Miriam Mehl, Andreas Mang

Abstract: 3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss--Newton--Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformati… ▽ More 3D image registration is one of the most fundamental and computationally expensive operations in medical image analysis. Here, we present a mixed-precision, Gauss--Newton--Krylov solver for diffeomorphic registration of two images. Our work extends the publicly available CLAIRE library to GPU architectures. Despite the importance of image registration, only a few implementations of large deformation diffeomorphic registration packages support GPUs. Our contributions are new algorithms to significantly reduce the run time of the two main computational kernels in CLAIRE: calculation of derivatives and scattered-data interpolation. We deploy (i) highly-optimized, mixed-precision GPU-kernels for the evaluation of scattered-data interpolation, (ii) replace Fast-Fourier-Transform (FFT)-based first-order derivatives with optimized 8th-order finite differences, and (iii) compare with state-of-the-art CPU and GPU implementations. As a highlight, we demonstrate that we can register $256^3$ clinical images in less than 6 seconds on a single NVIDIA Tesla V100. This amounts to over 20$\times$ speed-up over the current version of CLAIRE and over 30$\times$ speed-up over existing GPU implementations. △ Less

Submitted 19 April, 2020; originally announced April 2020.

Comments: 20 pages, 6 figures, 8 tables

MSC Class: 68U10; 49J20; 35Q93; 65K10; 65F08; 76D55

Journal ref: Journal of Parallel and Distributed Computing 149:149-162, 2021

arXiv:1906.04596 [pdf, other]

ANODEV2: A Coupled Neural ODE Evolution Framework

Authors: Tianjun Zhang, Zhewei Yao, Amir Gholami, Kurt Keutzer, Joseph Gonzalez, George Biros, Michael Mahoney

Abstract: It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, which is an extension of this approach that also allows evolution of the neural network… ▽ More It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, which is an extension of this approach that also allows evolution of the neural network parameters, in a coupled ODE-based formulation. The Neural ODE method introduced earlier is in fact a special case of this new more general framework. We present the formulation of ANODEV2, derive optimality conditions, and implement a coupled reaction-diffusion-advection version of this framework in PyTorch. We present empirical results using several different configurations of ANODEV2, testing them on multiple models on CIFAR-10. We report results showing that this coupled ODE-based framework is indeed trainable, and that it achieves higher accuracy, as compared to the baseline models as well as the recently-proposed Neural ODE approach. △ Less

Submitted 9 June, 2019; originally announced June 2019.

Journal ref: NeurIPS 2019

arXiv:1902.10298 [pdf, other]

ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Authors: Amir Gholami, Kurt Keutzer, George Biros

Abstract: Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropogation. Recently a method proposed in [8]… ▽ More Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropogation. Recently a method proposed in [8], claimed that this memory overhead can be reduced from O(LN_t), where N_t is the number of time steps, down to O(L) by solving forward ODE backwards in time, where L is the depth of the network. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time step sizes. We discuss the underlying problems, and to address them we propose ANODE, an Adjoint based Neural ODE framework which avoids the numerical instability related problems noted above, and provides unconditionally accurate gradients. ANODE has a memory footprint of O(L) + O(N_t), with the same computational cost as reversing ODE solve. We furthermore, discuss a memory efficient algorithm which can further reduce this footprint with a trade-off of additional computational cost. We show results on Cifar-10/100 datasets using ResNet and SqueezeNext neural networks. △ Less

Submitted 1 July, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

arXiv:1811.02629 [pdf, other]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Authors: Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, Markus Rempfler, Alessandro Crimi, Russell Takeshi Shinohara, Christoph Berger, Sung Min Ha, Martin Rozycki, Marcel Prastawa, Esther Alberts, Jana Lipkova, John Freymann, Justin Kirby, Michel Bilello, Hassan Fathallah-Shaykh, Roland Wiest, Jan Kirschke, Benedikt Wiestler, Rivka Colen, Aikaterini Kotrotsou, Pamela Lamontagne, Daniel Marcus, Mikhail Milchenko , et al. (402 additional authors not shown)

Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem… ▽ More Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset. △ Less

Submitted 23 April, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge

arXiv:1810.05732 [pdf, other]

A Novel Domain Adaptation Framework for Medical Image Segmentation

Authors: Amir Gholami, Shashank Subramanian, Varun Shenoy, Naveen Himthani, Xiangyu Yue, Sicheng Zhao, Peter Jin, George Biros, Kurt Keutzer

Abstract: We propose a segmentation framework that uses deep neural networks and introduce two innovations. First, we describe a biophysics-based domain adaptation method. Second, we propose an automatic method to segment white and gray matter, and cerebrospinal fluid, in addition to tumorous tissue. Regarding our first innovation, we use a domain adaptation framework that combines a novel multispecies biop… ▽ More We propose a segmentation framework that uses deep neural networks and introduce two innovations. First, we describe a biophysics-based domain adaptation method. Second, we propose an automatic method to segment white and gray matter, and cerebrospinal fluid, in addition to tumorous tissue. Regarding our first innovation, we use a domain adaptation framework that combines a novel multispecies biophysical tumor growth model with a generative adversarial model to create realistic looking synthetic multimodal MR images with known segmentation. Regarding our second innovation, we propose an automatic approach to enrich available segmentation data by computing the segmentation for healthy tissues. This segmentation, which is done using diffeomorphic image registration between the BraTS training data and a set of prelabeled atlases, provides more information for training and reduces the class imbalance problem. Our overall approach is not specific to any particular neural network and can be used in conjunction with existing solutions. We demonstrate the performance improvement using a 2D U-Net for the BraTS'18 segmentation challenge. Our biophysics based domain adaptation achieves better results, as compared to the existing state-of-the-art GAN model used to create synthetic data for training. △ Less

Submitted 11 October, 2018; originally announced October 2018.

arXiv:1808.04487 [pdf, other]

doi 10.1137/18M1207818

CLAIRE: A distributed-memory solver for constrained large deformation diffeomorphic image registration

Authors: Andreas Mang, Amir Gholami, Christos Davatzikos, George Biros

Abstract: With this work, we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diffeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss--N… ▽ More With this work, we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diffeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss--Newton--Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an effective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against different flavors of the Demons algorithm for diffeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5$\times$ with a peak performance of up to 17$\times$ compared to our former work. △ Less

Submitted 9 December, 2019; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: 37 pages;

MSC Class: 68U10; 49J20; 35Q93; 65K10; 65F08; 76D55

Journal ref: SIAM Journal on Scientific Computing, 41(5):C548-C584, 2019

arXiv:1803.00058 [pdf, other]

doi 10.1007/s11081-018-9390-9

PDE-constrained optimization in medical image analysis

Authors: Andreas Mang, Amir Gholami, Christos Davatzikos, George Biros

Abstract: PDE-constrained optimization problems find many applications in medical image analysis, for example, neuroimaging, cardiovascular imaging, and oncological imaging. We review related literature and give examples on the formulation, discretization, and numerical solution of PDE-constrained optimization problems for medical imaging. We discuss three examples. The first one is image registration. The… ▽ More PDE-constrained optimization problems find many applications in medical image analysis, for example, neuroimaging, cardiovascular imaging, and oncological imaging. We review related literature and give examples on the formulation, discretization, and numerical solution of PDE-constrained optimization problems for medical imaging. We discuss three examples. The first one is image registration. The second one is data assimilation for brain tumor patients, and the third one data assimilation in cardiovascular imaging. The image registration problem is a classical task in medical image analysis and seeks to find pointwise correspondences between two or more images. The data assimilation problems use a PDE-constrained formulation to link a biophysical model to patient-specific data obtained from medical images. The associated optimality systems turn out to be sets of nonlinear, multicomponent PDEs that are challenging to solve in an efficient way. The ultimate goal of our work is the design of inversion methods that integrate complementary data, and rigorously follow mathematical and physical principles, in an attempt to support clinical decision making. This requires reliable, high-fidelity algorithms with a short time-to-solution. This task is complicated by model and data uncertainties, and by the fact that PDE-constrained optimization problems are ill-posed in nature, and in general yield high-dimensional, severely ill-conditioned systems after discretization. These features make regularization, effective preconditioners, and iterative solvers that, in many cases, have to be implemented on distributed-memory architectures to be practical, a prerequisite. We showcase state-of-the-art techniques in scientific computing to tackle these challenges. △ Less

Submitted 28 February, 2018; originally announced March 2018.

MSC Class: 49K20; 65Y05; 65M32; 65K10; 76D55; 68U10; 35M10

Journal ref: Optimization and Engineering 19, 765-812, 2018

arXiv:1707.00164 [pdf, other]

Geometry-Oblivious FMM for Compressing Dense SPD Matrices

Authors: Chenhan D. Yu, James Levitt, Severin Reiz, George Biros

Abstract: We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our sc… ▽ More We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in $N \log N$ or even $N$ time, where $N$ is the matrix size. Compression requires $N \log N$ storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices. △ Less

Submitted 1 July, 2017; originally announced July 2017.

Comments: 13 pages, accepted by SC'17

arXiv:1701.02324 [pdf, other]

An $N \log N$ Parallel Fast Direct Solver for Kernel Matrices

Authors: Chenhan D. Yu, William B. March, George Biros

Abstract: Kernel matrices appear in machine learning and non-parametric statistics. Given $N$ points in $d$ dimensions and a kernel function that requires $\mathcal{O}(d)$ work to evaluate, we present an $\mathcal{O}(dN\log N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization,… ▽ More Kernel matrices appear in machine learning and non-parametric statistics. Given $N$ points in $d$ dimensions and a kernel function that requires $\mathcal{O}(d)$ work to evaluate, we present an $\mathcal{O}(dN\log N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, solving a linear system with a kernel matrix can be done with $\mathcal{O}(N\log N)$ work. Our algorithm only requires kernel evaluations and does not require that the kernel matrix admits an efficient global low rank approximation. Instead our factorization only assumes low-rank properties for the off-diagonal blocks under an appropriate row and column ordering. We also present a hybrid method that, when the factorization is prohibitively expensive, combines a partial factorization with iterative methods. As a highlight, we are able to approximately factorize a dense $11M\times11M$ kernel matrix in 2 minutes on 3,072 x86 "Haswell" cores and a $4.5M\times4.5M$ matrix in 1 minute using 4,352 "Knights Landing" cores. △ Less

Submitted 9 January, 2017; originally announced January 2017.

Comments: proceeding 31st IEEE International Parallel & Distributed Processing Symposium

arXiv:1610.02608 [pdf, other]

Research and Education in Computational Science and Engineering

Authors: Ulrich Rüde, Karen Willcox, Lois Curfman McInnes, Hans De Sterck, George Biros, Hans Bungartz, James Corones, Evin Cramer, James Crowley, Omar Ghattas, Max Gunzburger, Michael Hanke, Robert Harrison, Michael Heroux, Jan Hesthaven, Peter Jimack, Chris Johnson, Kirk E. Jordan, David E. Keyes, Rolf Krause, Vipin Kumar, Stefan Mayer, Juan Meza, Knut Martin Mørken, J. Tinsley Oden , et al. (8 additional authors not shown)

Abstract: Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that… ▽ More Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society; and the CSE community is at the core of this transformation. However, a combination of disruptive developments---including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers---is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade. △ Less

Submitted 31 December, 2017; v1 submitted 8 October, 2016; originally announced October 2016.

Comments: Major revision, to appear in SIAM Review

Report number: Argonne National Laboratory Preprint ANL/MCS-P6054-0916 MSC Class: 00A72; 62-07; 68U20; 68W01; 68W10; 97A99; 97M10; 97N80; 97R20; 97R30 ACM Class: G.0; G.4; I.6; J.0; J.2; J.3; J.4; J.6; J.7; K.3.2

arXiv:1608.03630 [pdf, other]

doi 10.1109/SC.2016.71

Distributed-memory large deformation diffeomorphic 3D image registration

Authors: Andreas Mang, Amir Gholami, George Biros

Abstract: We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the… ▽ More We present a parallel distributed-memory algorithm for large deformation diffeomorphic registration of volumetric images that produces large isochoric deformations (locally volume preserving). Image registration is a key technology in medical image analysis. Our algorithm uses a partial differential equation constrained optimal control formulation. Finding the optimal deformation map requires the solution of a highly nonlinear problem that involves pseudo-differential operators, biharmonic operators, and pure advection operators both forward and back- ward in time. A key issue is the time to solution, which poses the demand for efficient optimization methods as well as an effective utilization of high performance computing resources. To address this problem we use a preconditioned, inexact, Gauss-Newton- Krylov solver. Our algorithm integrates several components: a spectral discretization in space, a semi-Lagrangian formulation in time, analytic adjoints, different regularization functionals (including volume-preserving ones), a spectral preconditioner, a highly optimized distributed Fast Fourier Transform, and a cubic interpolation scheme for the semi-Lagrangian time-stepping. We demonstrate the scalability of our algorithm on images with resolution of up to $1024^3$ on the "Maverick" and "Stampede" systems at the Texas Advanced Computing Center (TACC). The critical problem in the medical imaging application domain is strong scaling, that is, solving registration problems of a moderate size of $256^3$---a typical resolution for medical images. We are able to solve the registration problem for images of this size in less than five seconds on 64 x86 nodes of TACC's "Maverick" system. △ Less

Submitted 11 August, 2016; originally announced August 2016.

Comments: accepted for publication at SC16 in Salt Lake City, Utah, USA; November 2016

MSC Class: 68U10; 49J20; 35Q93; 65K10; 76D55; 90C20 ACM Class: D.1.3; I.4.3; G.1.8; F.2.1; G.1.0

Journal ref: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No. 72, 2016

arXiv:1604.02153 [pdf, other]

doi 10.1137/16M1070475

A Semi-Lagrangian two-level preconditioned Newton-Krylov solver for constrained diffeomorphic image registration

Authors: Andreas Mang, George Biros

Abstract: We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve… ▽ More We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20$\times$ speedup for a two dimensional, real world multi-subject medical image registration problem. △ Less

Submitted 28 February, 2018; v1 submitted 7 April, 2016; originally announced April 2016.

MSC Class: 68U10; 49J20; 35Q93; 65K10; 65F08; 76D55

Journal ref: SIAM Journal on Scientific Computing, 39(6):B1064-B1101, 2017

arXiv:1602.01376 [pdf, other]

Inv-ASKIT: A Parallel Fast Diret Solver for Kernel Matrices

Authors: Chenhan D. Yu, William B. March, Bo Xiao, George Biros

Abstract: We present a parallel algorithm for computing the approximate factorization of an $N$-by-$N$ kernel matrix. Once this factorization has been constructed (with $N \log^2 N $ work), we can solve linear systems with this matrix with $N \log N $ work. Kernel matrices represent pairwise interactions of points in metric spaces. They appear in machine learning, approximation theory, and computational phy… ▽ More We present a parallel algorithm for computing the approximate factorization of an $N$-by-$N$ kernel matrix. Once this factorization has been constructed (with $N \log^2 N $ work), we can solve linear systems with this matrix with $N \log N $ work. Kernel matrices represent pairwise interactions of points in metric spaces. They appear in machine learning, approximation theory, and computational physics. Kernel matrices are typically dense (matrix multiplication scales quadratically with $N$) and ill-conditioned (solves can require 100s of Krylov iterations). Thus, fast algorithms for matrix multiplication and factorization are critical for scalability. Recently we introduced ASKIT, a new method for approximating a kernel matrix that resembles N-body methods. Here we introduce INV-ASKIT, a factorization scheme based on ASKIT. We describe the new method, derive complexity estimates, and conduct an empirical study of its accuracy and scalability. We report results on real-world datasets including "COVTYPE" ($0.5$M points in 54 dimensions), "SUSY" ($4.5$M points in 8 dimensions) and "MNIST" (2M points in 784 dimensions) using shared and distributed memory parallelism. In our largest run we approximately factorize a dense matrix of size 32M $\times$ 32M (generated from points in 64 dimensions) on 4,096 Sandy-Bridge cores. To our knowledge these results improve the state of the art by several orders of magnitude. △ Less

Submitted 3 February, 2016; originally announced February 2016.

Comments: 11 pages, 2 figures, to appear in IPDPS 2016

arXiv:1506.07933 [pdf, other]

AccFFT: A library for distributed-memory FFT on CPU and GPU architectures

Authors: Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros

Abstract: We present a new library for parallel distributed Fast Fourier Transforms (FFT). The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT libraries for CUDA-enabled Graphics Processing Units (GPUs) to distributed memory clusters. We use overlapping communication method to reduce the overhead of PCI… ▽ More We present a new library for parallel distributed Fast Fourier Transforms (FFT). The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT libraries for CUDA-enabled Graphics Processing Units (GPUs) to distributed memory clusters. We use overlapping communication method to reduce the overhead of PCIe transfers from/to GPU. We present numerical results on the Maverick platform at the Texas Advanced Computing Center (TACC) and on the Titan system at the Oak Ridge National Laboratory (ORNL). We present the scaling of the library up to 4,096 K20 GPUs of Titan. △ Less

Submitted 25 May, 2016; v1 submitted 25 June, 2015; originally announced June 2015.

Comments: Parallel FFT Library

arXiv:1503.00757 [pdf, other]

doi 10.1137/15M1010919

Constrained $H^1$-regularization schemes for diffeomorphic image registration

Authors: Andreas Mang, George Biros

Abstract: We propose regularization schemes for deformable registration and efficient algorithms for their numerical approximation. We treat image registration as a variational optimal control problem. The deformation map is parametrized by its velocity. Tikhonov regularization ensures well-posedness. Our scheme augments standard smoothness regularization operators based on $H^1$- and $H^2$-seminorms with a… ▽ More We propose regularization schemes for deformable registration and efficient algorithms for their numerical approximation. We treat image registration as a variational optimal control problem. The deformation map is parametrized by its velocity. Tikhonov regularization ensures well-posedness. Our scheme augments standard smoothness regularization operators based on $H^1$- and $H^2$-seminorms with a constraint on the divergence of the velocity field, which resembles variational formulations for Stokes incompressible flows. In our formulation, we invert for a stationary velocity field and a mass source map. This allows us to explicitly control the compressibility of the deformation map and by that the determinant of the deformation gradient. We also introduce a new regularization scheme that allows us to control shear. We use a globalized, preconditioned, matrix-free, reduced space (Gauss--)Newton--Krylov scheme for numerical optimization. We exploit variable elimination techniques to reduce the number of unknowns of our system; we only iterate on the reduced space of the velocity field. Our current implementation is limited to the two-dimensional case. The numerical experiments demonstrate that we can control the determinant of the deformation gradient without compromising registration quality. This additional control allows us to avoid oversmoothing of the deformation map. We also demonstrate that we can promote or penalize shear while controlling the determinant of the deformation gradient. △ Less

Submitted 7 September, 2016; v1 submitted 2 March, 2015; originally announced March 2015.

MSC Class: 68U10; 49J20; 35Q93; 65K10; 76D55; 90C20

Journal ref: SIAM J Imaging Sci, 9(3):1154-1194, 2016

arXiv:1410.0260 [pdf, other]

ASKIT: Approximate Skeletonization Kernel-Independent Treecode in High Dimensions

Authors: William B. March, Bo Xiao, George Biros

Abstract: We present a fast algorithm for kernel summation problems in high-dimensions. These problems appear in computational physics, numerical approximation, non-parametric statistics, and machine learning. In our context, the sums depend on a kernel function that is a pair potential defined on a dataset of points in a high-dimensional Euclidean space. A direct evaluation of the sum scales quadratically… ▽ More We present a fast algorithm for kernel summation problems in high-dimensions. These problems appear in computational physics, numerical approximation, non-parametric statistics, and machine learning. In our context, the sums depend on a kernel function that is a pair potential defined on a dataset of points in a high-dimensional Euclidean space. A direct evaluation of the sum scales quadratically with the number of points. Fast kernel summation methods can reduce this cost to linear complexity, but the constants involved do not scale well with the dimensionality of the dataset. The main algorithmic components of fast kernel summation algorithms are the separation of the kernel sum between near and far field (which is the basis for pruning) and the efficient and accurate approximation of the far field. We introduce novel methods for pruning and approximating the far field. Our far field approximation requires only kernel evaluations and does not use analytic expansions. Pruning is not done using bounding boxes but rather combinatorially using a sparsified nearest-neighbor graph of the input. The time complexity of our algorithm depends linearly on the ambient dimension. The error in the algorithm depends on the low-rank approximability of the far field, which in turn depends on the kernel function and on the intrinsic dimensionality of the distribution of the points. The error of the far field approximation does not depend on the ambient dimension. We present the new algorithm along with experimental results that demonstrate its performance. We report results for Gaussian kernel sums for 100 million points in 64 dimensions, for one million points in 1000 dimensions, and for problems in which the Gaussian kernel has a variable bandwidth. To the best of our knowledge, all of these experiments are impossible or prohibitively expensive with existing fast kernel summation methods. △ Less

Submitted 13 March, 2015; v1 submitted 1 October, 2014; originally announced October 2014.

Comments: 22 pages, 6 figures

arXiv:1409.2802 [pdf, other]

Far-Field Compression for Fast Kernel Summation Methods in High Dimensions

Authors: William B. March, George Biros

Abstract: We consider fast kernel summations in high dimensions: given a large set of points in $d$ dimensions (with $d \gg 3$) and a pair-potential function (the {\em kernel} function), we compute a weighted sum of all pairwise kernel interactions for each point in the set. Direct summation is equivalent to a (dense) matrix-vector multiplication and scales quadratically with the number of points. Fast kern… ▽ More We consider fast kernel summations in high dimensions: given a large set of points in $d$ dimensions (with $d \gg 3$) and a pair-potential function (the {\em kernel} function), we compute a weighted sum of all pairwise kernel interactions for each point in the set. Direct summation is equivalent to a (dense) matrix-vector multiplication and scales quadratically with the number of points. Fast kernel summation algorithms reduce this cost to log-linear or linear complexity. Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by constructing approximate representations of interactions of points that are far from each other. In algebraic terms, these representations correspond to low-rank approximations of blocks of the overall interaction matrix. Existing approaches require an excessive number of kernel evaluations with increasing $d$ and number of points in the dataset. To address this issue, we use a randomized algebraic approach in which we first sample the rows of a block and then construct its approximate, low-rank interpolative decomposition. We examine the feasibility of this approach theoretically and experimentally. We provide a new theoretical result showing a tighter bound on the reconstruction error from uniformly sampling rows than the existing state-of-the-art. We demonstrate that our sampling approach is competitive with existing (but prohibitively expensive) methods from the literature. We also construct kernel matrices for the Laplacian, Gaussian, and polynomial kernels -- all commonly used in physics and data analysis. We explore the numerical properties of blocks of these matrices, and show that they are amenable to our approach. Depending on the data set, our randomized algorithm can successfully compute low rank approximations in high dimensions. We report results for data sets with ambient dimensions from four to 1,000. △ Less

Submitted 12 February, 2015; v1 submitted 9 September, 2014; originally announced September 2014.

Comments: 43 pages, 21 figures

arXiv:1408.6299 [pdf, other]

doi 10.1137/140984002

An inexact Newton-Krylov algorithm for constrained diffeomorphic image registration

Authors: Andreas Mang, George Biros

Abstract: We propose numerical algorithms for solving large deformation diffeomorphic image registration problems. We formulate the nonrigid image registration problem as a problem of optimal control. This leads to an infinite-dimensional partial differential equation (PDE) constrained optimization problem. The PDE constraint consists, in its simplest form, of a hyperbolic transport equation for the evolu… ▽ More We propose numerical algorithms for solving large deformation diffeomorphic image registration problems. We formulate the nonrigid image registration problem as a problem of optimal control. This leads to an infinite-dimensional partial differential equation (PDE) constrained optimization problem. The PDE constraint consists, in its simplest form, of a hyperbolic transport equation for the evolution of the image intensity. The control variable is the velocity field. Tikhonov regularization on the control ensures well-posedness. We consider standard smoothness regularization based on $H^1$- or $H^2$-seminorms. We augment this regularization scheme with a constraint on the divergence of the velocity field rendering the deformation incompressible and thus ensuring that the determinant of the deformation gradient is equal to one, up to the numerical error. We use a Fourier pseudospectral discretization in space and a Chebyshev pseudospectral discretization in time. We use a preconditioned, globalized, matrix-free, inexact Newton-Krylov method for numerical optimization. A parameter continuation is designed to estimate an optimal regularization parameter. Regularity is ensured by controlling the geometric properties of the deformation field. Overall, we arrive at a black-box solver. We study spectral properties of the Hessian, grid convergence, numerical accuracy, computational efficiency, and deformation regularity of our scheme. We compare the designed Newton-Krylov methods with a globalized preconditioned gradient descent. We study the influence of a varying number of unknowns in time. The reported results demonstrate excellent numerical accuracy, guaranteed local deformation regularity, and computational efficiency with an optional control on local mass conservation. The Newton-Krylov methods clearly outperform the Picard method if high accuracy of the inversion is required. △ Less

Submitted 7 May, 2015; v1 submitted 26 August, 2014; originally announced August 2014.

Comments: 32 pages; 10 figures; 9 tables

MSC Class: 68U10; 49J20; 35Q93; 65K10; 76D55

Journal ref: SIAM J. Imaging Sci., 8(2):1030-1069, 2015

Showing 1–29 of 29 results for author: Biros, G