-
Privacy Preserving Federated Learning in Medical Imaging with Uncertainty Estimation
Authors:
Nikolas Koutsoubis,
Yasin Yilmaz,
Ravi P. Ramachandran,
Matthew Schabath,
Ghulam Rasool
Abstract:
Machine learning (ML) and Artificial Intelligence (AI) have fueled remarkable advancements, particularly in healthcare. Within medical imaging, ML models hold the promise of improving disease diagnoses, treatment planning, and post-treatment monitoring. Various computer vision tasks like image classification, object detection, and image segmentation are poised to become routine in clinical analysis. However, privacy concerns surrounding patient data hinder the assembly of large training datasets needed for developing and training accurate, robust, and generalizable models. Federated Learning (FL) emerges as a compelling solution, enabling organizations to collaborate on ML model training by sharing model training information (gradients) rather than data (e.g., medical images). FL's distributed learning framework facilitates inter-institutional collaboration while preserving patient privacy. However, while robust in privacy preservation, FL faces several challenges. Sensitive information can still be gleaned from the gradients that organizations share during model training. Additionally, in medical imaging, quantifying model confidence/uncertainty accurately is crucial due to the noise and artifacts present in the data. Uncertainty estimation in FL encounters unique hurdles due to data heterogeneity across organizations. This paper offers a comprehensive review of FL, privacy preservation, and uncertainty estimation, with a focus on medical imaging. Alongside a survey of current research, we identify gaps in the field and suggest future directions for FL research to enhance privacy and address noisy medical imaging data challenges.
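In federated learning, institutions exchange model updates instead of images. The sketch below shows how a server might combine client parameter vectors, weighting each client by its local dataset size. It assumes the widely used federated averaging (FedAvg) rule, not any specific scheme from this review. Averaging gradients works the same way. The function name `federated_average` and the toy numbers are illustrative.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      num_samples: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its sample count.

    Only these vectors leave each institution; the images themselves never do.
    """
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need one sample count per client")
    total = sum(num_samples)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, num_samples):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for j, w in enumerate(weights):
            averaged[j] += (n / total) * w
    return averaged


# Three hypothetical hospitals; the third holds twice as many images.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [100, 100, 200])
print(global_weights)  # [3.5, 4.5]
```

The averaged vectors are exactly the shared information from which, as noted above, sensitive details can still be gleaned.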
Submitted 18 June, 2024;
originally announced June 2024.
-
Evaluating the Impact of Sequence Combinations on Breast Tumor Segmentation in Multiparametric MRI
Authors:
Hang Min,
Gorane Santamaria Hormaechea,
Prabhakar Ramachandran,
Jason Dowling
Abstract:
Multiparametric magnetic resonance imaging (mpMRI) is a key tool for assessing breast cancer progression. Although deep learning has been applied to automate tumor segmentation in breast MRI, the effect of sequence combinations in mpMRI remains under-investigated. This study explores the impact of different combinations of T2-weighted (T2w), dynamic contrast-enhanced MRI (DCE-MRI), and diffusion-weighted imaging (DWI) with the apparent diffusion coefficient (ADC) map on breast tumor segmentation using nnU-Net. Evaluated on a multicenter mpMRI dataset, the nnU-Net model using DCE sequences achieved a Dice similarity coefficient (DSC) of 0.69 ± 0.18 for functional tumor volume (FTV) segmentation. For whole tumor mask (WTM) segmentation, adding the predicted FTV to the DWI and ADC map improved the DSC from 0.57 ± 0.24 to 0.60 ± 0.21. Adding T2w did not yield a significant improvement, a result that requires further investigation under a more standardized imaging protocol. This study serves as a foundation for future work on predicting breast cancer treatment response using mpMRI.
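The Dice similarity coefficient measures overlap between a predicted mask A and a reference mask B: DSC = 2|A ∩ B| / (|A| + |B|). The sketch below illustrates it in plain Python on flattened binary masks. Two details are assumptions, not taken from the study: the convention for empty masks, and the use of the sample standard deviation in the "mean ± SD" summary.

```python
import statistics
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for binary masks flattened to 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must cover the same voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # both masks empty: scored as perfect agreement (a convention)
    return 2.0 * intersection / size_sum


def summarize(scores: Sequence[float]) -> str:
    """Report per-case scores as mean ± sample standard deviation (an assumption)."""
    return f"{statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}"


cases = [
    ([0, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0]),  # DSC = 2*2 / (3+3)
    ([1, 1, 0, 0], [1, 1, 0, 0]),              # identical masks, DSC = 1
]
scores = [dice_coefficient(p, t) for p, t in cases]
print(summarize(scores))
```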
Submitted 11 June, 2024;
originally announced June 2024.
-
Deployment of a Robust and Explainable Mortality Prediction Model: The COVID-19 Pandemic and Beyond
Authors:
Jacob R. Epifano,
Stephen Glass,
Ravi P. Ramachandran,
Sharad Patel,
Aaron J. Masino,
Ghulam Rasool
Abstract:
This study investigated the performance, explainability, and robustness of deployed artificial intelligence (AI) models in predicting mortality during the COVID-19 pandemic and beyond. In the first study of its kind, we found that Bayesian Neural Networks (BNNs) and intelligent training techniques allowed our models to maintain performance amid significant data shifts. Our results emphasize the importance of developing robust AI models capable of matching or surpassing clinician predictions, even under challenging conditions. Our exploration of model explainability revealed that stochastic models generate more diverse and personalized explanations, thereby highlighting the need for AI models that provide detailed and individualized insights in real-world clinical settings. Furthermore, we underscored the importance of quantifying uncertainty in AI models, which enables clinicians to make better-informed decisions based on reliable predictions. Our study advocates for prioritizing implementation science in AI research for healthcare and ensuring that AI solutions are practical, beneficial, and sustainable in real-world clinical environments. By addressing unique challenges and complexities in healthcare settings, researchers can develop AI models that effectively improve clinical practice and patient outcomes.
Submitted 28 November, 2023;
originally announced November 2023.
-
Classification of Dysarthria based on the Levels of Severity. A Systematic Review
Authors:
Afnan Al-Ali,
Somaya Al-Maadeed,
Moutaz Saleh,
Rani Chinnappa Naidu,
Zachariah C Alex,
Prakash Ramachandran,
Rajeev Khoodeeram,
Rajesh Kumar M
Abstract:
Dysarthria is a neurological speech disorder that can significantly impact affected individuals' communication abilities and overall quality of life. The accurate and objective classification of dysarthria and the determination of its severity are crucial for effective therapeutic intervention. While traditional assessments by speech-language pathologists (SLPs) are common, they are often subjective, time-consuming, and can vary between practitioners. Emerging machine learning-based models have shown the potential to provide a more objective dysarthria assessment, enhancing diagnostic accuracy and reliability. This systematic review aims to comprehensively analyze current methodologies for classifying dysarthria based on severity levels. Specifically, this review will focus on determining the most effective set and type of features that can be used for automatic patient classification and evaluating the best AI techniques for this purpose. We will systematically review the literature on the automatic classification of dysarthria severity levels. Sources of information will include electronic databases and grey literature. Selection criteria will be established based on relevance to the research questions. Data extraction will include methodologies used, the type of features extracted for classification, and AI techniques employed. The findings of this systematic review will contribute to the current understanding of dysarthria classification, inform future research, and support the development of improved diagnostic tools. The implications of these findings could be significant in advancing patient care and improving therapeutic outcomes for individuals affected by dysarthria.
Submitted 11 October, 2023;
originally announced October 2023.
-
Targeted Background Removal Creates Interpretable Feature Visualizations
Authors:
Ian E. Nielsen,
Erik Grundeland,
Joseph Snedeker,
Ghulam Rasool,
Ravi P. Ramachandran
Abstract:
Feature visualization is used to visualize learned features for black-box machine learning models. Our approach explores an altered training process to improve the interpretability of the visualizations. We argue that by using background removal techniques as a form of robust training, a network is forced to learn more human-recognizable features, namely, by focusing on the main object of interest without any distractions from the background. Four different training methods were used to verify this hypothesis. The first used unmodified pictures. The second used a black background. The third utilized Gaussian noise as the background. The fourth approach employed a mix of background-removed images and unmodified images. The feature visualization results show that the models trained on background-removed images achieve a significant improvement over the baseline model. These models displayed easily recognizable features from their respective classes, unlike the model trained on unmodified data.
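The training variants differ only in what replaces the background pixels. The sketch below represents grayscale images as nested lists and assumes a per-pixel foreground mask is available. Several values are illustrative assumptions, not taken from the paper: the noise mean of 0.5, the noise standard deviation of 0.2, the 50/50 mixing probability, and the choice of a black background for the mixed variant. The names `replace_background` and `mixed_mode` are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1]
Mask = List[List[int]]     # 1 = object of interest, 0 = background

MODES = ("original", "black", "noise")


def replace_background(image: Image, mask: Mask, mode: str,
                       rng: random.Random, noise_std: float = 0.2) -> Image:
    """Keep foreground pixels; fill background pixels according to `mode`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    out = []
    for img_row, mask_row in zip(image, mask):
        row = []
        for pixel, is_object in zip(img_row, mask_row):
            if is_object or mode == "original":
                row.append(pixel)
            elif mode == "black":
                row.append(0.0)
            else:  # "noise": Gaussian around mid-grey, clipped to [0, 1]
                row.append(min(1.0, max(0.0, rng.gauss(0.5, noise_std))))
        out.append(row)
    return out


def mixed_mode(rng: random.Random, p_removed: float = 0.5) -> str:
    """Fourth strategy: per image, remove the background with probability p_removed."""
    return "black" if rng.random() < p_removed else "original"


rng = random.Random(0)
image = [[0.9, 0.3], [0.6, 0.4]]
mask = [[1, 0], [0, 1]]
print(replace_background(image, mask, "black", rng))  # [[0.9, 0.0], [0.0, 0.4]]
```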
Submitted 22 June, 2023;
originally announced June 2023.
-
Revisiting the Fragility of Influence Functions
Authors:
Jacob R. Epifano,
Ravi P. Ramachandran,
Aaron J. Masino,
Ghulam Rasool
Abstract:
In the last few years, many works have tried to explain the predictions of deep learning models. Few methods, however, have been proposed to verify the accuracy or faithfulness of these explanations. Recently, influence functions, a method that approximates the effect of leave-one-out training on the loss function, have been shown to be fragile. The reason for their fragility remains unclear. Although previous work suggests using regularization to increase robustness, this does not hold in all cases. In this work, we revisit the experiments performed in the prior work in an effort to understand the underlying mechanisms of influence function fragility. First, we verify influence functions using procedures from the literature under conditions where the convexity assumptions of influence functions are met. Then, we relax these assumptions and study the effects of non-convexity by using deeper models and more complex datasets. Here, we analyze the key metrics and procedures that are used to validate influence functions. Our results indicate that the validation procedures may cause the observed fragility.
Submitted 7 April, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
EvalAttAI: A Holistic Approach to Evaluating Attribution Maps in Robust and Non-Robust Models
Authors:
Ian E. Nielsen,
Ravi P. Ramachandran,
Nidhal Bouaynaya,
Hassan M. Fathallah-Shaykh,
Ghulam Rasool
Abstract:
The expansion of explainable artificial intelligence as a field of research has generated numerous methods for visualizing and understanding the black box of a machine learning model. Attribution maps are generally used to highlight the parts of the input image that influence the model to make a specific decision. On the other hand, the robustness of machine learning models to natural noise and adversarial attacks is also being actively explored. This paper focuses on evaluating methods of attribution mapping to determine whether robust neural networks are more explainable. We explore this problem within the application of classification for medical imaging. Explainability research is at an impasse. There are many methods of attribution mapping, but no current consensus on how to evaluate them and determine which are best. Our experiments on multiple datasets (natural and medical imaging) and various attribution methods reveal that two popular evaluation metrics, Deletion and Insertion, have inherent limitations and yield contradictory results. We propose a new explainability faithfulness metric (called EvalAttAI) that addresses the limitations of prior metrics. Using our novel evaluation, we found that Bayesian deep neural networks using the Variational Density Propagation technique were consistently more explainable when used with the best-performing attribution method, the Vanilla Gradient. However, in general, various types of robust neural networks may not be more explainable, despite these models producing more visually plausible attribution maps.
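Deletion and Insertion perturb pixels in order of attribution and track the model's score. A faithful map should make the score:

- fall quickly when the top pixels are deleted (low area under the curve);
- rise quickly when the top pixels are inserted into a blank baseline (high area).

The sketch below assumes a zero baseline, one pixel per step, and a toy linear model whose vanilla gradient is simply its weight vector. It does not reproduce EvalAttAI, whose details are not given here. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def _perturbation_curve(x: Sequence[float], attribution: Sequence[float],
                        model: Model, baseline: float, insert: bool) -> List[float]:
    """Model score after each pixel is deleted (or inserted), most-attributed first."""
    order = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)
    current = [baseline] * len(x) if insert else list(x)
    scores = [model(current)]
    for i in order:
        current[i] = x[i] if insert else baseline
        scores.append(model(current))
    return scores


def deletion_curve(x, attribution, model, baseline=0.0):
    return _perturbation_curve(x, attribution, model, baseline, insert=False)


def insertion_curve(x, attribution, model, baseline=0.0):
    return _perturbation_curve(x, attribution, model, baseline, insert=True)


def auc(scores: Sequence[float]) -> float:
    """Trapezoidal area, with the fraction of perturbed pixels on a [0, 1] axis."""
    n = len(scores) - 1
    if n < 1:
        raise ValueError("need at least one perturbation step")
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(n)) / n


WEIGHTS = [0.5, 0.3, 0.2, 0.0]


def toy_model(v: List[float]) -> float:
    """Hypothetical linear class score; its vanilla gradient is WEIGHTS."""
    return sum(w * xi for w, xi in zip(WEIGHTS, v))


x = [1.0, 1.0, 1.0, 1.0]
good = WEIGHTS                   # attribution = vanilla gradient
bad = [-w for w in WEIGHTS]      # deliberately reversed ranking
print(auc(deletion_curve(x, good, toy_model)))   # about 0.3 (lower is better)
print(auc(deletion_curve(x, bad, toy_model)))    # about 0.7
print(auc(insertion_curve(x, good, toy_model)))  # about 0.7 (higher is better)
```

On a real image, "deleting" a pixel also creates an out-of-distribution input. That is one commonly cited reason such curves can disagree with each other.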
Submitted 15 March, 2023;
originally announced March 2023.
-
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review
Authors:
Asim Waqas,
Aakash Tripathi,
Ravi P. Ramachandran,
Paul Stewart,
Ghulam Rasool
Abstract:
Cancer has relational information residing at varying scales, modalities, and resolutions of the acquired data, such as radiology, pathology, genomics, proteomics, and clinical records. Integrating diverse data types can improve the accuracy and reliability of cancer diagnosis and treatment. There can be disease-related information that is too subtle for humans or existing technological tools to discern visually. Traditional methods typically focus on partial or unimodal information about biological systems at individual scales and fail to encapsulate the complete spectrum of the heterogeneous nature of data. Deep neural networks have facilitated the development of sophisticated multimodal data fusion approaches that can extract and integrate relevant information from multiple sources. Recent deep learning frameworks such as Graph Neural Networks (GNNs) and Transformers have shown remarkable success in multimodal learning. This review article provides an in-depth analysis of the state-of-the-art in GNNs and Transformers for multimodal data fusion in oncology settings, highlighting notable research studies and their findings. We also discuss the foundations of multimodal learning, inherent challenges, and opportunities for integrative learning in oncology. By examining the current state and potential future developments of multimodal data integration in oncology, we aim to demonstrate the promising role that multimodal neural networks can play in cancer prevention, early detection, and treatment through informed oncology practices in personalized settings.
Submitted 28 March, 2024; v1 submitted 11 March, 2023;
originally announced March 2023.
-
Anisotropic, Sparse and Interpretable Physics-Informed Neural Networks for PDEs
Authors:
Amuthan A. Ramabathiran,
Prabhu Ramachandran
Abstract:
There has been a growing interest in the use of Deep Neural Networks (DNNs) to solve Partial Differential Equations (PDEs). Despite the promise that such approaches hold, there are various aspects where they could be improved. Two such shortcomings are (i) their computational inefficiency relative to classical numerical methods, and (ii) the non-interpretability of a trained DNN model. In this work, we present ASPINN, an anisotropic extension of our earlier work called SPINN (Sparse, Physics-informed, and Interpretable Neural Networks), to solve PDEs in a way that addresses both these issues. ASPINNs generalize radial basis function networks. We demonstrate, using a variety of examples involving elliptic and hyperbolic PDEs, that the special architecture we propose is more efficient than generic DNNs while at the same time being directly interpretable. Further, ASPINNs improve upon the SPINN models we proposed earlier in that fewer nodes are required to capture the solution, thanks to the anisotropy of the local zones of influence of each node. The interpretability of ASPINN translates to a ready visualization of their weights and biases, thereby yielding more insight into the nature of the trained model. This in turn provides a systematic procedure to improve the architecture based on the quality of the computed solution. ASPINNs thus serve as an effective bridge between classical numerical algorithms and modern DNN-based methods to solve PDEs. In the process, we also streamline the training of ASPINNs into a form that is closer to that of supervised learning algorithms.
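ASPINN is described as a generalization of radial basis function networks with anisotropic zones of influence. The sketch below illustrates that idea with axis-aligned Gaussian nodes. This is an assumption: the paper's anisotropy may be more general, for example rotated zones. PDE-residual training is omitted entirely. Each node is just a center, a per-axis width, and an output weight, which is why the parameters can be read off directly. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def anisotropic_gaussian(x: Sequence[float], center: Sequence[float],
                         widths: Sequence[float]) -> float:
    """exp(-sum_d ((x_d - c_d) / h_d)^2): a Gaussian bump with one width per axis."""
    return math.exp(-sum(((xd - cd) / hd) ** 2 for xd, cd, hd in zip(x, center, widths)))


def rbf_network(x: Sequence[float], centers: Sequence[Tuple[float, ...]],
                widths: Sequence[Tuple[float, ...]], weights: Sequence[float],
                bias: float = 0.0) -> float:
    """u(x) = bias + sum_k w_k * phi_k(x); every parameter has a geometric meaning."""
    return bias + sum(w * anisotropic_gaussian(x, c, h)
                      for c, h, w in zip(centers, widths, weights))


# One hypothetical node: centred at the origin, wide along x, narrow along y.
centers, widths, weights = [(0.0, 0.0)], [(1.0, 0.1)], [2.0]
print(rbf_network((0.0, 0.0), centers, widths, weights))  # 2.0
print(rbf_network((0.5, 0.0), centers, widths, weights))  # 2 * exp(-0.25)
print(rbf_network((0.0, 0.5), centers, widths, weights))  # about 2 * exp(-25), nearly 0
```

The same displacement of 0.5 barely reduces the node's output along x but almost eliminates it along y. An anisotropic node can therefore cover an elongated feature that an isotropic basis would need several nodes to resolve.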
Submitted 23 December, 2022; v1 submitted 1 July, 2022;
originally announced July 2022.
-
Transformers in Time-series Analysis: A Tutorial
Authors:
Sabeen Ahmed,
Ian E. Nielsen,
Aakash Tripathi,
Shamoon Siddiqui,
Ghulam Rasool,
Ravi P. Ramachandran
Abstract:
The Transformer architecture has widespread applications, particularly in Natural Language Processing and computer vision. Recently, Transformers have been employed in various aspects of time-series analysis. This tutorial provides an overview of the Transformer architecture, its applications, and a collection of examples from recent research papers in time-series analysis. We delve into an explanation of the core components of the Transformer, including the self-attention mechanism, positional encoding, multi-head attention, and the encoder/decoder. Several enhancements to the initial Transformer architecture are highlighted to tackle time-series tasks. The tutorial also provides best practices and techniques to overcome the challenge of effectively training Transformers for time-series analysis.
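For reference, the sketch below gives the fixed sinusoidal positional encoding from the original Transformer, written in plain Python. Many time-series variants instead use learned or timestamp-based encodings. Treat this as one baseline option rather than the tutorial's recommendation. The function name is illustrative.

```python
import math
from typing import List


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """PE[pos][2i] = sin(pos / 10000^(2i/d)), PE[pos][2i+1] = cos(pos / 10000^(2i/d))."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for two_i in range(0, d_model, 2):
            angle = pos / (10000 ** (two_i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


# Each time step of a series gets a d_model-dimensional code that is added to its embedding.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=6)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Lower dimensions oscillate quickly and higher dimensions slowly. Nearby time steps therefore receive similar codes, while distant ones remain distinguishable.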
Submitted 1 July, 2023; v1 submitted 28 April, 2022;
originally announced May 2022.
-
SCOTCH: An Efficient Secure Computation Framework for Secure Aggregation
Authors:
Yash More,
Prashanthi Ramachandran,
Priyam Panda,
Arup Mondal,
Harpreet Virk,
Debayan Gupta
Abstract:
Federated learning enables multiple data owners to jointly train a machine learning model without revealing their private datasets. However, a malicious aggregation server might use the model parameters to derive sensitive information about the training dataset used. To address such leakage, differential privacy and cryptographic techniques have been investigated in prior work, but these often result in large communication overheads or impact model performance. To mitigate this centralization of power, we propose SCOTCH, a decentralized m-party secure-computation framework for federated aggregation that deploys MPC primitives, such as secret sharing. Our protocol is simple, efficient, and provides strict privacy guarantees against curious aggregators or colluding data-owners with minimal communication overheads compared to other existing state-of-the-art privacy-preserving federated learning frameworks. We evaluate our framework by performing extensive experiments on multiple datasets with promising results. SCOTCH can train the standard MLP NN with the training dataset split amongst 3 participating users and 3 aggregating servers with 96.57% accuracy on MNIST, and 98.40% accuracy on the Extended MNIST (digits) dataset, while providing various optimizations.
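SCOTCH is built on MPC primitives such as secret sharing. The sketch below shows only the generic additive secret-sharing idea behind secure aggregation, not SCOTCH's actual protocol. Each user splits a fixed-point-encoded update into random shares, one per aggregating server. Any set of fewer than all servers therefore sees values that look uniformly random, while the sum over all servers reveals only the aggregate. The following are assumptions chosen for illustration: the ring size 2^32, the 16-bit fixed-point scale, and all function names.

```python
import secrets
from typing import List, Sequence

MODULUS = 2 ** 32  # ring for share arithmetic (assumption)
SCALE = 2 ** 16    # fixed-point scale for real-valued updates (assumption)


def encode(value: float) -> int:
    """Map a real number to a ring element via fixed-point encoding."""
    return round(value * SCALE) % MODULUS


def decode(element: int) -> float:
    """Inverse of encode; the upper half of the ring represents negatives."""
    if element >= MODULUS // 2:
        element -= MODULUS
    return element / SCALE


def share(secret: int, num_servers: int) -> List[int]:
    """Split `secret` into additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(num_servers - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(user_vectors: Sequence[Sequence[float]], num_servers: int = 3) -> List[float]:
    dim = len(user_vectors[0])
    # Each server only ever accumulates its own column of shares.
    server_acc = [[0] * dim for _ in range(num_servers)]
    for vec in user_vectors:
        for j, v in enumerate(vec):
            for s, sh in enumerate(share(encode(v), num_servers)):
                server_acc[s][j] = (server_acc[s][j] + sh) % MODULUS
    # Opening step: combining all servers' totals reveals only the aggregate.
    return [decode(sum(acc[j] for acc in server_acc) % MODULUS) for j in range(dim)]


updates = [[0.5, -1.25], [1.0, 0.25], [-0.5, 2.0]]  # three hypothetical users
print(secure_sum(updates))  # [1.0, 1.0]
```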
Submitted 15 February, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks
Authors:
Ian E. Nielsen,
Dimah Dera,
Ghulam Rasool,
Nidhal Bouaynaya,
Ravi P. Ramachandran
Abstract:
With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no consensus on how to evaluate them. On the other hand, robustness is a popular topic in deep learning research; however, it was rarely discussed in explainability research until very recently. In this tutorial paper, we start by presenting gradient-based interpretability methods. These techniques use gradient signals to assign the burden of the decision to the input features. Later, we discuss how gradient-based methods can be evaluated for their robustness and the role that adversarial robustness plays in having meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present the best practices and attributes that should be examined before choosing an explainability method. We conclude with future directions for research in the area at the convergence of robustness and explainability.
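Gradient-based attribution assigns each input feature the derivative of the model's output with respect to that feature. For a logistic model p = σ(w·x + b), the vanilla gradient has the closed form ∂p/∂x_i = p(1 − p)w_i. The sketch below computes it and cross-checks it with central finite differences. The model, weights, and input are hypothetical, chosen so the numbers are easy to verify by hand.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability of the positive class for a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def vanilla_gradient(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Analytic attribution dp/dx_i = p * (1 - p) * w_i."""
    p = predict(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def finite_difference_gradient(x: Sequence[float], w: Sequence[float], b: float,
                               h: float = 1e-5) -> List[float]:
    """Central differences, used only to sanity-check the analytic gradient."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grads.append((predict(up, w, b) - predict(down, w, b)) / (2 * h))
    return grads


w, b = [2.0, -1.0, 0.0], 0.0  # hypothetical trained weights
x = [0.5, 1.0, 3.0]           # w·x + b = 0, so p = 0.5
print(vanilla_gradient(x, w, b))  # [0.5, -0.25, 0.0]
```

The factor p(1 − p) shrinks toward zero as p approaches 0 or 1. All attributions of a saturated model therefore become small, which illustrates one way raw gradients alone can give a weak explanation.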
Submitted 13 January, 2022; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Efficient and Accurate Adaptive Resolution for Weakly-Compressible SPH
Authors:
Abhinav Muta,
Prabhu Ramachandran
Abstract:
In this paper, we propose an accurate and computationally efficient method for incorporating adaptive spatial resolution into weakly-compressible Smoothed Particle Hydrodynamics (SPH) schemes. Particles are adaptively split and merged in an accurate manner. Critically, the method ensures that the number of neighbors of each particle is optimal, leading to an efficient algorithm. A set of background particles is used to specify either geometry-based spatial resolution, where the resolution is a function of distance to a solid body, or solution-based adaptive resolution, where the resolution is a function of the computed solution. This allows us to simulate problems using particles with length variations on the order of 1:250 with far fewer particles than currently reported with other techniques. The method is designed to adapt automatically when any solid bodies move. The algorithms employed are fully parallel. We consider a suite of benchmark problems to demonstrate the accuracy of the approach. We then consider the classic problem of flow past a circular cylinder at a range of Reynolds numbers and show that the proposed method produces accurate results with a significantly reduced number of particles. We provide an open-source implementation and a fully reproducible manuscript.
Submitted 14 May, 2022; v1 submitted 4 July, 2021;
originally announced July 2021.
-
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Authors:
Ashish Vaswani,
Prajit Ramachandran,
Aravind Srinivas,
Niki Parmar,
Blake Hechtman,
Jonathon Shlens
Abstract:
Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling and content-independent interactions of convolutions. Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolutional models such as ResNet-50. In this work, we aim to develop self-attention models that can outperform not just the canonical baseline models, but even the high-performing convolutional models. We propose two extensions to self-attention that, in conjunction with a more efficient implementation of self-attention, improve the speed, memory usage, and accuracy of these models. We leverage these improvements to develop a new self-attention model family, HaloNets, which reach state-of-the-art accuracies on the parameter-limited setting of the ImageNet classification benchmark. In preliminary transfer learning experiments, we find that HaloNet models outperform much larger models and have better inference performance. On harder tasks such as object detection and instance segmentation, our simple local self-attention and convolutional hybrids show improvements over very strong baselines. These results mark another step in demonstrating the efficacy of self-attention models on settings traditionally dominated by convolutional models.
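The abstract contrasts self-attention's parameter-independent receptive fields with convolutions. The sketch below is a minimal 1-D illustration that assumes identity query/key/value projections and a per-position sliding window. HaloNets actually use blocked 2-D local attention with halos, which this does not reproduce. Widening `radius` enlarges each position's receptive field without adding any parameters. The mixing weights depend on the content through the query-key dot products. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(seq: Sequence[Sequence[float]], radius: int) -> List[List[float]]:
    """Each position attends only to positions within `radius` of itself.

    Queries, keys and values are the inputs themselves (identity projections, an
    assumption); a real layer would apply learned projections whose size does not
    depend on `radius`.
    """
    d = len(seq[0])
    out = []
    for i, query in enumerate(seq):
        window = seq[max(0, i - radius): i + radius + 1]
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in window]
        weights = softmax(scores)  # content-dependent mixing weights
        out.append([sum(wt * value[j] for wt, value in zip(weights, window))
                    for j in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(local_self_attention(tokens, radius=1)[1])
```

A convolution with kernel size 2r + 1 would instead need a separate learned weight for every offset in the window.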
Submitted 7 June, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
SPINN: Sparse, Physics-based, and partially Interpretable Neural Networks for PDEs
Authors:
Amuthan A. Ramabathiran,
Prabhu Ramachandran
Abstract:
We introduce a class of Sparse, Physics-based, and partially Interpretable Neural Networks (SPINN) for solving ordinary and partial differential equations (PDEs). By reinterpreting a traditional meshless representation of solutions of PDEs, we develop a class of sparse neural network architectures that are partially interpretable. The SPINN model we propose here serves as a seamless bridge between two extreme modeling tools for PDEs, namely dense neural network based methods like Physics Informed Neural Networks (PINNs) and traditional mesh-free numerical methods, thereby providing a novel means to develop a new class of hybrid algorithms that build on the best of both these viewpoints. A unique feature of the SPINN model that distinguishes it from other neural network based approximations proposed earlier is that it is (i) interpretable, in a particular sense made precise in the work, and (ii) sparse in the sense that it has far fewer connections than typical dense neural networks used for PDEs. Further, the SPINN algorithm implicitly encodes mesh adaptivity and is able to handle discontinuities in the solutions. In addition, we demonstrate that Fourier series representations can also be expressed as a special class of SPINN and propose generalized neural network analogues of Fourier representations. We illustrate the utility of the proposed method with a variety of examples involving ordinary differential equations, elliptic, parabolic, hyperbolic and nonlinear partial differential equations, and an example in fluid dynamics.
Submitted 28 July, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
-
S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training
Authors:
Prashanthi Ramachandran,
Shivam Agarwal,
Arup Mondal,
Aastha Shah,
Debayan Gupta
Abstract:
We introduce S++, a simple, robust, and deployable framework for training a neural network (NN) using private data from multiple sources, using secret-shared secure function evaluation. In short, consider a virtual third party to whom every data-holder sends their inputs, and which computes the neural network: in our case, this virtual third party is actually a set of servers which individually learn nothing, even with a malicious (but non-colluding) adversary.
Previous work in this area has been limited to just one specific activation function, ReLU, rendering the approach impractical for many use cases. For the first time, we provide fast and verifiable protocols for all common activation functions and optimize them for running in a secret-shared manner. The ability to quickly, verifiably, and robustly compute exponentiation, softmax, sigmoid, etc., allows us to use previously written NNs without modification, vastly reducing developer effort and code complexity. In recent times, ReLU has been found to converge much faster and to be more computationally efficient than other non-linear functions such as sigmoid or tanh. However, we argue that it would be remiss not to extend the mechanism to non-linear functions such as the logistic sigmoid, tanh, and softmax, which are fundamental due to their ability to express outputs as probabilities and their universal approximation property. Their role in RNNs and in a few recent advances also makes them more relevant.
Submitted 28 January, 2021;
originally announced January 2021.
-
Revisiting Fundamentals of Experience Replay
Authors:
William Fedus,
Prajit Ramachandran,
Rishabh Agarwal,
Yoshua Bengio,
Hugo Larochelle,
Mark Rowland,
Will Dabney
Abstract:
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay -- greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively, we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.
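The two properties studied here, replay capacity and replay ratio, can be made concrete with a minimal sketch, assuming a FIFO buffer with uniform sampling and a replay ratio defined as learning updates per environment transition collected. The placeholder transitions and the omitted Q-learning update are hypothetical; the sketch only shows where the two knobs enter the training loop.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO experience replay: once `capacity` is reached, the oldest transitions are evicted."""

    def __init__(self, capacity: int, seed: int = 0):
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition) -> None:
        self.storage.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform sampling (with replacement) over whatever is currently stored.
        return [self.rng.choice(self.storage) for _ in range(batch_size)]


def run(env_steps: int, capacity: int, replay_ratio: float, batch_size: int = 32):
    """Interleave collection and learning at a fixed replay ratio.

    replay_ratio = learning updates per environment transition collected.
    Returns (number of updates performed, final buffer size).
    """
    buffer = ReplayBuffer(capacity)
    updates, credit = 0, 0.0  # `credit` accumulates fractional updates owed
    for t in range(env_steps):
        buffer.add(("transition", t))  # hypothetical placeholder transition
        credit += replay_ratio
        while credit >= 1.0:
            batch = buffer.sample(batch_size)
            # A Q-learning agent would take one gradient step on `batch` here.
            updates += 1
            credit -= 1.0
    return updates, len(buffer.storage)


print(run(env_steps=1000, capacity=100, replay_ratio=0.25))  # (250, 100)
```

Raising `capacity` keeps older experience available for sampling, while `replay_ratio` independently sets how many updates each new transition buys, so the two can be varied separately.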
Submitted 13 July, 2020;
originally announced July 2020.
-
Revisiting Spatial Invariance with Low-Rank Local Connectivity
Authors:
Gamaleldin F. Elsayed,
Prajit Ramachandran,
Jonathon Shlens,
Simon Kornblith
Abstract:
Convolutional neural networks are among the most successful architectures in deep learning, with this success at least partially attributable to the efficacy of spatial invariance as an inductive bias. Locally connected layers, which differ from convolutional layers only in their lack of spatial invariance, usually perform poorly in practice. However, these observations still leave open the possibility that some degree of relaxation of spatial invariance may yield a better inductive bias than either convolution or local connectivity. To test this hypothesis, we design a method to relax the spatial invariance of a network layer in a controlled manner; we create a low-rank locally connected layer, where the filter bank applied at each position is constructed as a linear combination of a basis set of filter banks with spatially varying combining weights. By varying the number of basis filter banks, we can control the degree of relaxation of spatial invariance. In experiments with small convolutional networks, we find that relaxing spatial invariance improves classification accuracy over both convolution and locally connected layers across MNIST, CIFAR-10, and CelebA datasets, thus suggesting that spatial invariance may be an overly restrictive prior.
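The construction of the low-rank locally connected layer can be sketched in 1D with NumPy, assuming unconstrained combining weights; the paper works with 2D images and may parameterize or normalize the weights differently. The names `low_rank_lc1d`, `basis`, and `combine` are illustrative.

```python
import numpy as np


def conv1d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Ordinary 1D convolution (cross-correlation, 'valid' padding).

    x: (L, C_in), w: (k, C_in, C_out) -> (L - k + 1, C_out)
    """
    k = w.shape[0]
    return np.stack([np.einsum("ki,kio->o", x[p:p + k], w)
                     for p in range(x.shape[0] - k + 1)])


def low_rank_lc1d(x: np.ndarray, basis: np.ndarray, combine: np.ndarray) -> np.ndarray:
    """Low-rank locally connected layer, 1D simplification.

    basis:   (K, k, C_in, C_out)  K filter banks shared by every position
    combine: (L_out, K)           spatially varying combining weights
    The filter bank used at output position p is sum_j combine[p, j] * basis[j].
    """
    k = basis.shape[1]
    out = []
    for p in range(x.shape[0] - k + 1):
        w_p = np.tensordot(combine[p], basis, axes=1)  # (k, C_in, C_out)
        out.append(np.einsum("ki,kio->o", x[p:p + k], w_p))
    return np.stack(out)


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
basis = rng.normal(size=(2, 3, 3, 4))   # K = 2 basis filter banks of width 3
combine = rng.normal(size=(8, 2))       # one weight pair per output position
y = low_rank_lc1d(x, basis, combine)
print(y.shape)  # (8, 4)
```

With K = 1 and constant weights the layer reduces to an ordinary convolution; with one basis bank per output position and one-hot weights it becomes a fully locally connected layer, so K interpolates between the two extremes.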
Submitted 14 August, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Algorithms for uniform particle initialization in domains with complex boundaries
Authors:
Pawan Negi,
Prabhu Ramachandran
Abstract:
Accurate mesh-free simulation of fluid flows involving complex boundaries requires that the boundaries be captured accurately in terms of particles. In the context of incompressible/weakly-compressible fluid flow, the SPH method is more accurate when the particle distribution is uniform. Hence, for the time-accurate simulation of flow in the presence of complex boundaries, one must have both an accurate boundary discretization and a uniform distribution of particles to initialize the simulation. This process of obtaining an initial uniform distribution of particles is called "particle packing". In this paper, various particle packing algorithms present in the literature are implemented and compared. An improved SPH-based algorithm is proposed which produces uniform particle distributions of both the fluid and solid domains in two and three dimensions. Some challenging geometries are constructed to demonstrate the accuracy of the new algorithm. The implementation of the algorithm is open source and the manuscript is fully reproducible.
Submitted 8 January, 2021; v1 submitted 17 October, 2019;
originally announced October 2019.
-
PySPH: a Python-based framework for smoothed particle hydrodynamics
Authors:
Prabhu Ramachandran,
Aditya Bhosale,
Kunal Puri,
Pawan Negi,
Abhinav Muta,
A Dinesh,
Dileep Menon,
Rahul Govind,
Suraj Sanka,
Amal S Sebastian,
Ananyo Sen,
Rohan Kaushik,
Anshuman Kumar,
Vikas Kurapati,
Mrinalgouda Patil,
Deep Tavker,
Pankaj Pandey,
Chandrashekhar Kaushik,
Arkopal Dutt,
Arpit Agarwal
Abstract:
PySPH is an open-source, Python-based framework for particle methods in general and Smoothed Particle Hydrodynamics (SPH) in particular. PySPH allows a user to define a complete SPH simulation using pure Python. High-performance code is generated from this high-level Python code and executed seamlessly on either multiple cores or GPUs. It also supports distributed execution using MPI. PySPH supports a wide variety of SPH schemes and formulations. These include incompressible and compressible fluid flow, elastic dynamics, rigid body dynamics, shallow water equations, and other problems. PySPH supports a variety of boundary conditions including mirror, periodic, solid wall, and inlet/outlet boundary conditions. The package is written to facilitate reuse and reproducibility. This paper discusses the overall design of PySPH and demonstrates many of its features. Several example results are shown to demonstrate the range of features that PySPH provides.
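PySPH's own API is not shown here. As a self-contained illustration of the kind of SPH computation such a framework generates code for, the sketch below evaluates the standard summation density rho_i = sum_j m_j W(x_i - x_j, h) in 1D with the cubic spline kernel, using a brute-force all-pairs sum where a real framework would use neighbor lists. The particle spacing, masses, and smoothing length are illustrative assumptions.

```python
import numpy as np


def cubic_spline_1d(r: np.ndarray, h: float) -> np.ndarray:
    """Cubic spline (M4) kernel in 1D with compact support |r| < 2h."""
    q = np.abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization so the kernel integrates to 1
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w


def summation_density(x: np.ndarray, m: np.ndarray, h: float) -> np.ndarray:
    """rho_i = sum_j m_j W(x_i - x_j, h), using a brute-force all-pairs sum."""
    dx = x[:, None] - x[None, :]
    return (m[None, :] * cubic_spline_1d(dx, h)).sum(axis=1)


dx = 0.1
x = np.arange(50) * dx
m = np.full(50, 1000.0 * dx)          # mass for a reference density of 1000
rho = summation_density(x, m, h=dx)
print(rho[25], rho[0])                # interior ~1000, end particle too low
```

Particles near the ends see fewer neighbors and so receive too low a density, which is one reason SPH codes need explicit boundary treatments such as those listed above.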
Submitted 28 December, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
Stand-Alone Self-Attention in Vision Models
Authors:
Prajit Ramachandran,
Niki Parmar,
Ashish Vaswani,
Irwan Bello,
Anselm Levskaya,
Jonathon Shlens
Abstract:
Convolutions are a fundamental building block of modern computer vision systems. Recent approaches have argued for going beyond convolutions in order to capture long-range dependencies. These efforts focus on augmenting convolutional models with content-based interactions, such as self-attention and non-local means, to achieve gains on a number of vision tasks. The natural question that arises is whether attention can be a stand-alone primitive for vision models instead of serving as just an augmentation on top of convolutions. In developing and testing a pure self-attention vision model, we verify that self-attention can indeed be an effective stand-alone layer. A simple procedure of replacing all instances of spatial convolutions with a form of self-attention applied to a ResNet model produces a fully self-attentional model that outperforms the baseline on ImageNet classification with 12% fewer FLOPS and 29% fewer parameters. On COCO object detection, a pure self-attention model matches the mAP of a baseline RetinaNet while having 39% fewer FLOPS and 34% fewer parameters. Detailed ablation studies demonstrate that self-attention is especially impactful when used in later layers. These results establish that stand-alone self-attention is an important addition to the vision practitioner's toolbox.
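To show what replacing a spatial convolution with local self-attention means, here is a minimal single-head NumPy sketch, assuming each pixel attends to a k x k neighborhood clipped at the image border. The paper's layer also uses relative position embeddings and multiple heads, which are omitted; all names are illustrative.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def local_self_attention(x, Wq, Wk, Wv, k=3):
    """Single-head local self-attention over a k x k neighborhood.

    x: (H, W, C) feature map; Wq, Wk: (C, D); Wv: (C, D_v).
    Each output pixel attends only to the pixels in its k x k window (clipped at
    the image border), the same spatial extent as a k x k convolution.
    Multiple heads and relative position embeddings are omitted.
    """
    H, W, C = x.shape
    r = k // 2
    q = x @ Wq                                        # a query for every pixel
    out = np.zeros((H, W, Wv.shape[1]))
    for i in range(H):
        for j in range(W):
            window = x[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            window = window.reshape(-1, C)            # (n_neighbors, C)
            keys, values = window @ Wk, window @ Wv
            logits = keys @ q[i, j] / np.sqrt(Wk.shape[1])
            out[i, j] = softmax(logits) @ values      # content-weighted sum of values
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
y = local_self_attention(x, Wq, Wk, Wv, k=3)
print(y.shape)  # (6, 6, 4)
```

Unlike a convolution, whose weights over the window are fixed after training, the mixing weights here are recomputed for every pixel from the content of its window.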
Submitted 13 June, 2019;
originally announced June 2019.
-
Backprop Evolution
Authors:
Maximilian Alber,
Irwan Bello,
Barret Zoph,
Pieter-Jan Kindermans,
Prajit Ramachandran,
Quoc Le
Abstract:
The back-propagation algorithm is the cornerstone of deep learning. Despite its importance, few variations of the algorithm have been attempted. This work presents an approach to discover new variations of the back-propagation equation. We use a domain-specific language to describe update equations as a list of primitive functions. An evolution-based method is used to discover new propagation rules that maximize the generalization performance after a few epochs of training. We find several update equations that train faster than standard back-propagation when training time is short, and perform similarly to standard back-propagation at convergence.
Submitted 8 August, 2018;
originally announced August 2018.
-
automan: a simple, Python-based, automation framework for numerical computing
Authors:
Prabhu Ramachandran
Abstract:
We present an easy-to-use, Python-based framework that allows a researcher to automate their computational simulations. In particular the framework facilitates assembling several long-running computations and producing various plots from the data produced by these computations. The framework makes it possible to reproduce every figure made for a publication with a single command. It also allows one to distribute the computations across a network of computers. The framework has been used to write research papers in numerical computing. This paper discusses the design of the framework, and the benefits of using it. The ideas presented are general and should help researchers organize their computations for better reproducibility.
Submitted 4 February, 2018; v1 submitted 11 December, 2017;
originally announced December 2017.
-
Searching for Activation Functions
Authors:
Prajit Ramachandran,
Barret Zoph,
Quoc V. Le
Abstract:
The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance. Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU). Although various hand-designed alternatives to ReLU have been proposed, none have managed to replace it due to inconsistent gains. In this work, we propose to leverage automatic search techniques to discover new activation functions. Using a combination of exhaustive and reinforcement learning-based search, we discover multiple novel activation functions. We verify the effectiveness of the searches by conducting an empirical evaluation with the best discovered activation function. Our experiments show that the best discovered activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, which we name Swish, tends to work better than ReLU on deeper models across a number of challenging datasets. For example, simply replacing ReLUs with Swish units improves top-1 classification accuracy on ImageNet by 0.9% for Mobile NASNet-A and 0.6% for Inception-ResNet-v2. The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.
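Swish is short enough to write out directly; the sketch below assumes a scalar implementation with a numerically stable sigmoid and takes β as a fixed argument, although the paper also allows β to be a trainable parameter. Setting β = 0 gives the scaled linear function x/2, and large β approaches ReLU.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow in exp for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={v:+.1f}  relu={max(0.0, v):.4f}  swish={swish(v):.4f}")
```

Unlike ReLU, Swish is smooth and non-monotonic: it dips slightly below zero for moderately negative inputs before returning toward zero.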
Submitted 27 October, 2017; v1 submitted 16 October, 2017;
originally announced October 2017.
-
Fast Generation for Convolutional Autoregressive Models
Authors:
Prajit Ramachandran,
Tom Le Paine,
Pooya Khorrami,
Mohammad Babaeizadeh,
Shiyu Chang,
Yang Zhang,
Mark A. Hasegawa-Johnson,
Roy H. Campbell,
Thomas S. Huang
Abstract:
Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial for their success, generation is typically implemented in a naïve fashion where redundant computations are unnecessarily repeated. This results in slow generation, making such models infeasible for production environments. In this work, we describe a method to speed up generation in convolutional autoregressive models. The key idea is to cache hidden states to avoid redundant computation. We apply our fast generation method to the Wavenet and PixelCNN++ models and achieve up to $21\times$ and $183\times$ speedups respectively.
Submitted 20 April, 2017;
originally announced April 2017.
-
Stein Variational Policy Gradient
Authors:
Yang Liu,
Prajit Ramachandran,
Qiang Liu,
Jian Peng
Abstract:
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian inference problem. We then propose a novel Stein variational policy gradient method (SVPG) which combines existing policy gradient methods and a repulsive functional to generate a set of diverse but well-behaved policies. SVPG is robust to initialization and can easily be implemented in a parallel manner. On continuous control problems, we find that implementing SVPG on top of REINFORCE and advantage actor-critic algorithms improves both average return and data efficiency.
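The core of SVPG is a Stein variational update applied to a set of policy-parameter particles. The sketch below assumes a flat prior, an RBF kernel with the median-heuristic bandwidth common in SVGD, and a toy quadratic objective standing in for policy-gradient estimates from REINFORCE or advantage actor-critic; these choices and all names are assumptions, not the paper's exact setup.

```python
import numpy as np


def svpg_step(theta, grad_J, alpha, step_size):
    """One Stein variational update of n policy-parameter particles.

    theta: (n, d) particles, n >= 2; grad_J(theta) -> (n, d) gradient estimates.
    With a flat prior the target is p(theta) ∝ exp(J(theta) / alpha), so
    grad log p = grad_J / alpha. The update direction for particle i is
      phi_i = 1/n * sum_j [k(theta_j, theta_i) grad_J(theta_j) / alpha
                           + grad_{theta_j} k(theta_j, theta_i)]
    The first term shares gradient information; the second repels particles.
    """
    n = theta.shape[0]
    diff = theta[:, None, :] - theta[None, :, :]         # diff[j, i] = theta_j - theta_i
    sq = (diff ** 2).sum(axis=-1)
    h = np.median(sq) / np.log(n) + 1e-8                 # median-heuristic bandwidth (assumed)
    K = np.exp(-sq / h)                                  # K[j, i] = k(theta_j, theta_i)
    grad_K = -2.0 / h * diff * K[:, :, None]             # d K[j, i] / d theta_j
    drive = K.T @ (grad_J(theta) / alpha)                # sum_j K[j, i] grad log p(theta_j)
    repulse = grad_K.sum(axis=0)                         # sum_j grad_{theta_j} K[j, i]
    return theta + step_size * (drive + repulse) / n


# Hypothetical toy objective standing in for a policy's expected return.
target = np.array([1.0, -2.0])
grad_J = lambda th: -2.0 * (th - target)                # gradient of -||theta - target||^2

rng = np.random.default_rng(0)
theta = rng.normal(size=(20, 2))
for _ in range(500):
    theta = svpg_step(theta, grad_J, alpha=1.0, step_size=0.05)
print(theta.mean(axis=0), theta.std(axis=0))  # mean near target, spread stays > 0
```

Without the repulsive term every particle would climb to the same optimum; with it, the particles settle into a spread-out set of well-performing parameters, which is the diversity the abstract describes. A larger `alpha` flattens the target and widens that spread.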
Submitted 7 April, 2017;
originally announced April 2017.
-
Fast Wavenet Generation Algorithm
Authors:
Tom Le Paine,
Pooya Khorrami,
Shiyu Chang,
Yang Zhang,
Prajit Ramachandran,
Mark A. Hasegawa-Johnson,
Thomas S. Huang
Abstract:
This paper presents an efficient implementation of the Wavenet generation process called Fast Wavenet. Compared to a naive implementation that has complexity O(2^L) (L denotes the number of layers in the network), our proposed approach removes redundant convolution operations by caching previous calculations, thereby reducing the complexity to O(L) time. Timing experiments show significant advantages of our fast implementation over a naive one. While this method is presented for Wavenet, the same scheme can be applied anytime one wants to perform autoregressive generation or online prediction using a model with dilated convolution layers. The code for our method is publicly available.
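The caching idea can be checked on a toy model: a stack of L dilated causal layers with filter width 2, dilation 2**l at layer l, scalar channels, and a tanh nonlinearity. These simplifications and the toy weights are assumptions; WaveNet's gated residual units are not modeled. The naive path recomputes every layer from scratch, while the fast path keeps one queue per layer holding that layer's last 2**l inputs, so each step pops the value it needs and pushes the current one.

```python
import math
from collections import deque


def layer(past: float, now: float, a: float, b: float) -> float:
    """Toy dilated layer with filter width 2: combines h[t - d] and h[t]."""
    return math.tanh(a * past + b * now)


def naive_output(xs, weights):
    """Recompute every layer over the full input history to get the latest output."""
    h = list(xs)
    for l, (a, b) in enumerate(weights):
        d = 2 ** l  # dilation doubles per layer
        h = [layer(h[t - d] if t >= d else 0.0, h[t], a, b) for t in range(len(h))]
    return h[-1]


class FastGenerator:
    """Caches each layer's recent inputs in a queue so one step costs O(L)."""

    def __init__(self, weights):
        self.weights = weights
        # Queue l holds the last 2**l inputs to layer l (zeros = causal padding).
        self.queues = [deque([0.0] * 2 ** l, maxlen=2 ** l) for l in range(len(weights))]

    def step(self, x: float) -> float:
        h = x
        for (a, b), q in zip(self.weights, self.queues):
            past = q[0]          # this layer's input from 2**l steps ago
            q.append(h)          # push the current input; maxlen drops q[0]
            h = layer(past, h, a, b)
        return h


weights = [(0.5, 0.8), (-0.3, 0.9), (0.7, -0.4)]   # hypothetical toy weights, L = 3
xs = [math.sin(0.3 * t) for t in range(40)]
gen = FastGenerator(weights)
fast = [gen.step(x) for x in xs]
naive = [naive_output(xs[:t + 1], weights) for t in range(len(xs))]
print(max(abs(f - n) for f, n in zip(fast, naive)))  # ~0: same outputs
```

Inputs are fed in rather than sampled here, matching the online-prediction setting; for autoregressive generation, each output would be sampled and passed back in as the next input.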
Submitted 28 November, 2016;
originally announced November 2016.
-
Unsupervised Pretraining for Sequence to Sequence Learning
Authors:
Prajit Ramachandran,
Peter J. Liu,
Quoc V. Le
Abstract:
This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models. Our main result is that pretraining improves the generalization of seq2seq models. We achieve state-of-the-art results on the WMT English$\rightarrow$German task, surpassing a range of methods using both phrase-based machine translation and neural machine translation. Our method achieves a significant improvement of 1.3 BLEU over the previous best models on both WMT'14 and WMT'15 English$\rightarrow$German. We also conduct human evaluations on abstractive summarization and find that our method outperforms a purely supervised learning baseline in a statistically significant manner.
Submitted 21 February, 2018; v1 submitted 8 November, 2016;
originally announced November 2016.
-
Seq-NMS for Video Object Detection
Authors:
Wei Han,
Pooya Khorrami,
Tom Le Paine,
Prajit Ramachandran,
Mohammad Babaeizadeh,
Honghui Shi,
Jianan Li,
Shuicheng Yan,
Thomas S. Huang
Abstract:
Video object detection is challenging because objects that are easily detected in one frame may be difficult to detect in another frame within the same clip. Recently, there have been major advances in object detection for single images. These methods typically contain three phases: (i) object proposal generation, (ii) object classification, and (iii) post-processing. We propose a modification of the post-processing phase that uses high-scoring object detections from nearby frames to boost scores of weaker detections within the same clip. We show that our method obtains superior results to state-of-the-art single image object detection techniques. Our method placed 3rd in the video object detection (VID) task of the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).
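The post-processing idea can be sketched as a simplified, single-class Seq-NMS: link boxes in adjacent frames when they overlap, find the linked chain with the largest total score by dynamic programming, rescore every box in that chain to the chain's average, suppress overlapping boxes in those frames, and repeat. The linking threshold of 0.5, the suppression threshold of 0.3, and average rescoring are assumptions for illustration; the paper's exact settings and variants may differ. Boxes are assumed to have positive area.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def seq_nms(frames, link_iou=0.5, suppress_iou=0.3):
    """Simplified single-class Seq-NMS over per-frame lists of (box, score).

    Returns the kept detections as (frame_index, box, rescored_score).
    """
    frames = [list(f) for f in frames]  # copy so the caller's lists are untouched
    kept = []
    while any(frames):
        # best[t][i] = (max score sum of a chain ending at frames[t][i], predecessor)
        best = []
        for t, dets in enumerate(frames):
            row = []
            for box, score in dets:
                total, prev = score, None
                if t > 0:
                    for j, (pbox, _) in enumerate(frames[t - 1]):
                        cand = best[t - 1][j][0] + score
                        if iou(box, pbox) > link_iou and cand > total:
                            total, prev = cand, j
                row.append((total, prev))
            best.append(row)
        # Take the highest-scoring chain end, then backtrack through predecessors.
        t, i = max(((t, i) for t, row in enumerate(best) for i in range(len(row))),
                   key=lambda ti: best[ti[0]][ti[1]][0])
        chain = []
        while i is not None:
            chain.append((t, i))
            t, i = t - 1, best[t][i][1]
        avg = sum(frames[t][i][1] for t, i in chain) / len(chain)
        for t, i in reversed(chain):
            box = frames[t][i][0]
            kept.append((t, box, avg))
            frames[t] = [d for d in frames[t] if iou(d[0], box) <= suppress_iou]
    return kept


frames = [
    [((0, 0, 10, 10), 0.9)],
    [((1, 1, 11, 11), 0.2), ((50, 50, 60, 60), 0.5)],  # weak detection + unrelated box
    [((2, 2, 12, 12), 0.9)],
]
for t, box, score in seq_nms(frames):
    print(t, box, round(score, 3))
```

In this toy clip the weak 0.2 detection in the middle frame is lifted to the chain average because strong detections of the same object appear in the neighboring frames, while the unrelated box keeps its own score.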
Submitted 22 August, 2016; v1 submitted 26 February, 2016;
originally announced February 2016.
-
SPH Entropy Errors and the Pressure Blip
Authors:
Kunal Puri,
Prabhu Ramachandran
Abstract:
The spurious pressure jump at a contact discontinuity in SPH simulations of the compressible Euler equations is investigated. From the spatiotemporal behaviour of the error, the SPH pressure jump is likened to entropy errors observed for artificial-viscosity-based finite difference/volume schemes. The error is observed to be generated at start-up, and dissipation is the only recourse to mitigate its effect.
We show that similar errors are generated for the Lagrangian plus remap version of the Piecewise Parabolic Method (PPM) finite volume code (PPMLR). Through a comparison with the direct Eulerian version of the PPM code (PPMDE), we argue that a lack of diffusion across the material wave (contact discontinuity) is responsible for the error in PPMLR. We verify this hypothesis by constructing a more dissipative version of the remap code using a piecewise constant reconstruction. As an application to SPH, we propose a hybrid GSPH scheme that adds the requisite dissipation by utilizing a more dissipative Riemann solver for the energy equation. The proposed modification to the GSPH scheme and its improved treatment of the anomaly are verified for flows with strong shocks in one and two dimensions.
The result that dissipation must act across the density and energy equations provides a consistent explanation for many of the hitherto proposed "cures" or "fixes" for the problem.
Submitted 3 March, 2014; v1 submitted 9 November, 2013;
originally announced November 2013.
-
Mayavi: a package for 3D visualization of scientific data
Authors:
Prabhu Ramachandran,
Gaël Varoquaux
Abstract:
Mayavi is an open-source, general-purpose, 3D scientific visualization package. It seeks to provide easy and interactive tools for data visualization that fit with the scientific user's workflow. For this purpose, Mayavi provides several entry points: a full-blown interactive application; a Python library with both a MATLAB-like interface focused on easy scripting and a feature-rich object hierarchy; widgets associated with these objects for assembling in a domain-specific application; and plugins that work with a general-purpose application-building framework. In this article, we present an overview of the various features of Mayavi, then provide insight into the design and engineering decisions made in implementing it, and finally discuss a few novel applications.
Submitted 23 October, 2010;
originally announced October 2010.