-
On Characterizing the Evolution of Embedding Space of Neural Networks using Algebraic Topology
Authors:
Suryaka Suresh,
Bishshoy Das,
Vinayak Abrol,
Sumantra Dutta Roy
Abstract:
Using Betti numbers, we study how the topology of the feature embedding space changes as it passes through the layers of a well-trained deep neural network (DNN). Motivated by existing studies using simplicial complexes on shallow fully connected networks (FCN), we present an extended analysis using cubical homology instead, with a variety of popular deep architectures and real image datasets. We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, resulting in Betti numbers attaining their lowest possible value. The rate of decay in topological complexity (as a metric) helps quantify the impact of architectural choices on the generalization ability. Interestingly, from a representation learning perspective, we highlight several invariances such as topological invariance of (1) an architecture on similar datasets; (2) embedding space of a dataset for architectures of variable depth; (3) embedding space to input resolution/size, and (4) data sub-sampling. In order to further demonstrate the link between expressivity and the generalization capability of a network, we consider the task of ranking pre-trained models for a downstream classification task (transfer learning). Compared to existing approaches, the proposed metric has a better correlation to the actually achievable accuracy via fine-tuning the pre-trained model.
Submitted 9 November, 2023; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Unifying Tracking and Image-Video Object Detection
Authors:
Peirong Liu,
Rui Wang,
Pengchuan Zhang,
Omid Poursaeed,
Yipin Zhou,
Xuefei Cao,
Sreya Dutta Roy,
Ashish Shah,
Ser-Nam Lim
Abstract:
Object detection (OD) has been one of the most fundamental tasks in computer vision. Recent developments in deep learning have pushed the performance of image OD to new heights by learning-based, data-driven approaches. On the other hand, video OD remains less explored, mostly due to much more expensive data annotation needs. At the same time, multi-object tracking (MOT), which requires reasoning about track identities and spatio-temporal trajectories, is similar in spirit to video OD. However, most MOT datasets are class-specific (e.g., person-annotated only), which constrains a model's flexibility to perform tracking on other objects. We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model. To handle the discrepancies and semantic overlaps of category labels across datasets, TrIVD formulates detection/tracking as grounding and reasons about object categories via visual-text alignments. The unified formulation enables cross-dataset, multi-task training, and thus equips TrIVD with the ability to leverage frame-level features, video-level spatio-temporal relations, as well as track identity associations. With such joint training, we can now extend the knowledge from OD data, which comes with much richer object category annotations, to MOT and achieve zero-shot tracking capability. Experiments demonstrate that multi-task co-trained TrIVD outperforms single-task baselines across all image/video OD and MOT tasks. We further set the first baseline on the new task of zero-shot tracking.
Submitted 19 November, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
A Self-Supervised Descriptor for Image Copy Detection
Authors:
Ed Pizzi,
Sreya Dutta Roy,
Sugosh Nagavara Ravindra,
Priya Goyal,
Matthijs Douze
Abstract:
Image copy detection is an important task for content moderation. We introduce SSCD, a model that builds on a recent self-supervised contrastive training objective. We adapt this method to the copy detection task by changing the architecture and training objective, including a pooling operator from the instance matching literature, and adapting contrastive learning to augmentations that combine images.
Our approach relies on an entropy regularization term, promoting consistent separation between descriptor vectors, and we demonstrate that this significantly improves copy detection accuracy. Our method produces a compact descriptor vector, suitable for real-world web scale applications. Statistical information from a background image distribution can be incorporated into the descriptor.
On the recent DISC2021 benchmark, SSCD is shown to outperform both baseline copy detection models and self-supervised architectures designed for image classification by huge margins, in all settings. For example, SSCD out-performs SimCLR descriptors by 48% absolute. Code is available at https://github.com/facebookresearch/sscd-copy-detection.
Submitted 25 March, 2022; v1 submitted 21 February, 2022;
originally announced February 2022.
-
Altering Backward Pass Gradients improves Convergence
Authors:
Bishshoy Das,
Milton Mondal,
Brejesh Lall,
Shiv Dutt Joshi,
Sumantra Dutta Roy
Abstract:
In standard neural network training, the gradients in the backward pass are determined by the forward pass. As a result, the two stages are coupled. This is how most neural networks are trained currently. However, gradient modification in the backward pass has seldom been studied in the literature. In this paper we explore decoupled training, where we alter the gradients in the backward pass. We propose a simple yet powerful method called PowerGrad Transform, which alters the gradients before the weight update in the backward pass and significantly enhances the predictive performance of the neural network. PowerGrad Transform trains the network to arrive at a better optimum at convergence. It is computationally extremely efficient, adding virtually no cost to either memory or compute, but results in improved final accuracies on both the training and test sets. PowerGrad Transform is easy to integrate into existing training routines, requiring just a few lines of code. PowerGrad Transform accelerates training and makes it possible for the network to better fit the training data. With decoupled training, PowerGrad Transform improves baseline accuracies for ResNet-50 by 0.73%, for SE-ResNet-50 by 0.66% and by more than 1.0% for the non-normalized ResNet-18 network on the ImageNet classification task.
Submitted 20 September, 2022; v1 submitted 24 November, 2021;
originally announced November 2021.
-
DFCANet: Dense Feature Calibration-Attention Guided Network for Cross Domain Iris Presentation Attack Detection
Authors:
Gaurav Jaswal,
Aman Verma,
Sumantra Dutta Roy,
Raghavendra Ramachandra
Abstract:
Iris presentation attack detection (IPAD) is essential for securing personal identity in widely used iris recognition systems. However, the existing IPAD algorithms do not generalize well to unseen and cross-domain scenarios because of capture in unconstrained environments and high visual correlation amongst bona fide and attack samples. These similarities in intricate textural and morphological patterns of iris ocular images contribute further to performance degradation. To alleviate these shortcomings, this paper proposes DFCANet: Dense Feature Calibration and Attention Guided Network, which calibrates the locally spread iris patterns with the globally located ones. Leveraging the advantages of feature calibration convolution and residual learning, DFCANet generates domain-specific iris feature representations. Since some channels in the calibrated feature maps contain more prominent information, we capitalize on discriminative feature learning across the channels through the channel attention mechanism. In order to intensify the challenge for our proposed model, we make DFCANet operate over non-segmented and non-normalized ocular iris images. Extensive experimentation conducted over challenging cross-domain and intra-domain scenarios shows consistently superior results. Compared to state-of-the-art methods, DFCANet achieves significant gains in performance for the benchmark IIITD CLI, IIIT CSD and NDCLD13 databases. Further, a novel incremental learning-based methodology has been introduced so as to overcome disentangled iris-data characteristics and data scarcity. This paper also pursues the challenging scenario that considers soft-lens under the attack category with evaluation performed under various cross-domain protocols. The code will be made publicly available.
Submitted 1 November, 2021;
originally announced November 2021.
-
Sensor-invariant Fingerprint ROI Segmentation Using Recurrent Adversarial Learning
Authors:
Indu Joshi,
Ayush Utkarsh,
Riya Kothari,
Vinod K Kurmi,
Antitza Dantcheva,
Sumantra Dutta Roy,
Prem Kumar Kalra
Abstract:
A fingerprint region of interest (ROI) segmentation algorithm is designed to separate the foreground fingerprint from the background noise. All the learning-based state-of-the-art fingerprint ROI segmentation algorithms proposed in the literature are benchmarked on scenarios where both training and testing databases consist of fingerprint images acquired from the same sensors. However, when testing is conducted on a different sensor, the segmentation performance obtained is often unsatisfactory. As a result, every time a new fingerprint sensor is used for testing, the fingerprint ROI segmentation model needs to be re-trained with the fingerprint images acquired from the new sensor and their corresponding manually marked ROI. Manually marking fingerprint ROI is expensive because, firstly, it is time consuming and, more importantly, it requires domain expertise. In order to save the human effort in generating annotations required by state-of-the-art methods, we propose a fingerprint ROI segmentation model which aligns the features of fingerprint images derived from the unseen sensor such that they are similar to the ones obtained from the fingerprints whose ground truth ROI masks are available for training. Specifically, we propose a recurrent adversarial learning based feature alignment network that helps the fingerprint ROI segmentation model to learn sensor-invariant features. Consequently, sensor-invariant features learnt by the proposed ROI segmentation model help it to achieve improved segmentation performance on fingerprints acquired from the new sensor. Experiments on publicly available FVC databases demonstrate the efficacy of the proposed work.
Submitted 3 July, 2021;
originally announced July 2021.
-
Data Uncertainty Guided Noise-aware Preprocessing Of Fingerprints
Authors:
Indu Joshi,
Ayush Utkarsh,
Riya Kothari,
Vinod K Kurmi,
Antitza Dantcheva,
Sumantra Dutta Roy,
Prem Kumar Kalra
Abstract:
The effectiveness of fingerprint-based authentication systems on good quality fingerprints was established long ago. However, the performance of standard fingerprint matching systems on noisy and poor quality fingerprints is far from satisfactory. To address this, we propose a data uncertainty-based framework which enables the state-of-the-art fingerprint preprocessing models to quantify noise present in the input image and identify fingerprint regions with background noise and poor ridge clarity. Quantification of noise helps the model in two ways: firstly, it makes the objective function adaptive to the noise in a particular input fingerprint and consequently helps to achieve robust performance on noisy and distorted fingerprint regions. Secondly, it provides a noise variance map which indicates noisy pixels in the input fingerprint image. The predicted noise variance map enables the end-users to understand erroneous predictions due to noise present in the input image. Extensive experimental evaluation on 13 publicly available fingerprint databases, across different architectural choices and two fingerprint processing tasks, demonstrates the effectiveness of the proposed framework.
Submitted 2 July, 2021;
originally announced July 2021.
-
MOOD: Multi-level Out-of-distribution Detection
Authors:
Ziqian Lin,
Sreya Dutta Roy,
Yixuan Li
Abstract:
Out-of-distribution (OOD) detection is essential to prevent anomalous inputs from causing a model to fail during deployment. While improved OOD detection methods have emerged, they often rely on the final layer outputs and require a full feedforward pass for any given input. In this paper, we propose a novel framework, multi-level out-of-distribution detection (MOOD), which exploits intermediate classifier outputs for dynamic and efficient OOD inference. We explore and establish a direct relationship between the OOD data complexity and optimal exit level, and show that easy OOD examples can be effectively detected early without propagating to deeper layers. At each exit, the OOD examples can be distinguished through our proposed adjusted energy score, which is both empirically and theoretically suitable for networks with multiple classifiers. We extensively evaluate MOOD across 10 OOD datasets spanning a wide range of complexities. Experiments demonstrate that MOOD achieves up to 71.05% computational reduction in inference, while maintaining competitive OOD detection performance.
Submitted 29 April, 2021;
originally announced April 2021.
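The MOOD abstract above refers to an adjusted energy score computed at each exit. As a rough illustration only, the sketch below computes the standard, unadjusted energy score (the negative log-sum-exp of a classifier's logits) in plain Python. The adjustment proposed in the paper is not reproduced here, and the function name `energy_score`, the `temperature` parameter, and the example logits are assumptions made for illustration.

```python
import math

def energy_score(logits, temperature=1.0):
    """Standard (unadjusted) energy score: -T * log(sum(exp(z / T))).

    Hypothetical helper for illustration; it is not the adjusted score
    proposed in the MOOD paper.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

# Made-up logits: one peaked prediction and one flat prediction.
confident = energy_score([10.0, 0.0, 0.0])
uncertain = energy_score([0.0, 0.0, 0.0])
```

With this convention a peaked logit vector yields a lower energy than a flat one, so a threshold on the score can separate the two cases.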
-
A new regularization method for a parameter identification problem in a non-linear partial differential equation
Authors:
M Thamban Nair,
Samprita Das Roy
Abstract:
We consider a parameter identification problem related to a quasi-linear elliptic Neumann boundary value problem involving a parameter function $a(\cdot)$ and the solution $u(\cdot)$, where the problem is to identify $a(\cdot)$ on an interval $I:= g(Γ)$ from the knowledge of the solution $u(\cdot)$ as $g$ on $Γ$, where $Γ$ is a given curve on the boundary of the domain $Ω\subseteq \mathbb{R}^3$ of the problem and $g$ is a continuous function. For obtaining stable approximate solutions, we consider a new regularization method which gives error estimates similar to, and in certain cases better than, the classical Tikhonov regularization considered in the literature in the recent past.
Submitted 23 April, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
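For reference, the classical Tikhonov regularization that the abstract above compares against is usually written, for a generic nonlinear ill-posed problem $F(a) = g$ with noisy data $g^\delta$, as the minimization below. The symbols $F$, $g^\delta$, $a^*$ (an initial guess), $\alpha$, and the norms are generic placeholders assumed for illustration; they are not taken from the paper, whose new method is not shown here.

```latex
a_\alpha^\delta \in \arg\min_{a} \; \|F(a) - g^\delta\|^2 + \alpha \, \|a - a^*\|^2,
\qquad \alpha > 0 .
```

The regularization parameter $\alpha$ trades data fidelity against stability of the reconstructed parameter.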
-
Early Detection of Parkinson's Disease through Patient Questionnaire and Predictive Modelling
Authors:
R Prashanth,
Sumantra Dutta Roy
Abstract:
Early detection of Parkinson's disease (PD) is important because it enables early initiation of therapeutic interventions and management strategies. However, methods for early detection still remain an unmet clinical need in PD. In this study, we use the Patient Questionnaire (PQ) portion from the widely used Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) to develop prediction models that can distinguish early PD from healthy normal using machine learning techniques that are becoming popular in biomedicine: logistic regression, random forests, boosted trees and support vector machine (SVM). We carried out both subject-wise and record-wise validation for evaluating the machine learning techniques. We observe that these techniques perform with high accuracy and high area under the ROC curve (both >95%) in classifying early PD and healthy normal. The logistic model demonstrated statistically significant fit to the data, indicating its usefulness as a predictive model. It is inferred that these prediction models have the potential to aid clinicians in the diagnostic process by joining the items of a questionnaire through machine learning.
Submitted 3 October, 2018;
originally announced October 2018.
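The abstract above mentions both subject-wise and record-wise validation. Assuming the usual meaning of these terms (record-wise splits individual records at random, while subject-wise keeps every record of a given subject on the same side of the split), the difference can be sketched in plain Python. The function names, the `subject` and `visit` fields, and the toy data are hypothetical and not taken from the paper.

```python
import random

def record_wise_split(records, test_fraction=0.25, seed=0):
    """Shuffle individual records; a subject may land in both sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def subject_wise_split(records, test_fraction=0.25, seed=0):
    """Hold out whole subjects so no subject appears in both sets."""
    rng = random.Random(seed)
    subjects = sorted({r["subject"] for r in records})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])
    train = [r for r in records if r["subject"] not in held_out]
    test = [r for r in records if r["subject"] in held_out]
    return train, test

# Toy data: 8 hypothetical subjects with 3 records (visits) each.
records = [{"subject": s, "visit": v} for s in range(8) for v in range(3)]
sw_train, sw_test = subject_wise_split(records)
rw_train, rw_test = record_wise_split(records)
```

Subject-wise splitting avoids scoring a model on records from subjects it has already seen, which is why the two protocols can give different accuracy estimates.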
-
Novel and Improved Stage Estimation in Parkinson's Disease using Clinical Scales and Machine Learning
Authors:
R. Prashanth,
Sumantra Dutta Roy
Abstract:
The stage and severity of Parkinson's disease (PD) are important factors to consider when making effective therapeutic decisions. Although the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS) provides an effective instrument for evaluating the most pertinent features of PD, it does not allow PD staging. On the other hand, the Hoehn and Yahr (HY) scale, which provides staging, does not evaluate many relevant features of PD. In this paper, we propose a novel and improved staging for PD using the MDS-UPDRS features and the HY scale, and develop prediction models to estimate the stage (normal, early or moderate) and severity of PD using machine learning techniques such as ordinal logistic regression (OLR), support vector machine (SVM), AdaBoost- and RUSBoost-based classifiers. Along with this, feature importance in PD is also estimated using Random forests. We observe that the predictive models of SVM, AdaBoost-based ensemble, Random forests and probabilistic generative model performed well, with the AdaBoost-based ensemble giving the highest accuracy of 97.46%. Body bradykinesia, tremor, facial expression (hypomimia), constancy of rest tremor and handwriting (micrographia) were observed to be the most important features in PD. It is inferred that MDS-UPDRS combined with classifiers can form effective tools to predict PD staging which can aid clinicians in the diagnostic process.
Submitted 29 May, 2018;
originally announced May 2018.
-
High Accuracy Classification of Parkinson's Disease through Shape Analysis and Surface Fitting in $^{123}$I-Ioflupane SPECT Imaging
Authors:
R. Prashanth,
Sumantra Dutta Roy,
Pravat K. Mandal,
Shantanu Ghosh
Abstract:
Early and accurate identification of parkinsonian syndromes (PS) involving presynaptic degeneration from non-degenerative variants such as Scans Without Evidence of Dopaminergic Deficit (SWEDD) and tremor disorders, is important for effective patient management as the course, therapy and prognosis differ substantially between the two groups. In this study, we use Single Photon Emission Computed Tomography (SPECT) images from healthy normal, early PD and SWEDD subjects, as obtained from the Parkinson's Progression Markers Initiative (PPMI) database, and process them to compute shape- and surface fitting-based features for the three groups. We use these features to develop and compare various classification models that can discriminate between scans showing dopaminergic deficit, as in PD, from scans without the deficit, as in healthy normal or SWEDD. Along with it, we also compare these features with Striatal Binding Ratio (SBR)-based features, which are well-established and clinically used, by computing a feature importance score using Random forests technique. We observe that the Support Vector Machine (SVM) classifier gave the best performance with an accuracy of 97.29%. These features also showed higher importance than the SBR-based features. We infer from the study that shape analysis and surface fitting are useful and promising methods for extracting discriminatory features that can be used to develop diagnostic models that might have the potential to help clinicians in the diagnostic process.
Submitted 4 March, 2017;
originally announced March 2017.
-
Extraction of Layout Entities and Sub-layout Query-based Retrieval of Document Images
Authors:
Anukriti Bansal,
Sumantra Dutta Roy,
Gaurav Harit
Abstract:
Layouts and sub-layouts constitute an important clue when searching for a document on the basis of its structure, or when textual content is unknown/irrelevant. A sub-layout specifies the arrangement of document entities within a smaller portion of the document. We propose an efficient graph-based matching algorithm, integrated with hash-based indexing, to prune a possibly large search space. A user can specify a combination of sub-layouts of interest using sketch-based queries. The system supports partial matching for unspecified layout entities. We handle cases of segmentation pre-processing errors (for text/non-text blocks) with a symmetry maximization-based strategy, and by accounting for multiple domain-specific plausible segmentation hypotheses. We show promising results of our system on a database of unstructured entities, containing 4776 newspaper images.
Submitted 9 September, 2016;
originally announced September 2016.