Search | arXiv e-print repository

The Impact of Initialization on LoRA Finetuning Dynamics

Authors: Soufiane Hayou, Nikhil Ghosh, Bin Yu

Abstract: In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021). Essentially, to start from the pretrained model as initialization for finetuning, one can either initialize B to zero and A to random (default initialization in PEFT package), or vice-versa. In both cases, the product BA is equal to zero at initialization, which makes finetuning start from the pretrained model. These two initialization schemes are seemingly similar. They should in principle yield the same performance and share the same optimal learning rate. We demonstrate that this is an incorrect intuition and that the first scheme (initializing B to zero and A to random) on average yields better performance compared to the other scheme. Our theoretical analysis shows that the reason behind this might be that the first initialization allows the use of larger learning rates (without causing output instability) compared to the second initialization, resulting in more efficient learning of the first scheme. We validate our results with extensive experiments on LLMs.

Submitted 12 June, 2024; originally announced June 2024.

Comments: TL;DR: Different initializations lead to completely different finetuning dynamics. One initialization (set A random and B zero) is generally better than the natural opposite initialization. arXiv admin note: text overlap with arXiv:2402.12354
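The two initialization schemes described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `init_lora` and `lora_forward`, the scheme labels, and the Gaussian standard deviations are all assumptions made for the example. It only shows that both schemes give BA = 0, so the adapted layer initially reproduces the pretrained one.

```python
import numpy as np

def init_lora(d_out, d_in, rank, scheme="A_random", seed=0):
    """Return (B, A) with B of shape (d_out, rank) and A of shape (rank, d_in).

    The scheme names and the standard deviations below are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    if scheme == "A_random":
        # First scheme: A random, B zero.
        A = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(rank, d_in))
        B = np.zeros((d_out, rank))
    elif scheme == "B_random":
        # Second scheme: B random, A zero.
        A = np.zeros((rank, d_in))
        B = rng.normal(0.0, 1.0 / np.sqrt(rank), size=(d_out, rank))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return B, A

def lora_forward(W, B, A, x):
    """Pretrained weight W plus the low-rank update BA, applied to x."""
    return W @ x + B @ (A @ x)
```

With either scheme, `lora_forward(W, B, A, x)` equals `W @ x` before any training step, which is the shared starting point the abstract refers to.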

arXiv:2403.04416 [pdf, other]

iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning

Authors: Debasmita Dey, Nirnay Ghosh

Abstract: Routing Protocol for Low Power and Lossy Networks (RPL) is the de-facto routing standard in IoT networks. It enables nodes to collaborate and autonomously build ad-hoc networks modeled by tree-like destination-oriented directed acyclic graphs (DODAG). Despite its widespread usage in industry and healthcare domains, RPL is susceptible to insider attacks. Although the state-of-the-art RPL ensures that only authenticated nodes participate in DODAG, such hard security measures are still inadequate to prevent insider threats. This entails a need to integrate soft security mechanisms to support decision-making. This paper proposes iTRPL, an intelligent and behavior-based framework that incorporates trust to segregate honest and malicious nodes within a DODAG. It also leverages multi-agent reinforcement learning (MARL) to make autonomous decisions concerning the DODAG. The framework enables a parent node to compute the trust for its child and decide if the latter can join the DODAG. It tracks the behavior of the child node, updates the trust, computes the rewards (or penalties), and shares them with the root. The root aggregates the rewards/penalties of all nodes, computes the overall return, and decides via its $ε$-Greedy MARL module if the DODAG will be retained or modified for the future. A simulation-based performance evaluation demonstrates that iTRPL learns to make optimal decisions with time.

Submitted 7 March, 2024; originally announced March 2024.
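The ε-greedy decision mentioned in the abstract above follows a standard pattern that can be sketched generically. This is not the iTRPL implementation: the function name `epsilon_greedy`, the two-action encoding (index 0 for retaining the DODAG, index 1 for modifying it), and the value estimates are hypothetical and chosen only for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit.

    q_values is a list of estimated returns, one per action. In this
    hypothetical encoding, index 0 means "retain" and index 1 means "modify".
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated return.
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Setting `epsilon` to 0 makes the choice purely greedy, while a value of 1 makes it purely random; intermediate values trade off exploration against exploitation.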

arXiv:2402.12354 [pdf, other]

LoRA+: Efficient Low Rank Adaptation of Large Models

Authors: Soufiane Hayou, Nikhil Ghosh, Bin Yu

Abstract: In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension). This is due to the fact that adapter matrices A and B in LoRA are updated with the same learning rate. Using scaling arguments for large width networks, we demonstrate that using the same learning rate for A and B does not allow efficient feature learning. We then show that this suboptimality of LoRA can be corrected simply by setting different learning rates for the LoRA adapter matrices A and B with a well-chosen ratio. We call this proposed algorithm LoRA$+$. In our extensive experiments, LoRA$+$ improves performance (1-2 $\%$ improvements) and finetuning speed (up to $\sim$ 2X SpeedUp), at the same computational cost as LoRA.

Submitted 4 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: 27 pages
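The idea of updating A and B with different learning rates can be illustrated with a plain gradient step. This is a sketch under stated assumptions, not the LoRA$+$ algorithm as published: the function name `lora_sgd_step`, the use of vanilla SGD, and the parameter `lr_ratio` are hypothetical, and the abstract does not state which ratio to use or which matrix receives the larger rate.

```python
import numpy as np

def lora_sgd_step(B, A, grad_B, grad_A, lr_A, lr_ratio):
    """One SGD step where B's learning rate is lr_ratio times A's.

    lr_ratio is a placeholder hyperparameter for this sketch; lr_ratio = 1
    recovers the usual single-learning-rate update.
    """
    lr_B = lr_ratio * lr_A
    new_B = B - lr_B * grad_B
    new_A = A - lr_A * grad_A
    return new_B, new_A
```

In practice the same effect is usually obtained by placing A and B in separate optimizer parameter groups, so the extra cost over standard LoRA is nil.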

arXiv:2401.06657 [pdf, other]

Accelerating Tactile Internet with QUIC: A Security and Privacy Perspective

Authors: Jayasree Sengupta, Debasmita Dey, Simone Ferlin, Nirnay Ghosh, Vaibhav Bajpai

Abstract: The Tactile Internet paradigm is set to revolutionize human society by enabling skill-set delivery and haptic communication over ultra-reliable, low-latency networks. The emerging sixth-generation (6G) mobile communication systems are envisioned to underpin this Tactile Internet ecosystem at the network edge by providing ubiquitous global connectivity. However, apart from a multitude of opportunities of the Tactile Internet, security and privacy challenges emerge at the forefront. We believe that the recently standardized QUIC protocol, characterized by end-to-end encryption and reduced round-trip delay, would serve as the backbone of Tactile Internet. In this article, we envision a futuristic scenario where a QUIC-enabled network uses the underlying 6G communication infrastructure to achieve the requirements for Tactile Internet. Interestingly, this requires a deeper investigation of a wide range of security and privacy challenges in QUIC that need to be mitigated for its adoption in Tactile Internet. Hence, this article reviews the existing security and privacy attacks in QUIC and their implications for users. Following that, we discuss state-of-the-art attack mitigation strategies and investigate some of their drawbacks with possible directions for future work.

Submitted 31 January, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 7 pages, 3 figures, 1 table

arXiv:2311.14646 [pdf, other]

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

Authors: James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin

Abstract: In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improve performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained. Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized by power-law eigenstructure, training to near-zero training loss is obligatory: near-optimal performance can only be achieved when the training error is much smaller than the test error. Grounding our theory in real-world data, we find empirically that standard computer vision tasks with convolutional neural tangent kernels clearly fall into this class. Taken together, our results tell a simple, testable story of the benefits of overparameterization, overfitting, and more data in random feature models.

Submitted 15 May, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

Comments: Appeared in ICLR 2024

arXiv:2310.15202 [pdf, ps, other]

Predicting Transcription Factor Binding Sites using Transformer based Capsule Network

Authors: Nimisha Ghosh, Daniele Santoni, Indrajit Saha, Giovanni Felici

Abstract: Prediction of binding sites for transcription factors is important to understand how they regulate gene expression and how this regulation can be modulated for therapeutic purposes. Although in the past few years there have been significant works addressing this issue, there is still space for improvement. In this regard, a transformer based capsule network viz. DNABERT-Cap is proposed in this work to predict transcription factor binding sites by mining ChIP-seq datasets. DNABERT-Cap is a bidirectional encoder pre-trained with a large number of genomic DNA sequences, empowered with a capsule layer responsible for the final prediction. The proposed model builds a predictor for transcription factor binding sites using the joint optimisation of features encompassing both bidirectional encoder and capsule layer, along with convolutional and bidirectional long-short term memory layers. To evaluate the efficiency of the proposed approach, we use benchmark ChIP-seq datasets of five cell lines viz. A549, GM12878, Hep-G2, H1-hESC and Hela, available in the ENCODE repository. The results show that the average area under the receiver operating characteristic curve score exceeds 0.91 for all such five cell lines. DNABERT-Cap is also compared with existing state-of-the-art deep learning based predictors viz. DeepARC, DeepTF, CNN-Zeng and DeepBind, and is seen to outperform them.

Submitted 28 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2308.03215 [pdf, other]

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

Authors: Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu

Abstract: In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activation on orthogonal data. We show that for this non-convex problem, randomly initialized SGD with a constant step size successfully finds a global minimum for any batch size choice. However, the particular global minimum found depends upon the batch size. In the full-batch setting, we show that the solution is dense (i.e., not sparse) and is highly aligned with its initialized direction, showing that relatively little feature learning occurs. On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting. Moreover, if we measure the sharpness of the minimum by the trace of the Hessian, the minima found with full batch gradient descent are flatter than those found with strictly smaller batch sizes, in contrast to previous works which suggest that large batches lead to sharper minima. To prove convergence of SGD with a constant step size, we introduce a powerful tool from the theory of non-homogeneous random walks which may be of independent interest.

Submitted 6 August, 2023; originally announced August 2023.

arXiv:2302.00003 [pdf, other]

The Power of External Memory in Increasing Predictive Model Capacity

Authors: Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang

Abstract: One way of introducing sparsity into deep networks is by attaching an external table of parameters that is sparsely looked up at different layers of the network. By storing the bulk of the parameters in the external table, one can increase the capacity of the model without necessarily increasing the inference time. Two crucial questions in this setting are then: what is the lookup function for accessing the table and how are the contents of the table consumed? Prominent methods for accessing the table include 1) using words/wordpieces token-ids as table indices, 2) LSH hashing the token vector in each layer into a table of buckets, and 3) learnable softmax style routing to a table entry. The ways to consume the contents include adding/concatenating to input representation, and using the contents as expert networks that specialize to different inputs. In this work, we conduct rigorous experimental evaluations of existing ideas and their combinations. We also introduce a new method, alternating updates, that enables access to an increased token dimension without increasing the computation time, and demonstrate its effectiveness in language modeling.

Submitted 30 January, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2301.13310

arXiv:2301.13310 [pdf, other]

Alternating Updates for Efficient Transformers

Authors: Cenk Baykal, Dylan Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang

Abstract: It has been well established that increasing scale in deep transformer networks leads to improved quality and performance. However, this increase in scale often comes with prohibitive increases in compute cost and inference latency. We introduce Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables the widening of the learned representation, i.e., the token embedding, while only incurring a negligible increase in latency. AltUp achieves this by working on a subblock of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks. We present extensions of AltUp, such as its applicability to the sequence dimension, and demonstrate how AltUp can be synergistically combined with existing approaches, such as Sparse Mixture-of-Experts models, to obtain efficient models with even higher capacity. Our experiments on benchmark transformer models and language tasks demonstrate the consistent effectiveness of AltUp on a diverse set of scenarios. Notably, on SuperGLUE and SQuAD benchmarks, AltUp enables up to $87\%$ speedup relative to the dense baselines at the same accuracy.

Submitted 3 October, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2207.11621 [pdf, other]

A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors

Authors: Nikhil Ghosh, Mikhail Belkin

Abstract: In this work we establish an algorithm and distribution independent non-asymptotic trade-off between the model size, excess test loss, and training loss of linear predictors. Specifically, we show that models that perform well on the test data (have low excess loss) are either "classical" -- have training loss close to the noise level, or are "modern" -- have a much larger number of parameters compared to the minimum needed to fit the training data exactly. We also provide a more precise asymptotic analysis when the limiting spectral distribution of the whitened features is Marchenko-Pastur. Remarkably, while the Marchenko-Pastur analysis is far more precise near the interpolation peak, where the number of parameters is just enough to fit the training data, it coincides exactly with the distribution independent bound as the level of overparametrization increases.

Submitted 18 April, 2023; v1 submitted 23 July, 2022; originally announced July 2022.

Comments: Further polished writing

arXiv:2202.09931 [pdf, other]

Deconstructing Distributions: A Pointwise Framework of Learning

Authors: Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran

Abstract: In machine learning, we traditionally evaluate the performance of a single model, averaged over a collection of test inputs. In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$. Specifically, we study a point's $\textit{profile}$: the relationship between models' average performance on the test distribution and their pointwise performance on this individual point. We find that profiles can yield new insights into the structure of both models and data -- in and out-of-distribution. For example, we empirically show that real data distributions consist of points with qualitatively different profiles. On one hand, there are "compatible" points with strong correlation between the pointwise and average performance. On the other hand, there are points with weak and even $\textit{negative}$ correlation: cases where improving overall model accuracy actually $\textit{hurts}$ performance on these inputs. We prove that these experimental observations are inconsistent with the predictions of several simplified models of learning proposed in prior work. As an application, we use profiles to construct a dataset we call CIFAR-10-NEG: a subset of CINIC-10 such that for standard models, accuracy on CIFAR-10-NEG is $\textit{negatively correlated}$ with accuracy on CIFAR-10 test. This illustrates, for the first time, an OOD dataset that completely inverts "accuracy-on-the-line" (Miller, Taori, Raghunathan, Sagawa, Koh, Shankar, Liang, Carmon, and Schmidt 2021).

Submitted 7 June, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: GK and NG contributed equally. v2: Added Figures 4, 5

arXiv:2202.07448 [pdf, other]

Towards a Unified Pandemic Management Architecture: Survey, Challenges and Future Directions

Authors: Satyaki Roy, Nirnay Ghosh, Nitish Uplavikar, Preetam Ghosh

Abstract: The pandemic caused by SARS-CoV-2 has left an unprecedented impact on health, economy and society worldwide. Emerging strains are making pandemic management increasingly challenging. There is an urgent need to collect epidemiological, clinical, and physiological data to make an informed decision on mitigation measures. Advances in the Internet of Things (IoT) and edge computing provide solutions for pandemic management through data collection and intelligent computation. While existing data-driven architectures attempt to automate decision-making, they do not capture the multifaceted interaction among computational models, communication infrastructure, and the generated data. In this paper, we perform a survey of the existing approaches for pandemic management, including online data repositories and contact-tracing applications. We then envision a unified pandemic management architecture that leverages the IoT and edge computing to automate recommendations on vaccine distribution, dynamic lockdown, mobility scheduling and pandemic prediction. We elucidate the flow of data among the layers of the architecture, namely, cloud, edge and end device layers. Moreover, we address the privacy implications, threats, regulations, and existing solutions that may be adapted to optimize the utility of health data with security guarantees. The paper ends with a lowdown on the limitations of the architecture and research directions to enhance its practicality.

Submitted 3 February, 2022; originally announced February 2022.

Comments: 30 pages and 10 figures

arXiv:2201.11638 [pdf, other]

doi 10.1109/iSES52644.2021.00069

Reuse-Aware Cache Partitioning Framework for Data-Sharing Multicore Systems

Authors: Soma N. Ghosh, Vineet Sahula, Lava Bhargava

Abstract: Multi-core processors improve performance, but they can create unpredictability owing to interference in shared resources such as caches. Cache partitioning is used to ease Worst-Case Execution Time (WCET) estimation by isolating the shared cache across each thread to reduce interference. It does, however, prohibit data from being transferred between parallel threads running on different cores. In this paper we present SRCP, a cache replacement mechanism for partitioned caches that is aware of data being shared across threads, and that prevents shared data from being replicated across partitions and frequently used data from being evicted from caches. Our technique outperforms TA-DRRIP and EHC, which are existing state-of-the-art cache replacement algorithms, by 13.34% in cache hit-rate and 10.4% in performance over LRU (least recently used) cache replacement policy.

Submitted 17 January, 2022; originally announced January 2022.

Comments: 2 pages. 7th IEEE International Symposium on Smart Electronic Systems (iSES) 2021

ACM Class: C.1.4; D.4.2

arXiv:2111.07167 [pdf, other]

The Three Stages of Learning Dynamics in High-Dimensional Kernel Methods

Authors: Nikhil Ghosh, Song Mei, Bin Yu

Abstract: To understand how deep learning works, it is crucial to understand the training dynamics of neural networks. Several interesting hypotheses about these dynamics have been made based on empirically observed phenomena, but there exists a limited theoretical understanding of when and why such phenomena occur. In this paper, we consider the training dynamics of gradient flow on kernel least-squares objectives, which is a limiting dynamics of SGD trained neural networks. Using precise high-dimensional asymptotics, we characterize the dynamics of the fitted model in two "worlds": in the Oracle World the model is trained on the population distribution and in the Empirical World the model is trained on a sampled dataset. We show that under mild conditions on the kernel and $L^2$ target regression function the training dynamics undergo three stages characterized by the behaviors of the models in the two worlds. Our theoretical results also mathematically formalize some interesting deep learning phenomena. Specifically, in our setting we show that SGD progressively learns more complex functions and that there is a "deep bootstrap" phenomenon: during the second stage, the test errors of both worlds remain close despite the empirical training error being much smaller. Finally, we give a concrete example comparing the dynamics of two different kernels which shows that faster training is not necessary for better generalization.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2007.06201 [pdf, other]

The Blockchain Based Auditor on Secret key Life Cycle in Reconfigurable Platform

Authors: Rourab Paul, Nimisha Ghosh, Amlan Chakrabarti, Prasant Mahapatra

Abstract: The growing sophistication of cyber attacks, vulnerabilities in high computing systems and increasing dependency on cryptography to protect our digital data make it more important to keep secret keys safe and secure. A few major issues with secret keys, like incorrect use of keys, inappropriate storage of keys, inadequate protection of keys, insecure movement of keys, lack of audit logging, insider threats and non-destruction of keys, can dangerously compromise the whole security system. In this article, we have proposed and implemented an isolated secret key memory which can log the life cycle of secret keys cryptographically using blockchain (BC) technology. We have also implemented a special custom bus interconnect which receives custom crypto instructions from the Processing Element (PE). During the execution of crypto instructions, the architecture assures that the secret key will never come into the processor area and that the movement of secret keys to various crypto cores is recorded cryptographically after the proper authentication process controlled by the proposed hardware based BC. To the best of our knowledge, this is the first work which uses a blockchain based solution to address the issues of the life cycle of the secret keys in a hardware platform. The additional cost of resource usage and timing complexity we spent to implement the proposed idea is very nominal. We have used the Xilinx Vivado EDA tool and an Artix 7 FPGA board.

Submitted 13 July, 2020; originally announced July 2020.

Comments: Manuscript

arXiv:2004.11726 [pdf, other]

A Two-Stage Multiple Instance Learning Framework for the Detection of Breast Cancer in Mammograms

Authors: Sarath Chandra K, Arunava Chakravarty, Nirmalya Ghosh, Tandra Sarkar, Ramanathan Sethuraman, Debdoot Sheet

Abstract: Mammograms are commonly employed in the large scale screening of breast cancer which is primarily characterized by the presence of malignant masses. However, automated image-level detection of malignancy is a challenging task given the small size of the mass regions and difficulty in discriminating between malignant, benign mass and healthy dense fibro-glandular tissue. To address these issues, we explore a two-stage Multiple Instance Learning (MIL) framework. A Convolutional Neural Network (CNN) is trained in the first stage to extract local candidate patches in the mammograms that may contain either a benign or malignant mass. The second stage employs a MIL strategy for an image level benign vs. malignant classification. A global image-level feature is computed as a weighted average of patch-level features learned using a CNN. Our method performed well on the task of localization of masses with an average Precision/Recall of 0.76/0.80 and achieved an average AUC of 0.91 on the image-level classification task using a five-fold cross-validation on the INbreast dataset. Restricting the MIL only to the candidate patches extracted in Stage 1 led to a significant improvement in classification performance in comparison to a dense extraction of patches from the entire mammogram.

Submitted 24 April, 2020; originally announced April 2020.

Comments: accepted in EMBC 2020, 4 pg+1 pg Supplementary

arXiv:2004.11721 [pdf, other]

Learning Decision Ensemble using a Graph Neural Network for Comorbidity Aware Chest Radiograph Screening

Authors: Arunava Chakravarty, Tandra Sarkar, Nirmalya Ghosh, Ramanathan Sethuraman, Debdoot Sheet

Abstract: Chest radiographs are primarily employed for the screening of cardio, thoracic and pulmonary conditions. Machine learning based automated solutions are being developed to reduce the burden of routine screening on Radiologists, allowing them to focus on critical cases. While recent efforts demonstrate the use of ensembles of deep convolutional neural networks (CNN), they do not take disease comorbidity into consideration, thus lowering their screening performance. To address this issue, we propose a Graph Neural Network (GNN) based solution to obtain ensemble predictions which models the dependencies between different diseases. A comprehensive evaluation of the proposed method demonstrated its potential by improving the performance over standard ensembling techniques across a wide range of ensemble constructions. The best performance was achieved using the GNN ensemble of DenseNet121 with an average AUC of 0.821 across thirteen disease comorbidities.

Submitted 24 April, 2020; originally announced April 2020.

Comments: accepted in EMBC 2020, 4pg+2pg Supplementary Material

arXiv:2004.11693 [pdf, other]

A Systematic Search over Deep Convolutional Neural Network Architectures for Screening Chest Radiographs

Authors: Arka Mitra, Arunava Chakravarty, Nirmalya Ghosh, Tandra Sarkar, Ramanathan Sethuraman, Debdoot Sheet

Abstract: Chest radiographs are primarily employed for the screening of pulmonary and cardio-/thoracic conditions. Being undertaken at primary healthcare centers, they require the presence of an on-premise reporting Radiologist, which is a challenge in low and middle income countries. This has inspired the development of machine learning based automation of the screening process. While recent efforts demonstrate a performance benchmark using an ensemble of deep convolutional neural networks (CNN), our systematic search over multiple standard CNN architectures identified single candidate CNN models whose classification performances were found to be on par with ensembles. Over 63 experiments spanning 400 hours, executed on a 11:3 FP32 TensorTFLOPS compute system, we found the Xception and ResNet-18 architectures to be consistent performers in identifying co-existing disease conditions with an average AUC of 0.87 across nine pathologies. We conclude on the reliability of the models by assessing their saliency maps generated using the randomized input sampling for explanation (RISE) method and qualitatively validating them against manual annotations locally sourced from an experienced Radiologist. We also draw a critical note on the limitations of the publicly available CheXpert dataset, primarily on account of disparity in class distribution in training vs. testing sets and the unavailability of sufficient samples for a few classes, which hampers quantitative reporting.

Submitted 24 April, 2020; originally announced April 2020.

Comments: accepted in EMBC 2020, 4 pages+2 page Appendix

arXiv:2002.03639 [pdf, ps, other]

iDCR: Improved Dempster Combination Rule for Multisensor Fault Diagnosis

Authors: Nimisha Ghosh, Sayantan Saha, Rourab Paul

Abstract: Data gathered from multiple sensors can be effectively fused for accurate monitoring of many engineering applications. In the last few years, one of the most sought after applications for multi sensor fusion has been fault diagnosis. Dempster-Shafer Theory of Evidence along with Dempster's Combination Rule is a very popular method for multi sensor fusion which can be successfully applied to fault diagnosis. But if the information obtained from the different sensors shows high conflict, the classical Dempster's Combination Rule may produce counter-intuitive results. To overcome this shortcoming, this paper proposes an improved combination rule for multi sensor data fusion. Numerical examples have been put forward to show the effectiveness of the proposed method. Comparative analysis has also been carried out with existing methods to show the superiority of the proposed method in multi sensor fault diagnosis.

Submitted 10 February, 2020; originally announced February 2020.
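
Illustrative note: the abstract above refers to the classical Dempster's Combination Rule and its counter-intuitive behaviour under high conflict. The following is a minimal sketch of the classical rule only, not the improved rule (iDCR) proposed in the paper. The function name `dempster_combine`, the three-hypothesis frame and the sensor masses are hypothetical values chosen for illustration (a Zadeh-style conflict example).

```python
from itertools import product


def dempster_combine(m1, m2):
    """Classical Dempster combination of two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    """
    combined = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:  # keep only products whose focal sets intersect
            combined[inter] = combined.get(inter, 0.0) + wa * wb
    # Total non-conflicting mass, i.e. 1 - K where K is the conflict.
    agreement = sum(combined.values())
    if agreement == 0.0:
        raise ValueError("total conflict: the rule is undefined")
    # Normalise so that the fused masses sum to one.
    return {h: w / agreement for h, w in combined.items()}


# Hypothetical fault hypotheses and two highly conflicting sensors.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
sensor1 = {A: 0.99, B: 0.01}
sensor2 = {C: 0.99, B: 0.01}
fused = dempster_combine(sensor1, sensor2)
print(fused)
```

In this sketch each sensor assigns almost no mass to hypothesis B, yet all fused mass ends up on B because it is the only hypothesis the two sensors do not rule out. This is the kind of counter-intuitive outcome under high conflict that the abstract mentions.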

arXiv:1910.12379 [pdf, other]

Landmark Ordinal Embedding

Authors: Nikhil Ghosh, Yuxin Chen, Yisong Yue

Abstract: In this paper, we aim to learn a low-dimensional Euclidean representation from a set of constraints of the form "item j is closer to item i than item k". Existing approaches for this "ordinal embedding" problem require expensive optimization procedures, which cannot scale to handle increasingly larger datasets. To address this issue, we propose a landmark-based strategy, which we call Landmark Ordinal Embedding (LOE). Our approach trades off statistical efficiency for computational efficiency by exploiting the low-dimensionality of the latent embedding. We derive bounds establishing the statistical consistency of LOE under the popular Bradley-Terry-Luce noise model. Through a rigorous analysis of the computational complexity, we show that LOE is significantly more efficient than conventional ordinal embedding approaches as the number of items grows. We validate these characterizations empirically on both synthetic and real datasets. We also present a practical approach that achieves the "best of both worlds", by using LOE to warm-start existing methods that are more statistically efficient but computationally expensive.

Submitted 27 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019

arXiv:1908.11538 [pdf, other]

IoT based Smart Access Controlled Secure Smart City Architecture Using Blockchain

Authors: Rourab Paul, Nimisha Ghosh, Suman Sau, Amlan Chakrabarti, Prasant Mahapatra

Abstract: Standard security protocols like SSL, TLS, IPSec etc. have high memory and processor consumption which makes all these security protocols unsuitable for resource constrained platforms such as Internet of Things (IoT). Blockchain (BC) finds its efficient application in IoT platform to preserve the five basic cryptographic primitives, such as confidentiality, authenticity, integrity, availability and non-repudiation. Conventional adoption of BC in IoT platform causes high energy consumption, delay and computational overhead which are not appropriate for various resource constrained IoT devices. This work proposes a machine learning (ML) based smart access control framework in a public and a private BC for a smart city application which makes it more efficient as compared to the existing IoT applications. The proposed IoT based smart city architecture adopts BC technology for preserving all the cryptographic security and privacy issues. Moreover, BC has very minimal overhead on IoT platform as well. This work investigates the existing threat models and critical access control issues which handle multiple permissions of various nodes and detects relevant inconsistencies to notify the corresponding nodes. Comparison in terms of all security issues with existing literature shows that the proposed architecture is competitively efficient in terms of security access control.

Submitted 9 September, 2019; v1 submitted 30 August, 2019; originally announced August 2019.

Comments: Manuscript

arXiv:1908.01176 [pdf, other]

Adversarially Trained Convolutional Neural Networks for Semantic Segmentation of Ischaemic Stroke Lesion using Multisequence Magnetic Resonance Imaging

Authors: Rachana Sathish, Ronnie Rajan, Anusha Vupputuri, Nirmalya Ghosh, Debdoot Sheet

Abstract: Ischaemic stroke is a medical condition caused by occlusion of blood supply to the brain tissue thus forming a lesion. A lesion is zoned into a core associated with irreversible necrosis typically located at the center of the lesion, while reversible hypoxic changes in the outer regions of the lesion are termed as the penumbra. Early estimation of core and penumbra in ischaemic stroke is crucial for timely intervention with thrombolytic therapy to reverse the damage and restore normalcy. Multisequence magnetic resonance imaging (MRI) is commonly employed for clinical diagnosis. However, no single sequence has been found sufficient to differentiate between core and penumbra, and a combination of sequences is required to determine the extent of the damage. The challenge, however, is that with an increase in the number of sequences, it cognitively taxes the clinician to discover symptomatic biomarkers in these images. In this paper, we present a data-driven fully automated method for estimation of core and penumbra in ischaemic lesions using diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI) sequence maps of MRI. The method employs recent developments in convolutional neural networks (CNN) for semantic segmentation in medical images. In the absence of a large amount of labeled data, the CNN is trained using an adversarial approach employing cross-entropy as a segmentation loss along with losses aggregated from three discriminators, of which two employ a relativistic visual Turing test. This method is experimentally validated on the ISLES-2015 dataset through three-fold cross-validation, obtaining an average Dice score of 0.82 and 0.73 for segmentation of penumbra and core respectively.

Submitted 3 August, 2019; originally announced August 2019.

arXiv:1906.09769 [pdf, ps, other]

Fault Matters: Sensor Data Fusion for Detection of Faults using Dempster-Shafer Theory of Evidence in IoT-Based Applications

Authors: Nimisha Ghosh, Rourab Paul, Satyabrata Maity, Krishanu Maity, Sayantan Saha

Abstract: Fault detection in sensor nodes is a pertinent issue that has been an important area of research for a very long time. But it is not explored much as yet in the context of Internet of Things. The Internet of Things works with a massive amount of data, so the responsibility for guaranteeing the accuracy of the data also lies with it. Moreover, a lot of important and critical decisions are made based on these data, so ensuring their correctness and accuracy is also very important. Also, the detection needs to be as precise as possible to avoid negative alerts. For this purpose, this work has adopted Dempster-Shafer Theory of Evidence which is a popular learning method to collate the information from sensors to come up with a decision regarding the faulty status of a sensor node. To verify the validity of the proposed method, simulations have been performed on a benchmark data set and data collected through a test bed in a laboratory set-up. For the different types of faults, the proposed method shows very competent accuracy for both the benchmark (99.8%) and laboratory data sets (99.9%) when compared to the other state-of-the-art machine learning techniques.

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1808.09801 [pdf, other]

doi 10.1109/SMARTCOMP.2018.00091

PS-Sim: A Framework for Scalable Simulation of Participatory Sensing Data

Authors: Rajesh P Barnwal, Nirnay Ghosh, Soumya K Ghosh, Sajal K Das

Abstract: The emergence of smartphones and the participatory sensing (PS) paradigm have paved the way for a new variant of pervasive computing. In PS, human users perform sensing tasks and generate notifications, typically in lieu of incentives. These notifications are real-time, large-volume, and multi-modal, which are eventually fused by the PS platform to generate a summary. One major limitation with PS is the sparsity of notifications owing to lack of active participation, thus inhibiting large scale real-life experiments for the research community. On the flip side, the research community always needs ground truth to validate the efficacy of the proposed models and algorithms. Most of the PS applications involve human mobility and report generation following sensing of any event of interest in the adjacent environment. This work is an attempt to study and empirically model human participation behavior and event occurrence distributions through development of a location-sensitive data simulation framework, called PS-Sim. From extensive experiments it has been observed that the synthetic data generated by PS-Sim replicates real participation and event occurrence behaviors in PS applications, which may be considered for validation purposes in the absence of ground truth. As a proof-of-concept, we have used a real-life dataset from a vehicular traffic management application to train the models in PS-Sim and cross-validated the simulated data with other parts of the same dataset.

Submitted 29 August, 2018; originally announced August 2018.

Comments: Published and Appeared in Proceedings of IEEE International Conference on Smart Computing (SMARTCOMP-2018)

arXiv:1805.06909 [pdf, other]

Fully Convolutional Model for Variable Bit Length and Lossy High Density Compression of Mammograms

Authors: Aupendu Kar, Sri Phani Krishna Karri, Nirmalya Ghosh, Ramanathan Sethuraman, Debdoot Sheet

Abstract: Early works on medical image compression date to the 1980s with the impetus on deployment of teleradiology systems for high-resolution digital X-ray detectors. Commercially deployed systems during the period could compress 4,096 x 4,096 sized images at 12 bpp to 2 bpp using lossless arithmetic coding, and over the years JPEG and JPEG2000 were imbibed reaching up to 0.1 bpp. Inspired by the reprise of deep learning based compression for natural images over the last two years, we propose a fully convolutional autoencoder for diagnostically relevant feature preserving lossy compression. This is followed by leveraging arithmetic coding for encapsulating high redundancy of features for further high-density code packing leading to variable bit length. We demonstrate performance on two different publicly available digital mammography datasets using peak signal-to-noise ratio (pSNR), structural similarity (SSIM) index and domain adaptability tests between datasets. At high density compression factors of >300x (~0.04 bpp), our approach rivals JPEG and JPEG2000 as evaluated through a Radiologist's visual Turing test.

Submitted 17 May, 2018; originally announced May 2018.

Comments: 4 pages, 3 figures, To appear in Workshop on Learned Image Compression, CVPR 2018

arXiv:1709.03583 [pdf, other]

Quality of Information in Mobile Crowdsensing: Survey and Research Challenges

Authors: Francesco Restuccia, Nirnay Ghosh, Shameek Bhattacharjee, Sajal Das, Tommaso Melodia

Abstract: Smartphones have become the most pervasive devices in people's lives, and are clearly transforming the way we live and perceive technology. Today's smartphones benefit from almost ubiquitous Internet connectivity and come equipped with a plethora of inexpensive yet powerful embedded sensors, such as accelerometer, gyroscope, microphone, and camera. This unique combination has enabled revolutionary applications based on the mobile crowdsensing paradigm, such as real-time road traffic monitoring, air and noise pollution, crime control, and wildlife monitoring, just to name a few. Differently from prior sensing paradigms, humans are now the primary actors of the sensing process, since they become fundamental in retrieving reliable and up-to-date information about the event being monitored. As humans may behave unreliably or maliciously, assessing and guaranteeing Quality of Information (QoI) becomes more important than ever. In this paper, we provide a new framework for defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the current state-of-the-art on the topic. We also outline novel research challenges, along with possible directions of future work.

Submitted 6 September, 2017; originally announced September 2017.

Comments: To appear in ACM Transactions on Sensor Networks (TOSN)

arXiv:1505.06219 [pdf]

A comparative study between proposed Hyper Kurtosis based Modified Duo-Histogram Equalization (HKMDHE) and Contrast Limited Adaptive Histogram Equalization (CLAHE) for Contrast Enhancement Purpose of Low Contrast Human Brain CT scan images

Authors: Sabyasachi Mukhopadhyay, Soham Mandal, Sawon Pratiher, Satyasaran Changdar, Ritwik Burman, Nirmalya Ghosh, Prasanta K. Panigrahi

Abstract: In this paper, a comparative study between the proposed hyper kurtosis based modified duo-histogram equalization (HKMDHE) algorithm and contrast limited adaptive histogram equalization (CLAHE) is presented for contrast enhancement and brightness preservation of low contrast human brain CT scan images. In the HKMDHE algorithm, contrast enhancement is done on the hyper-kurtosis based application. The results of the proposed HKMDHE technique are very promising, with improved PSNR values and lower AMMBE values than the CLAHE technique.

Submitted 6 April, 2015; originally announced May 2015.

arXiv:1505.00192 [pdf]

Application of S-Transform on Hyper kurtosis based Modified Duo Histogram Equalized DIC images for Pre-cancer Detection

Authors: Sabyasachi Mukhopadhyay, Soham Mandal, Sawon Pratiher, Ritwik Barman, M. Venkatesh, Nirmalya Ghosh, Prasanta K. Panigrahi

Abstract: Our proposed hyper kurtosis based histogram equalization of DIC images enhances the contrast while preserving the brightness. The evolution and development of precancerous activity among tissues are studied through the S-transform (ST). Significant variations of the amplitude spectra, due to increased medium roughness relative to normal tissue, are observed in the time-frequency domain. The randomness and inhomogeneity of the tissue structures among human normal and different grades of DIC tissues are recognized by ST based time-frequency analysis. This study offers a simpler and better way to recognize the substantial changes among different stages of DIC tissues, which are reflected by spatial information contained within the inhomogeneous structures of different types of tissue.

Submitted 30 April, 2015; originally announced May 2015.

arXiv:1503.06323 [pdf]

Wavelet based approach for tissue fractal parameter measurement: Pre cancer detection

Authors: Sabyasachi Mukhopadhyay, Nandan K. Das, Soham Mandal, Sawon Pratiher, Asish Mitra, Asima Pradhan, Nirmalya Ghosh, Prasanta K. Panigrahi

Abstract: In this paper, we have carried out detailed studies of pre-cancer by wavelet coherency and multifractal detrended fluctuation analysis (MFDFA) on differential interference contrast (DIC) images of the stromal region among different grades of pre-cancer tissues. Discrete wavelet transform (DWT) through the Daubechies basis has been performed for identifying fluctuations over polynomial trends for clear characterization and differentiation of tissues. Wavelet coherence plots are produced for identifying the level of correlation in the time scale plane between normal and various grades of DIC samples. Applying MFDFA on refractive index variations of cervical tissues, we have observed that the value of the Hurst exponent (correlation) decreases from healthy (normal) to pre-cancer tissues. The width of the singularity spectrum has a sudden degradation at grade-I in comparison to healthy (normal) tissue, but later on it increases as cancer progresses from grade-II to grade-III.

Submitted 21 March, 2015; originally announced March 2015.

arXiv:1503.03913 [pdf]

Diagnosing Heterogeneous Dynamics for CT Scan Images of Human Brain in Wavelet and MFDFA domain

Authors: Sabyasachi Mukhopadhyay, Soham Mandal, Nandan K Das, Subhadip Dey, Asish Mitra, Nirmalya Ghosh, Prasanta K Panigrahi

Abstract: CT scan images of the human brain of a particular patient in different cross sections are taken, on which wavelet transform and multi-fractal analysis are applied. The vertical and horizontal unfolding of images is done before analyzing these images. A systematic investigation of de-noised CT scan images of the human brain in different cross-sections is carried out through wavelet normalized energy and wavelet semi-log plots, which clearly points out the mismatch between results of vertical and horizontal unfolding. The mismatch of results confirms the heterogeneity in the spatial domain. Using multi-fractal de-trended fluctuation analysis (MFDFA), the mismatch between the values of the Hurst exponent and the width of the singularity spectrum obtained by vertical and horizontal unfolding confirms the same.

Submitted 12 March, 2015; originally announced March 2015.

Showing 1–30 of 30 results for author: Ghosh, N