Search | arXiv e-print repository

Narrow Transformer: Starcoder-Based Java-LM For Desktop

Authors: Kamalkumar Rathinasamy, Balaji A J, Ankush Kumar, Gagan Gayari, Harshini K, Rajab Ali Mondal, Sreenivasa Raghavan K S, Swayam Singh

Abstract: This paper presents NT-Java-1.1B, an open-source specialized code language model built on StarCoderBase-1.1B, designed for coding tasks in Java programming. NT-Java-1.1B achieves state-of-the-art performance, surpassing its base model and majority of other models of similar size on MultiPL-E Java code benchmark. While there have been studies on extending large, generic pre-trained models to improv… ▽ More This paper presents NT-Java-1.1B, an open-source specialized code language model built on StarCoderBase-1.1B, designed for coding tasks in Java programming. NT-Java-1.1B achieves state-of-the-art performance, surpassing its base model and majority of other models of similar size on MultiPL-E Java code benchmark. While there have been studies on extending large, generic pre-trained models to improve proficiency in specific programming languages like Python, similar investigations on small code models for other programming languages are lacking. Large code models require specialized hardware like GPUs for inference, highlighting the need for research into building small code models that can be deployed on developer desktops. This paper addresses this research gap by focusing on the development of a small Java code model, NT-Java-1.1B, and its quantized versions, which performs comparably to open models around 1.1B on MultiPL-E Java code benchmarks, making them ideal for desktop deployment. This paper establishes the foundation for specialized models across languages and sizes for a family of NT Models. △ Less

Submitted 4 July, 2024; originally announced July 2024.

ACM Class: I.2.7

arXiv:2407.03216 [pdf, other]

Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers

Authors: Sanket Gandhi, Atul, Samanyu Mahajan, Vishal Sharma, Rushil Gupta, Arnab Kumar Mondal, Parag Singla

Abstract: Recent work has shown that object-centric representations can greatly help improve the accuracy of learning dynamics while also bringing interpretability. In this work, we take this idea one step further, ask the following question: "can learning disentangled representation further improve the accuracy of visual dynamics prediction in object-centric models?" While there has been some attempt to le… ▽ More Recent work has shown that object-centric representations can greatly help improve the accuracy of learning dynamics while also bringing interpretability. In this work, we take this idea one step further, ask the following question: "can learning disentangled representation further improve the accuracy of visual dynamics prediction in object-centric models?" While there has been some attempt to learn such disentangled representations for the case of static images \citep{nsb}, to the best of our knowledge, ours is the first work which tries to do this in a general setting for video, without making any specific assumptions about the kind of attributes that an object might have. The key building block of our architecture is the notion of a {\em block}, where several blocks together constitute an object. Each block is represented as a linear combination of a given number of learnable concept vectors, which is iteratively refined during the learning process. The blocks in our model are discovered in an unsupervised manner, by attending over object masks, in a style similar to discovery of slots \citep{slot_attention}, for learning a dense object-centric representation. We employ self-attention via transformers over the discovered blocks to predict the next state resulting in discovery of visual dynamics. We perform a series of experiments on several benchmark 2-D, and 3-D datasets demonstrating that our architecture (1) can discover semantically meaningful blocks (2) help improve accuracy of dynamics prediction compared to SOTA object-centric models (3) perform significantly better in OOD setting where the specific attribute combinations are not seen earlier during training. Our experiments highlight the importance discovery of disentangled representation for visual dynamics prediction. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2406.17437 [pdf, other]

Advancing Question Answering on Handwritten Documents: A State-of-the-Art Recognition-Based Model for HW-SQuAD

Authors: Aniket Pal, Ajoy Mondal, C. V. Jawahar

Abstract: Question-answering handwritten documents is a challenging task with numerous real-world applications. This paper proposes a novel recognition-based approach that improves upon the previous state-of-the-art on the HW-SQuAD and BenthamQA datasets. Our model incorporates transformer-based document retrieval and ensemble methods at the model level, achieving an Exact Match score of 82.02% and 69% in H… ▽ More Question-answering handwritten documents is a challenging task with numerous real-world applications. This paper proposes a novel recognition-based approach that improves upon the previous state-of-the-art on the HW-SQuAD and BenthamQA datasets. Our model incorporates transformer-based document retrieval and ensemble methods at the model level, achieving an Exact Match score of 82.02% and 69% in HW-SQuAD and BenthamQA datasets, respectively, surpassing the previous best recognition-based approach by 10.89% and 3%. We also enhance the document retrieval component, boosting the top-5 retrieval accuracy from 90% to 95.30%. Our results demonstrate the significance of our proposed approach in advancing question answering on handwritten documents. The code and trained models will be publicly available to facilitate future research in this critical area of natural language. △ Less

Submitted 15 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 16 pages

arXiv:2406.00550 [pdf, other]

Demystifying Object-based Big Data Storage Systems

Authors: Anindita Sarkar Mondal, Madhupa Sanyal, Ari Kusumastuti, Hrishav Bakul Barua, Kartick Chandra Mondal

Abstract: Today's era is the digitized era. Managing such generated big data is an important factor for data scientists. Day by day, it increases the demand for big data storage systems. Different organizations are involved in providing storage-related services. They follow the different architectures or storage models for storing big data. In this survey paper, our target is to highlight such storage archi… ▽ More Today's era is the digitized era. Managing such generated big data is an important factor for data scientists. Day by day, it increases the demand for big data storage systems. Different organizations are involved in providing storage-related services. They follow the different architectures or storage models for storing big data. In this survey paper, our target is to highlight such storage architectures which provided by different renowned storage service providers. On an architectural basis, we divide the big data storage systems into five parts, Distributed file systems (DFS), Clustered File Systems (CFS), Cloud Storage, Archive Storage, and Object Storage Systems (OSS). Also, we reveal a detailed architectural view of the big data storage systems provided by the different organizations under these parts. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: 32 Pages

arXiv:2406.00010 [pdf, other]

EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search

Authors: Kamalkumar Rathinasamy, Jayarama Nettar, Amit Kumar, Vishal Manchanda, Arun Vijayakumar, Ayush Kataria, Venkateshprasanna Manjunath, Chidambaram GS, Jaskirat Singh Sodhi, Shoeb Shaikh, Wasim Akhtar Khan, Prashant Singh, Tanishq Dattatray Ige, Vipin Tiwari, Rajab Ali Mondal, Harshini K, S Reka, Chetana Amancharla, Faiz ur Rahman, Harikrishnan P A, Indraneel Saha, Bhavya Tiwary, Navin Shankar Patel, Pradeep T S, Balaji A J , et al. (2 additional authors not shown)

Abstract: Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components.… ▽ More Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract relevant insights to address employee inquiries. These solutions often leverage pre-trained embedding models and generative models as foundational components. While pre-trained embeddings may exhibit proximity or disparity based on their original training objectives, they might not fully align with the unique characteristics of enterprise-specific data, leading to suboptimal alignment with the retrieval goals of enterprise environments. In this paper, we propose a methodology to fine-tune pre-trained embedding models specifically for enterprise environments. By adapting the embeddings to better suit the retrieval tasks prevalent in enterprises, we aim to enhance the performance of information retrieval solutions. We discuss the process of fine-tuning, its effect on retrieval accuracy, and the potential benefits for enterprise information management. Our findings demonstrate the efficacy of fine-tuned embedding models in improving the precision and relevance of search results in enterprise settings. △ Less

Submitted 18 May, 2024; originally announced June 2024.

ACM Class: I.2.7

arXiv:2405.14089 [pdf, other]

Improved Canonicalization for Model Agnostic Equivariance

Authors: Siba Smarak Panigrahi, Arnab Kumar Mondal

Abstract: This work introduces a novel approach to achieving architecture-agnostic equivariance in deep learning, particularly addressing the limitations of traditional equivariant architectures and the inefficiencies of the existing architecture-agnostic methods. Building equivariant models using traditional methods requires designing equivariant versions of existing models and training them from scratch,… ▽ More This work introduces a novel approach to achieving architecture-agnostic equivariance in deep learning, particularly addressing the limitations of traditional equivariant architectures and the inefficiencies of the existing architecture-agnostic methods. Building equivariant models using traditional methods requires designing equivariant versions of existing models and training them from scratch, a process that is both impractical and resource-intensive. Canonicalization has emerged as a promising alternative for inducing equivariance without altering model architecture, but it suffers from the need for highly expressive and expensive equivariant networks to learn canonical orientations accurately. We propose a new method that employs any non-equivariant network for canonicalization. Our method uses contrastive learning to efficiently learn a unique canonical orientation and offers more flexibility for the choice of canonicalization network. We empirically demonstrate that this approach outperforms existing methods in achieving equivariance for large pretrained models and significantly speeds up the canonicalization process, making it up to 2 times faster. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted to EquiVision workshop, CVPR 2024. 7 pages, 1 figure

arXiv:2405.02523 [pdf, other]

Optimal Toffoli-Depth Quantum Adder

Authors: Siyi Wang, Suman Deb, Ankit Mondal, Anupam Chattopadhyay

Abstract: Efficient quantum arithmetic circuits are commonly found in numerous quantum algorithms of practical significance. Till date, the logarithmic-depth quantum adders includes a constant coefficient k >= 2 while achieving the Toffoli-Depth of klog n + O(1). In this work, 160 alternative compositions of the carry-propagation structure are comprehensively explored to determine the optimal depth structur… ▽ More Efficient quantum arithmetic circuits are commonly found in numerous quantum algorithms of practical significance. Till date, the logarithmic-depth quantum adders includes a constant coefficient k >= 2 while achieving the Toffoli-Depth of klog n + O(1). In this work, 160 alternative compositions of the carry-propagation structure are comprehensively explored to determine the optimal depth structure for a quantum adder. By extensively studying these structures, it is shown that an exact Toffoli-Depth of log n + O(1) is achievable. This presents a reduction of Toffoli-Depth by almost 50% compared to the best known quantum adder circuits presented till date. We demonstrate a further possible design by incorporating a different expansion of propagate and generate forms, as well as an extension of the modular framework. Our paper elaborates on these designs, supported by detailed theoretical analyses and simulation-based studies, firmly substantiating our claims of optimality. The results also mirror similar improvements, recently reported in classical adder circuit complexity. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: This paper is under review in ACM Transactions on Quantum Computing

arXiv:2404.10880 [pdf, other]

HumMUSS: Human Motion Understanding using State Space Models

Authors: Arnab Kumar Mondal, Stefano Alletto, Denis Tome

Abstract: Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these approaches have limitations in practical scenarios. Transformers are slower when sequentially predicting on a continuous stream of frames in real-time, and do not… ▽ More Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these approaches have limitations in practical scenarios. Transformers are slower when sequentially predicting on a continuous stream of frames in real-time, and do not generalize to new frame rates. In light of these constraints, we propose a novel attention-free spatiotemporal model for human motion understanding building upon recent advancements in state space models. Our model not only matches the performance of transformer-based models in various motion understanding tasks but also brings added benefits like adaptability to different video frame rates and enhanced training speed when working with longer sequence of keypoints. Moreover, the proposed model supports both offline and real-time applications. For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches while maintaining their high accuracy. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: CVPR 24

arXiv:2404.04642 [pdf]

Power-Efficient Image Storage: Leveraging Super Resolution Generative Adversarial Network for Sustainable Compression and Reduced Carbon Footprint

Authors: Ashok Mondal, Satyam Singh

Abstract: In recent years, large-scale adoption of cloud storage solutions has revolutionized the way we think about digital data storage. However, the exponential increase in data volume, especially images, has raised environmental concerns regarding power and resource consumption, as well as the rising digital carbon footprint emissions. The aim of this research is to propose a methodology for cloud-based… ▽ More In recent years, large-scale adoption of cloud storage solutions has revolutionized the way we think about digital data storage. However, the exponential increase in data volume, especially images, has raised environmental concerns regarding power and resource consumption, as well as the rising digital carbon footprint emissions. The aim of this research is to propose a methodology for cloud-based image storage by integrating image compression technology with SuperResolution Generative Adversarial Networks (SRGAN). Rather than storing images in their original format directly on the cloud, our approach involves initially reducing the image size through compression and downsizing techniques before storage. Upon request, these compressed images will be retrieved and processed by SRGAN to generate images. The efficacy of the proposed method is evaluated in terms of PSNR and SSIM metrics. Additionally, a mathematical analysis is given to calculate power consumption and carbon footprint assesment. The proposed data compression technique provides a significant solution to achieve a reasonable trade off between environmental sustainability and industrial efficiency. △ Less

Submitted 6 April, 2024; originally announced April 2024.

Comments: 5 pages, 5 figures

MSC Class: 68T07 ACM Class: I.2.m; H.3.2

arXiv:2403.08007 [pdf, other]

doi 10.1007/978-3-031-41498-5_17

IndicSTR12: A Dataset for Indic Scene Text Recognition

Authors: Harsh Lunia, Ajoy Mondal, C V Jawahar

Abstract: The importance of Scene Text Recognition (STR) in today's increasingly digital world cannot be overstated. Given the significance of STR, data intensive deep learning approaches that auto-learn feature mappings have primarily driven the development of STR solutions. Several benchmark datasets and substantial work on deep learning models are available for Latin languages to meet this need. On more… ▽ More The importance of Scene Text Recognition (STR) in today's increasingly digital world cannot be overstated. Given the significance of STR, data intensive deep learning approaches that auto-learn feature mappings have primarily driven the development of STR solutions. Several benchmark datasets and substantial work on deep learning models are available for Latin languages to meet this need. On more complex, syntactically and semantically, Indian languages spoken and read by 1.3 billion people, there is less work and datasets available. This paper aims to address the Indian space's lack of a comprehensive dataset by proposing the largest and most comprehensive real dataset - IndicSTR12 - and benchmarking STR performance on 12 major Indian languages. A few works have addressed the same issue, but to the best of our knowledge, they focused on a small number of Indian languages. The size and complexity of the proposed dataset are comparable to those of existing Latin contemporaries, while its multilingualism will catalyse the development of robust text detection and recognition models. It was created specifically for a group of related languages with different scripts. The dataset contains over 27000 word-images gathered from various natural scenes, with over 1000 word-images for each language. Unlike previous datasets, the images cover a broader range of realistic conditions, including blur, illumination changes, occlusion, non-iconic texts, low resolution, perspective text etc. Along with the new dataset, we provide a high-performing baseline on three models - PARSeq, CRNN, and STARNet. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Journal ref: ICDAR 2023 Workshops. Lecture Notes in Computer Science, vol 14193. Springer, Cham (2023)

arXiv:2403.05435 [pdf, other]

OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

Authors: Anindya Mondal, Sauradip Nag, Xiatian Zhu, Anjan Dutta

Abstract: Object counting is pivotal for understanding the composition of scenes. Previously, this task was dominated by class-specific methods, which have gradually evolved into more adaptable class-agnostic strategies. However, these strategies come with their own set of limitations, such as the need for manual exemplar input and multiple passes for multiple categories, resulting in significant inefficien… ▽ More Object counting is pivotal for understanding the composition of scenes. Previously, this task was dominated by class-specific methods, which have gradually evolved into more adaptable class-agnostic strategies. However, these strategies come with their own set of limitations, such as the need for manual exemplar input and multiple passes for multiple categories, resulting in significant inefficiencies. This paper introduces a more practical approach enabling simultaneous counting of multiple object categories using an open-vocabulary framework. Our solution, OmniCount, stands out by using semantic and geometric insights (priors) from pre-trained models to count multiple categories of objects as specified by users, all without additional training. OmniCount distinguishes itself by generating precise object masks and leveraging varied interactive prompts via the Segment Anything Model for efficient counting. To evaluate OmniCount, we created the OmniCount-191 benchmark, a first-of-its-kind dataset with multi-label object counts, including points, bounding boxes, and VQA annotations. Our comprehensive evaluation in OmniCount-191, alongside other leading benchmarks, demonstrates OmniCount's exceptional performance, significantly outpacing existing solutions. △ Less

Submitted 20 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04178 [pdf, other]

Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation

Authors: Sai Akarsh, Vamshi Raghusimha, Anindita Mondal, Anil Vuppala

Abstract: The language diversity in India's education sector poses a significant challenge, hindering inclusivity. Despite the democratization of knowledge through online educational content, the dominance of English, as the internet's lingua franca, limits accessibility, emphasizing the crucial need for translation into Indian languages. Despite existing Speech-to-Speech Machine Translation (SSMT) technolo… ▽ More The language diversity in India's education sector poses a significant challenge, hindering inclusivity. Despite the democratization of knowledge through online educational content, the dominance of English, as the internet's lingua franca, limits accessibility, emphasizing the crucial need for translation into Indian languages. Despite existing Speech-to-Speech Machine Translation (SSMT) technologies, the lack of intonation in these systems gives monotonous translations, leading to a loss of audience interest and disengagement from the content. To address this, our paper introduces a dataset with stress annotations in Indian English and also a Text-to-Speech (TTS) architecture capable of incorporating stress into synthesized speech. This dataset is used for training a stress detection model, which is then used in the SSMT system for detecting stress in the source speech and transferring it into the target language speech. The TTS architecture is based on FastPitch and can modify the variances based on stressed words given. We present an Indian English-to-Hindi SSMT system that can transfer stress and aim to enhance the overall quality and engagement of educational content. △ Less

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.02957

Multi-Agent Reinforcement Learning for Offloading Cellular Communications with Cooperating UAVs

Authors: Abhishek Mondal, Deepak Mishra, Ganesh Prasad, George C. Alexandropoulos, Azzam Alnahari, Riku Jantti

Abstract: Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles, known for their high agility, mobility, and flexibility, present an… ▽ More Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles, known for their high agility, mobility, and flexibility, present an alternative means to offload data traffic from terrestrial BSs, serving as additional access points. This paper introduces a novel approach to efficiently maximize the utilization of multiple UAVs for data traffic offloading from terrestrial BSs. Specifically, the focus is on maximizing user association with UAVs by jointly optimizing UAV trajectories and users association indicators under quality of service constraints. Since, the formulated UAVs control problem is nonconvex and combinatorial, this study leverages the multi agent reinforcement learning framework. In this framework, each UAV acts as an independent agent, aiming to maintain inter UAV cooperative behavior. The proposed approach utilizes the finite state Markov decision process to account for UAVs velocity constraints and the relationship between their trajectories and state space. A low complexity distributed state action reward state action algorithm is presented to determine UAVs optimal sequential decision making policies over training episodes. The extensive simulation results validate the proposed analysis and offer valuable insights into the optimal UAV trajectories. The derived trajectories demonstrate superior average UAV association performance compared to benchmark techniques such as Q learning and particle swarm optimization. △ Less

Submitted 31 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Significant modification required to get novel results

arXiv:2402.02036 [pdf, other]

Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks

Authors: Zhuomin Chen, Jiaxing Zhang, Jingchao Ni, Xiaoting Li, Yuchen Bian, Md Mezbahul Islam, Ananda Mohan Mondal, Hua Wei, Dongsheng Luo

Abstract: Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing needs to deploy GNNs in high-stakes applications necessitate explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original g… ▽ More Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing needs to deploy GNNs in high-stakes applications necessitate explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original graphs. This task is challenging due to the substantial distributional shift from the original graphs in the training set to the set of explainable subgraphs, which prevents accurate prediction of labels with the subgraphs. To address it, in this paper, we propose a novel method that generates proxy graphs for explainable subgraphs that are in the distribution of training data. We introduce a parametric method that employs graph generators to produce proxy graphs. A new training objective based on information theory is designed to ensure that proxy graphs not only adhere to the distribution of training data but also preserve explanatory factors. Such generated proxy graphs can be reliably used to approximate the predictions of the labels of explainable subgraphs. Empirical evaluations across various datasets demonstrate our method achieves more accurate explanations for GNNs. △ Less

Submitted 29 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: Accepted to International Conference on Machine Learning (ICML 2024)

arXiv:2310.12723 [pdf, other]

Tight Short-Lived Signatures

Authors: Arup Mondal, Ruthu Hulikal Rooparaghunath, Debayan Gupta

Abstract: A Time-lock puzzle (TLP) sends information into the future: a predetermined number of sequential computations must occur (i.e., a predetermined amount of time must pass) to retrieve the information, regardless of parallelization. Buoyed by the excitement around secure decentralized applications and cryptocurrencies, the last decade has witnessed numerous constructions of TLP variants and related a… ▽ More A Time-lock puzzle (TLP) sends information into the future: a predetermined number of sequential computations must occur (i.e., a predetermined amount of time must pass) to retrieve the information, regardless of parallelization. Buoyed by the excitement around secure decentralized applications and cryptocurrencies, the last decade has witnessed numerous constructions of TLP variants and related applications (e.g., cost-efficient blockchain designs, randomness beacons, e-voting, etc.). In this poster, we first extend the notion of TLP by formally defining the "time-lock public key encryption" (TLPKE) scheme. Next, we introduce and construct a "tight short-lived signatures" scheme using our TLPKE. Furthermore, to test the validity of our proposed schemes, we do a proof-of-concept implementation and run detailed simulations. △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.12693 [pdf, ps, other]

RANDGENER: Distributed Randomness Beacon from Verifiable Delay Function

Authors: Arup Mondal, Ruthu Hulikal Rooparaghunath, Debayan Gupta

Abstract: Buoyed by the excitement around secure decentralized applications, the last few decades have seen numerous constructions of distributed randomness beacons (DRB) along with use cases; however, a secure DRB (in many variations) remains an open problem. We further note that it is natural to want some kind of reward for participants who spend time and energy evaluating the randomness beacon value -- t… ▽ More Buoyed by the excitement around secure decentralized applications, the last few decades have seen numerous constructions of distributed randomness beacons (DRB) along with use cases; however, a secure DRB (in many variations) remains an open problem. We further note that it is natural to want some kind of reward for participants who spend time and energy evaluating the randomness beacon value -- this is already common in distributed protocols. In this work, we present RandGener, a novel $n$-party commit-reveal-recover (or collaborative) DRB protocol with a novel reward and penalty mechanism along with a set of realistic guarantees. We design our protocol using trapdoor watermarkable verifiable delay functions in the RSA group setting (without requiring a trusted dealer or distributed key generation). △ Less

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.01647 [pdf, other]

Equivariant Adaptation of Large Pretrained Models

Authors: Arnab Kumar Mondal, Siba Smarak Panigrahi, Sékou-Oumar Kaba, Sai Rajeswar, Siamak Ravanbakhsh

Abstract: Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during… ▽ More Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to higher sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors. △ Less

Submitted 29 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: 17 pages, 6 figures. Accepted to NeurIPS 2023

arXiv:2307.10763 [pdf, other]

doi 10.1109/ICCVW60793.2023.00086

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

Authors: Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

Abstract: Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecti… ▽ More Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code is made available at https://github.com/mondalanindya/MSQNet. △ Less

Submitted 10 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Published at the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France

arXiv:2306.11941 [pdf, other]

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory

Authors: Arnab Kumar Mondal, Siba Smarak Panigrahi, Sai Rajeswar, Kaleem Siddiqi, Siamak Ravanbakhsh

Abstract: The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlin… ▽ More The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results. △ Less

Submitted 12 May, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Accepted to ICLR 2024 and EWRL 2023

arXiv:2305.14410 [pdf, other]

Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach

Authors: Harman Singh, Poorva Garg, Mohit Gupta, Kevin Shah, Ashish Goswami, Satyam Modi, Arnab Kumar Mondal, Dinesh Khandelwal, Dinesh Garg, Parag Singla

Abstract: We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend recently proposed Neuro Symbolic Concept Learning (NSCL), which has been quite effective for the task of Visual Question Answering (VQA), for the task of image manipulation. Our system referred to as NeuroSIM can p… ▽ More We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend recently proposed Neuro Symbolic Concept Learning (NSCL), which has been quite effective for the task of Visual Question Answering (VQA), for the task of image manipulation. Our system referred to as NeuroSIM can perform complex multi-hop reasoning over multi-object scenes and only requires weak supervision in the form of annotated data for VQA. NeuroSIM parses an instruction into a symbolic program, based on a Domain Specific Language (DSL) comprising of object attributes and manipulation operations, that guides its execution. We create a new dataset for the task, and extensive experiments demonstrate that NeuroSIM is highly competitive with or beats SOTA baselines that make use of supervised data for manipulation. △ Less

Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 (long paper, main conference)

arXiv:2304.10951 [pdf, ps, other]

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the… ▽ More We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an $ε$-SOSP is $O(ε^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(ε^{-4.5})$. △ Less

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2302.11313 [pdf, other]

doi 10.1109/ICASSP49357.2023.10096168

Time-varying Signals Recovery via Graph Neural Networks

Authors: Jhon A. Castro-Correa, Jhony H. Giraldo, Anindya Mondal, Mohsen Badiey, Thierry Bouwmans, Fragkiskos D. Malliaros

Abstract: The recovery of time-varying graph signals is a fundamental problem with numerous applications in sensor networks and forecasting in time series. Effectively capturing the spatio-temporal information in these signals is essential for the downstream tasks. Previous studies have used the smoothness of the temporal differences of such graph signals as an initial assumption. Nevertheless, this smoothn… ▽ More The recovery of time-varying graph signals is a fundamental problem with numerous applications in sensor networks and forecasting in time series. Effectively capturing the spatio-temporal information in these signals is essential for the downstream tasks. Previous studies have used the smoothness of the temporal differences of such graph signals as an initial assumption. Nevertheless, this smoothness assumption could result in a degradation of performance in the corresponding application when the prior does not hold. In this work, we relax the requirement of this hypothesis by including a learning module. We propose a Time Graph Neural Network (TimeGNN) for the recovery of time-varying graph signals. Our algorithm uses an encoder-decoder architecture with a specialized loss composed of a mean squared error function and a Sobolev smoothness operator.TimeGNN shows competitive performance against previous methods in real datasets. △ Less

Submitted 12 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, Greece

arXiv:2302.06004 [pdf, other]

Improving UE Energy Efficiency through Network-aware Video Streaming over 5G

Authors: Basabdatta Palit, Argha Sen, Abhijit Mondal, Ayan Zunaid, Jay Jayatheerthan, Sandip Chakraborty

Abstract: Adaptive Bitrate (ABR) Streaming over the cellular networks has been well studied in the literature; however, existing ABR algorithms primarily focus on improving the end-users' Quality of Experience (QoE) while ignoring the resource consumption aspect of the underlying device. Consequently, proactive attempts to download video data to maintain the user's QoE often impact the battery life of the u… ▽ More Adaptive Bitrate (ABR) Streaming over the cellular networks has been well studied in the literature; however, existing ABR algorithms primarily focus on improving the end-users' Quality of Experience (QoE) while ignoring the resource consumption aspect of the underlying device. Consequently, proactive attempts to download video data to maintain the user's QoE often impact the battery life of the underlying device unless the download attempts are synchronized with the network's channel condition. In this work, we develop EnDASH-5G -- a wrapper over the popular DASH-based ABR streaming algorithm, which establishes this synchronization by utilizing a network-aware video data download mechanism. EnDASH-5G utilizes a novel throughput prediction mechanism for 5G mmWave networks by upgrading the existing throughput prediction models with a transfer learning-based approach leveraging publicly available 5G datasets. It then exploits deep reinforcement learning to dynamically decide the playback buffer length and the video bitrate using the predicted throughput. This ensures that the data download attempts get synchronized with the underlying network condition, thus saving the device's battery power. From a thorough evaluation of EnDASH-5G, we observe that it achieves a near $30.5\%$ decrease in the maximum energy consumption than the state-of-the-art Pensieve ABR algorithm while performing almost at par in QoE. △ Less

Submitted 12 February, 2023; originally announced February 2023.

Comments: 14 pages, 11 figures, journal

arXiv:2212.08834 [pdf, other]

Towards Robust Handwritten Text Recognition with On-the-fly User Participation

Authors: Ajoy Mondal, Rohit saluja, C. V. Jawahar

Abstract: Long-term OCR services aim to provide high-quality output to their users at competitive costs. It is essential to upgrade the models because of the complex data loaded by the users. The service providers encourage the users who provide data where the OCR model fails by rewarding them based on data complexity, readability, and available budget. Hitherto, the OCR works include preparing the models o… ▽ More Long-term OCR services aim to provide high-quality output to their users at competitive costs. It is essential to upgrade the models because of the complex data loaded by the users. The service providers encourage the users who provide data where the OCR model fails by rewarding them based on data complexity, readability, and available budget. Hitherto, the OCR works include preparing the models on standard datasets without considering the end-users. We propose a strategy of consistently upgrading an existing Handwritten Hindi OCR model three times on the dataset of 15 users. We fix the budget of 4 users for each iteration. For the first iteration, the model directly trains on the dataset from the first four users. For the rest iteration, all remaining users write a page each, which service providers later analyze to select the 4 (new) best users based on the quality of predictions on the human-readable words. Selected users write 23 more pages for upgrading the model. We upgrade the model with Curriculum Learning (CL) on the data available in the current iteration and compare the subset from previous iterations. The upgraded model is tested on a held-out set of one page each from all 23 users. We provide insights into our investigations on the effect of CL, user selection, and especially the data from unseen writing styles. Our work can be used for long-term OCR services in crowd-sourcing scenarios for the service providers and end users. △ Less

Submitted 17 December, 2022; originally announced December 2022.

arXiv:2212.07776 [pdf, other]

Enhancing Indic Handwritten Text Recognition Using Global Semantic Information

Authors: Ajoy Mondal, C. V. Jawahar

Abstract: Handwritten Text Recognition (HTR) is more interesting and challenging than printed text due to uneven variations in the handwriting style of the writers, content, and time. HTR becomes more challenging for the Indic languages because of (i) multiple characters combined to form conjuncts which increase the number of characters of respective languages, and (ii) near to 100 unique basic Unicode char… ▽ More Handwritten Text Recognition (HTR) is more interesting and challenging than printed text due to uneven variations in the handwriting style of the writers, content, and time. HTR becomes more challenging for the Indic languages because of (i) multiple characters combined to form conjuncts which increase the number of characters of respective languages, and (ii) near to 100 unique basic Unicode characters in each Indic script. Recently, many recognition methods based on the encoder-decoder framework have been proposed to handle such problems. They still face many challenges, such as image blur and incomplete characters due to varying writing styles and ink density. We argue that most encoder-decoder methods are based on local visual features without explicit global semantic information. In this work, we enhance the performance of Indic handwritten text recognizers using global semantic information. We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize the Indic handwritten texts. The semantic information is used in both the encoder for supervision and the decoder for initialization. The semantic information is predicted from the word embedding of a pre-trained language model. Extensive experiments demonstrate that the proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages. △ Less

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2211.11583 [pdf, other]

Recommending Related Products Using Graph Neural Networks in Directed Graphs

Authors: Srinivas Virinchi, Anoop Saladi, Abhirup Mondal

Abstract: Related product recommendation (RPR) is pivotal to the success of any e-commerce service. In this paper, we deal with the problem of recommending related products i.e., given a query product, we would like to suggest top-k products that have high likelihood to be bought together with it. Our problem implicitly assumes asymmetry i.e., for a phone, we would like to recommend a suitable phone case, b… ▽ More Related product recommendation (RPR) is pivotal to the success of any e-commerce service. In this paper, we deal with the problem of recommending related products i.e., given a query product, we would like to suggest top-k products that have high likelihood to be bought together with it. Our problem implicitly assumes asymmetry i.e., for a phone, we would like to recommend a suitable phone case, but for a phone case, it may not be apt to recommend a phone because customers typically would purchase a phone case only while owning a phone. We also do not limit ourselves to complementary or substitute product recommendation. For example, for a specific night wear t-shirt, we can suggest similar t-shirts as well as track pants. So, the notion of relatedness is subjective to the query product and dependent on customer preferences. Further, various factors such as product price, availability lead to presence of selection bias in the historical purchase data, that needs to be controlled for while training related product recommendations model. These challenges are orthogonal to each other deeming our problem nontrivial. To address these, we propose DAEMON, a novel Graph Neural Network (GNN) based framework for related product recommendation, wherein the problem is formulated as a node recommendation task on a directed product graph. In order to capture product asymmetry, we employ an asymmetric loss function and learn dual embeddings for each product, by appropriately aggregating features from its neighborhood. DAEMON leverages multi-modal data sources such as catalog metadata, browse behavioral logs to mitigate selection bias and generate recommendations for cold-start products. Extensive offline experiments show that DAEMON outperforms state-of-the-art baselines by 30-160% in terms of HitRate and MRR for the node recommendation task. △ Less

Submitted 18 November, 2022; originally announced November 2022.

Comments: This work was accepted in ECML PKDD 2022

arXiv:2211.06489 [pdf, other]

Equivariance with Learned Canonicalization Functions

Authors: Sékou-Oumar Kaba, Arnab Kumar Mondal, Yan Zhang, Yoshua Bengio, Siamak Ravanbakhsh

Abstract: Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer exp… ▽ More Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data. These canonicalization functions can readily be plugged into non-equivariant backbone architectures. We offer explicit ways to implement them for some groups of interest. We show that this approach enjoys universality while providing interpretable insights. Our main hypothesis, supported by our empirical results, is that learning a small neural network to perform canonicalization is better than using predefined heuristics. Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks, including image classification, $N$-body dynamics prediction, point cloud classification and part segmentation, while being faster across the board. △ Less

Submitted 7 July, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 21 pages, 5 figures

arXiv:2208.08697 [pdf, other]

Resisting Adversarial Attacks in Deep Neural Networks using Diverse Decision Boundaries

Authors: Manaar Alam, Shubhajit Datta, Debdeep Mukhopadhyay, Arijit Mondal, Partha Pratim Chakrabarti

Abstract: The security of deep learning (DL) systems is an extremely important field of study as they are being deployed in several applications due to their ever-improving performance to solve challenging tasks. Despite overwhelming promises, the deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify. Protecti… ▽ More The security of deep learning (DL) systems is an extremely important field of study as they are being deployed in several applications due to their ever-improving performance to solve challenging tasks. Despite overwhelming promises, the deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye, but can lead the model to misclassify. Protections against adversarial perturbations on ensemble-based techniques have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation. In this paper, we attempt to develop a new ensemble-based solution that constructs defender models with diverse decision boundaries with respect to the original model. The ensemble of classifiers constructed by (1) transformation of the input by a method called Split-and-Shuffle, and (2) restricting the significant features by a method called Contrast-Significant-Features are shown to result in diverse gradients with respect to adversarial attacks, which reduces the chance of transferring adversarial examples from the original to the defender model targeting the same class. We present extensive experimentations using standard image classification datasets, namely MNIST, CIFAR-10 and CIFAR-100 against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble-based defense. We also evaluate the robustness in the presence of a stronger adversary targeting all the models within the ensemble simultaneously. Results for the overall false positives and false negatives have been furnished to estimate the overall performance of the proposed methodology. △ Less

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2208.00290 [pdf, ps, other]

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

Authors: Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

Abstract: In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of… ▽ More In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful in the case when the objective function is non-convex, and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function and obtains the asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish here a non-asymptotic bound for finding an epsilon-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that the performance of our algorithm outperforms GSF, SPSA, and RDSA by a significant margin over a few non-convex settings and further validate its performance over convex (noisy) objectives. △ Less

Submitted 30 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

arXiv:2205.01904 [pdf, other]

doi 10.1109/LSENS.2022.3206307

ImAiR: Airwriting Recognition framework using Image Representation of IMU Signals

Authors: Ayush Tripathi, Arnab Kumar Mondal, Lalan Kumar, Prathosh A. P

Abstract: The problem of Airwriting Recognition is focused on identifying letters written by movement of finger in free space. It is a type of gesture recognition where the dictionary corresponds to letters in a specific language. In particular, airwriting recognition using sensor data from wrist-worn devices can be used as a medium of user input for applications in Human-Computer Interaction (HCI). Recogni… ▽ More The problem of Airwriting Recognition is focused on identifying letters written by movement of finger in free space. It is a type of gesture recognition where the dictionary corresponds to letters in a specific language. In particular, airwriting recognition using sensor data from wrist-worn devices can be used as a medium of user input for applications in Human-Computer Interaction (HCI). Recognition of in-air trajectories using such wrist-worn devices is limited in literature and forms the basis of the current work. In this paper, we propose an airwriting recognition framework by first encoding the time-series data obtained from a wearable Inertial Measurement Unit (IMU) on the wrist as images and then utilizing deep learning-based models for identifying the written alphabets. The signals recorded from 3-axis accelerometer and gyroscope in IMU are encoded as images using different techniques such as Self Similarity Matrix (SSM), Gramian Angular Field (GAF) and Markov Transition Field (MTF) to form two sets of 3-channel images. These are then fed to two separate classification models and letter prediction is made based on an average of the class conditional probabilities obtained from the two models. Several standard model architectures for image classification such as variants of ResNet, DenseNet, VGGNet, AlexNet and GoogleNet have been utilized. Experiments performed on two publicly available datasets demonstrate the efficacy of the proposed strategy. The code for our implementation will be made available at https://github.com/ayushayt/ImAiR. △ Less

Submitted 8 September, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

arXiv:2203.11977 [pdf, other]

YouTube over Google's QUIC vs Internet Middleboxes: A Tug of War between Protocol Sustainability and Application QoE

Authors: Sapna Chaudhary, Prince Sachdeva, Abhijit Mondal, Sandip Chakraborty, Mukulika Maity

Abstract: Middleboxes such as web proxies, firewalls, etc. are widely deployed in today's network infrastructure. As a result, most protocols need to adapt their behavior to co-exist. One of the most commonly used transport protocols, QUIC, adapts to such middleboxes by falling back to TCP, where they block it. In this paper, we argue that the blind fallback behavior of QUIC, i.e., not distinguishing betwee… ▽ More Middleboxes such as web proxies, firewalls, etc. are widely deployed in today's network infrastructure. As a result, most protocols need to adapt their behavior to co-exist. One of the most commonly used transport protocols, QUIC, adapts to such middleboxes by falling back to TCP, where they block it. In this paper, we argue that the blind fallback behavior of QUIC, i.e., not distinguishing between failures caused by middleboxes and that caused by network congestion, hugely impacts the performance of QUIC. For this, we focus on YouTube video streaming and conduct a measurement study by utilizing production endpoints of YouTube by enabling TCP and QUIC at a time. In total, we collect over 2600 streaming hours of data over various bandwidth patterns, from 5 different geographical locations and various video genres. To our surprise, we observe that the legacy setup (TCP) either outperforms or performs the same as the QUIC-enabled browser for more than 60% of cases. We see that our observation is consistent across individual QoE parameters, bandwidth patterns, locations, and videos. Next, we conduct a deep-dive analysis to discover the root cause behind such behavior. We find a good correlation (0.3-0.7) between fallback and QoE drop events, i.e., quality drop and re-buffering or stalling. We further perform Granger causal analysis and find that fallback Granger causes either quality drop or stalling for 70% of the QUIC-enabled sessions. We believe our study will help designers revisit the decision to enable fallback in QUIC and distinguish between the packet drops caused by middleboxes and network congestion. △ Less

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.00418 [pdf, other]

doi 10.23919/EUSIPCO55093.2022.9909940

Recovery of Missing Sensor Data by Reconstructing Time-varying Graph Signals

Authors: Anindya Mondal, Mayukhmali Das, Aditi Chatterjee, Palaniandavar Venkateswaran

Abstract: Wireless sensor networks are among the most promising technologies of the current era because of their small size, lower cost, and ease of deployment. With the increasing number of wireless sensors, the probability of generating missing data also rises. This incomplete data could lead to disastrous consequences if used for decision-making. There is rich literature dealing with this problem. Howeve… ▽ More Wireless sensor networks are among the most promising technologies of the current era because of their small size, lower cost, and ease of deployment. With the increasing number of wireless sensors, the probability of generating missing data also rises. This incomplete data could lead to disastrous consequences if used for decision-making. There is rich literature dealing with this problem. However, most approaches show performance degradation when a sizable amount of data is lost. Inspired by the emerging field of graph signal processing, this paper performs a new study of a Sobolev reconstruction algorithm in wireless sensor networks. Experimental comparisons on several publicly available datasets demonstrate that the algorithm surpasses multiple state-of-the-art techniques by a maximum margin of 54%. We further show that this algorithm consistently retrieves the missing data even during massive data loss situations. △ Less

Submitted 23 December, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: Five pages, two figures, 2022 30th European Signal Processing Conference (EUSIPCO). Published version available at: https://ieeexplore.ieee.org/document/9909940

arXiv:2202.10930 [pdf, other]

Transformation Coding: Simple Objectives for Equivariant Representations

Authors: Mehran Shakerinava, Arnab Kumar Mondal, Siamak Ravanbakhsh

Abstract: We present a simple non-generative approach to deep representation learning that seeks equivariant deep embedding through simple objectives. In contrast to existing equivariant networks, our transformation coding approach does not constrain the choice of the feed-forward layer or the architecture and allows for an unknown group action on the input space. We introduce several such transformation co… ▽ More We present a simple non-generative approach to deep representation learning that seeks equivariant deep embedding through simple objectives. In contrast to existing equivariant networks, our transformation coding approach does not constrain the choice of the feed-forward layer or the architecture and allows for an unknown group action on the input space. We introduce several such transformation coding objectives for different Lie groups such as the Euclidean, Orthogonal and the Unitary groups. When using product groups, the representation is decomposed and disentangled. We show that the presence of additional information on different transformations improves disentanglement in transformation coding. We evaluate the representations learnt by transformation coding both qualitatively and quantitatively on downstream tasks, including reinforcement learning. △ Less

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2202.05808 [pdf, other]

Investigating Power laws in Deep Representation Learning

Authors: Arna Ghosh, Arnab Kumar Mondal, Kumar Krishna Agrawal, Blake Richards

Abstract: Representation learning that leverages large-scale labelled datasets, is central to recent progress in machine learning. Access to task relevant labels at scale is often scarce or expensive, motivating the need to learn from unlabelled datasets with self-supervised learning (SSL). Such large unlabelled datasets (with data augmentations) often provide a good coverage of the underlying input distrib… ▽ More Representation learning that leverages large-scale labelled datasets, is central to recent progress in machine learning. Access to task relevant labels at scale is often scarce or expensive, motivating the need to learn from unlabelled datasets with self-supervised learning (SSL). Such large unlabelled datasets (with data augmentations) often provide a good coverage of the underlying input distribution. However evaluating the representations learned by SSL algorithms still requires task-specific labelled samples in the training pipeline. Additionally, the generalization of task-specific encoding is often sensitive to potential distribution shift. Inspired by recent advances in theoretical machine learning and vision neuroscience, we observe that the eigenspectrum of the empirical feature covariance matrix often follows a power law. For visual representations, we estimate the coefficient of the power law, $α$, across three key attributes which influence representation learning: learning objective (supervised, SimCLR, Barlow Twins and BYOL), network architecture (VGG, ResNet and Vision Transformer), and tasks (object and scene recognition). We observe that under mild conditions, proximity of $α$ to 1, is strongly correlated to the downstream generalization performance. Furthermore, $α\approx 1$ is a strong indicator of robustness to label noise during fine-tuning. Notably, $α$ is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations in unlabelled datasets. △ Less

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.02817 [pdf, other]

BEAS: Blockchain Enabled Asynchronous & Secure Federated Machine Learning

Authors: Arup Mondal, Harpreet Virk, Debayan Gupta

Abstract: Federated Learning (FL) enables multiple parties to distributively train a ML model without revealing their private datasets. However, it assumes trust in the centralized aggregator which stores and aggregates model updates. This makes it prone to gradient tampering and privacy leakage by a malicious aggregator. Malicious parties can also introduce backdoors into the joint model by poisoning the t… ▽ More Federated Learning (FL) enables multiple parties to distributively train a ML model without revealing their private datasets. However, it assumes trust in the centralized aggregator which stores and aggregates model updates. This makes it prone to gradient tampering and privacy leakage by a malicious aggregator. Malicious parties can also introduce backdoors into the joint model by poisoning the training data or model gradients. To address these issues, we present BEAS, the first blockchain-based framework for N-party FL that provides strict privacy guarantees of training data using gradient pruning (showing improved differential privacy compared to existing noise and clipping based techniques). Anomaly detection protocols are used to minimize the risk of data-poisoning attacks, along with gradient pruning that is further used to limit the efficacy of model-poisoning attacks. We also define a novel protocol to prevent premature convergence in heterogeneous learning environments. We perform extensive experiments on multiple datasets with promising results: BEAS successfully prevents privacy leakage from dataset reconstruction attacks, and minimizes the efficacy of poisoning attacks. Moreover, it achieves an accuracy similar to centralized frameworks, and its communication and computation overheads scale linearly with the number of participants. △ Less

Submitted 6 February, 2022; originally announced February 2022.

Comments: The Third AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI-22) at the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2201.08574 [pdf, other]

Classroom Slide Narration System

Authors: Jobin K. V., Ajoy Mondal, C. V. Jawahar

Abstract: Slide presentations are an effective and efficient tool used by the teaching community for classroom communication. However, this teaching model can be challenging for blind and visually impaired (VI) students. The VI student required personal human assistance for understand the presented slide. This shortcoming motivates us to design a Classroom Slide Narration System (CSNS) that generates audio… ▽ More Slide presentations are an effective and efficient tool used by the teaching community for classroom communication. However, this teaching model can be challenging for blind and visually impaired (VI) students. The VI student required personal human assistance for understand the presented slide. This shortcoming motivates us to design a Classroom Slide Narration System (CSNS) that generates audio descriptions corresponding to the slide content. This problem poses as an image-to-markup language generation task. The initial step is to extract logical regions such as title, text, equation, figure, and table from the slide image. In the classroom slide images, the logical regions are distributed based on the location of the image. To utilize the location of the logical regions for slide image segmentation, we propose the architecture, Classroom Slide Segmentation Network (CSSN). The unique attributes of this architecture differs from most other semantic segmentation networks. Publicly available benchmark datasets such as WiSe and SPaSe are used to validate the performance of our segmentation architecture. We obtained 9.54 segmentation accuracy improvement in WiSe dataset. We extract content (information) from the slide using four well-established modules such as optical character recognition (OCR), figure classification, equation description, and table structure recognizer. With this information, we build a Classroom Slide Narration System (CSNS) to help VI students understand the slide content. The users have given better feedback on the quality output of the proposed CSNS in comparison to existing systems like Facebooks Automatic Alt-Text (AAT) and Tesseract. △ Less

Submitted 21 January, 2022; originally announced January 2022.

Journal ref: CVIP 2021

arXiv:2201.07730 [pdf, other]

SCOTCH: An Efficient Secure Computation Framework for Secure Aggregation

Authors: Yash More, Prashanthi Ramachandran, Priyam Panda, Arup Mondal, Harpreet Virk, Debayan Gupta

Abstract: Federated learning enables multiple data owners to jointly train a machine learning model without revealing their private datasets. However, a malicious aggregation server might use the model parameters to derive sensitive information about the training dataset used. To address such leakage, differential privacy and cryptographic techniques have been investigated in prior work, but these often res… ▽ More Federated learning enables multiple data owners to jointly train a machine learning model without revealing their private datasets. However, a malicious aggregation server might use the model parameters to derive sensitive information about the training dataset used. To address such leakage, differential privacy and cryptographic techniques have been investigated in prior work, but these often result in large communication overheads or impact model performance. To mitigate this centralization of power, we propose SCOTCH, a decentralized m-party secure-computation framework for federated aggregation that deploys MPC primitives, such as secret sharing. Our protocol is simple, efficient, and provides strict privacy guarantees against curious aggregators or colluding data-owners with minimal communication overheads compared to other existing state-of-the-art privacy-preserving federated learning frameworks. We evaluate our framework by performing extensive experiments on multiple datasets with promising results. SCOTCH can train the standard MLP NN with the training dataset split amongst 3 participating users and 3 aggregating servers with 96.57% accuracy on MNIST, and 98.40% accuracy on the Extended MNIST (digits) dataset, while providing various optimizations. △ Less

Submitted 15 February, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22), Third AAAI Privacy-Preserving Artificial Intelligence (PPAI-22) Workshop

arXiv:2112.04948 [pdf, other]

PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function

Authors: Manaar Alam, Shubhajit Datta, Debdeep Mukhopadhyay, Arijit Mondal, Partha Pratim Chakrabarti

Abstract: The security of Deep Learning classifiers is a critical field of study because of the existence of adversarial attacks. Such attacks usually rely on the principle of transferability, where an adversarial example crafted on a surrogate classifier tends to mislead the target classifier trained on the same dataset even if both classifiers have quite different architecture. Ensemble methods against ad… ▽ More The security of Deep Learning classifiers is a critical field of study because of the existence of adversarial attacks. Such attacks usually rely on the principle of transferability, where an adversarial example crafted on a surrogate classifier tends to mislead the target classifier trained on the same dataset even if both classifiers have quite different architecture. Ensemble methods against adversarial attacks demonstrate that an adversarial example is less likely to mislead multiple classifiers in an ensemble having diverse decision boundaries. However, recent ensemble methods have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation. This paper attempts to develop a new ensemble methodology that constructs multiple diverse classifiers using a Pairwise Adversarially Robust Loss (PARL) function during the training procedure. PARL utilizes gradients of each layer with respect to input in every classifier within the ensemble simultaneously. The proposed training procedure enables PARL to achieve higher robustness against black-box transfer attacks compared to previous ensemble methods without adversely affecting the accuracy of clean examples. We also evaluate the robustness in the presence of white-box attacks, where adversarial examples are crafted using parameters of the target classifier. We present extensive experiments using standard image classification datasets like CIFAR-10 and CIFAR-100 trained using standard ResNet20 classifier against state-of-the-art adversarial attacks to demonstrate the robustness of the proposed ensemble methodology. △ Less

Submitted 9 December, 2021; originally announced December 2021.

arXiv:2111.12938 [pdf, other]

doi 10.1109/LSENS.2021.3139473

SCLAiR : Supervised Contrastive Learning for User and Device Independent Airwriting Recognition

Authors: Ayush Tripathi, Arnab Kumar Mondal, Lalan Kumar, Prathosh A. P

Abstract: Airwriting Recognition is the problem of identifying letters written in free space with finger movement. It is essentially a specialized case of gesture recognition, wherein the vocabulary of gestures corresponds to letters as in a particular language. With the wide adoption of smart wearables in the general population, airwriting recognition using motion sensors from a smart-band can be used as a… ▽ More Airwriting Recognition is the problem of identifying letters written in free space with finger movement. It is essentially a specialized case of gesture recognition, wherein the vocabulary of gestures corresponds to letters as in a particular language. With the wide adoption of smart wearables in the general population, airwriting recognition using motion sensors from a smart-band can be used as a medium of user input for applications in Human-Computer Interaction. There has been limited work in the recognition of in-air trajectories using motion sensors, and the performance of the techniques in the case when the device used to record signals is changed has not been explored hitherto. Motivated by these, a new paradigm for device and user-independent airwriting recognition based on supervised contrastive learning is proposed. A two stage classification strategy is employed, the first of which involves training an encoder network with supervised contrastive loss. In the subsequent stage, a classification head is trained with the encoder weights kept frozen. The efficacy of the proposed method is demonstrated through experiments on a publicly available dataset and also with a dataset recorded in our lab using a different device. Experiments have been performed in both supervised and unsupervised settings and compared against several state-of-the-art domain adaptation techniques. Data and the code for our implementation will be made available at https://github.com/ayushayt/SCLAiR. △ Less

Submitted 29 December, 2021; v1 submitted 25 November, 2021; originally announced November 2021.

arXiv:2111.07145 [pdf, other]

doi 10.1007/s00530-021-00775-9.pdf

New Performance Measures for Object Tracking under Complex Environments

Authors: Ajoy Mondal

Abstract: Various performance measures based on the ground truth and without ground truth exist to evaluate the quality of a developed tracking algorithm. The existing popular measures - average center location error (ACLE) and average tracking accuracy (ATA) based on ground truth, may sometimes create confusion to quantify the quality of a developed algorithm for tracking an object under some complex envir… ▽ More Various performance measures based on the ground truth and without ground truth exist to evaluate the quality of a developed tracking algorithm. The existing popular measures - average center location error (ACLE) and average tracking accuracy (ATA) based on ground truth, may sometimes create confusion to quantify the quality of a developed algorithm for tracking an object under some complex environments (e.g., scaled or oriented or both scaled and oriented object). In this article, we propose three new auxiliary performance measures based on ground truth information to evaluate the quality of a developed tracking algorithm under such complex environments. Moreover, one performance measure is developed by combining both two existing measures ACLE and ATA and three new proposed measures for better quantifying the developed tracking algorithm under such complex conditions. Some examples and experimental results conclude that the proposed measure is better than existing measures to quantify one developed algorithm for tracking objects under such complex environments. △ Less

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2111.07129 [pdf, other]

Visual Understanding of Complex Table Structures from Document Images

Authors: Sachin Raja, Ajoy Mondal, C V Jawahar

Abstract: Table structure recognition is necessary for a comprehensive understanding of documents. Tables in unstructured business documents are tough to parse due to the high diversity of layouts, varying alignments of contents, and the presence of empty cells. The problem is particularly difficult because of challenges in identifying individual cells using visual or linguistic contexts or both. Accurate d… ▽ More Table structure recognition is necessary for a comprehensive understanding of documents. Tables in unstructured business documents are tough to parse due to the high diversity of layouts, varying alignments of contents, and the presence of empty cells. The problem is particularly difficult because of challenges in identifying individual cells using visual or linguistic contexts or both. Accurate detection of table cells (including empty cells) simplifies structure extraction and hence, it becomes the prime focus of our work. We propose a novel object-detection-based deep model that captures the inherent alignments of cells within tables and is fine-tuned for fast optimization. Despite accurate detection of cells, recognizing structures for dense tables may still be challenging because of difficulties in capturing long-range row/column dependencies in presence of multi-row/column spanning cells. Therefore, we also aim to improve structure recognition by deducing a novel rectilinear graph-based formulation. From a semantics perspective, we highlight the significance of empty cells in a table. To take these cells into account, we suggest an enhancement to a popular evaluation criterion. Finally, we introduce a modestly sized evaluation dataset with an annotation style inspired by human cognition to encourage new approaches to the problem. Our framework improves the previous state-of-the-art performance by a 2.7% average F1-score on benchmark datasets. △ Less

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2111.07102 [pdf, other]

doi 10.1007/s10489-021-02530-z

Deep Neural Networks for Automatic Grain-matrix Segmentation in Plane and Cross-polarized Sandstone Photomicrographs

Authors: Rajdeep Das, Ajoy Mondal, Tapan Chakraborty, Kuntal Ghosh

Abstract: Grain segmentation of sandstone that is partitioning the grain from its surrounding matrix/cement in the thin section is the primary step for computer-aided mineral identification and sandstone classification. The microscopic images of sandstone contain many mineral grains and their surrounding matrix/cement. The distinction between adjacent grains and the matrix is often ambiguous, making grain s… ▽ More Grain segmentation of sandstone that is partitioning the grain from its surrounding matrix/cement in the thin section is the primary step for computer-aided mineral identification and sandstone classification. The microscopic images of sandstone contain many mineral grains and their surrounding matrix/cement. The distinction between adjacent grains and the matrix is often ambiguous, making grain segmentation difficult. Various solutions exist in literature to handle these problems; however, they are not robust against sandstone petrography's varied pattern. In this paper, we formulate grain segmentation as a pixel-wise two-class (i.e., grain and background) semantic segmentation task. We develop a deep learning-based end-to-end trainable framework named Deep Semantic Grain Segmentation network (DSGSN), a data-driven method, and provide a generic solution. As per the authors' knowledge, this is the first work where the deep neural network is explored to solve the grain segmentation problem. Extensive experiments on microscopic images highlight that our method obtains better segmentation accuracy than various segmentation architectures with more parameters. △ Less

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2111.06867 [pdf, other]

Flatee: Federated Learning Across Trusted Execution Environments

Authors: Arup Mondal, Yash More, Ruthu Hulikal Rooparaghunath, Debayan Gupta

Abstract: Federated learning allows us to distributively train a machine learning model where multiple parties share local model parameters without sharing private data. However, parameter exchange may still leak information. Several approaches have been proposed to overcome this, based on multi-party computation, fully homomorphic encryption, etc.; many of these protocols are slow and impractical for real-… ▽ More Federated learning allows us to distributively train a machine learning model where multiple parties share local model parameters without sharing private data. However, parameter exchange may still leak information. Several approaches have been proposed to overcome this, based on multi-party computation, fully homomorphic encryption, etc.; many of these protocols are slow and impractical for real-world use as they involve a large number of cryptographic operations. In this paper, we propose the use of Trusted Execution Environments (TEE), which provide a platform for isolated execution of code and handling of data, for this purpose. We describe Flatee, an efficient privacy-preserving federated learning framework across TEEs, which considerably reduces training and communication time. Our framework can handle malicious parties (we do not natively solve adversarial data poisoning, though we describe a preliminary approach to handle this). △ Less

Submitted 12 November, 2021; originally announced November 2021.

Comments: IEEE Euro S&P 2021 Poster; see https://www.ieee-security.org/TC/EuroSP2021/posters.html

arXiv:2109.14979 [pdf, other]

doi 10.1109/ICCVW54120.2021.00103

Moving Object Detection for Event-based vision using Graph Spectral Clustering

Authors: Anindya Mondal, Shashant R, Jhony H. Giraldo, Thierry Bouwmans, Ananda S. Chowdhury

Abstract: Moving object detection has been a central topic of discussion in computer vision for its wide range of applications like in self-driving cars, video surveillance, security, and enforcement. Neuromorphic Vision Sensors (NVS) are bio-inspired sensors that mimic the working of the human eye. Unlike conventional frame-based cameras, these sensors capture a stream of asynchronous 'events' that pose mu… ▽ More Moving object detection has been a central topic of discussion in computer vision for its wide range of applications like in self-driving cars, video surveillance, security, and enforcement. Neuromorphic Vision Sensors (NVS) are bio-inspired sensors that mimic the working of the human eye. Unlike conventional frame-based cameras, these sensors capture a stream of asynchronous 'events' that pose multiple advantages over the former, like high dynamic range, low latency, low power consumption, and reduced motion blur. However, these advantages come at a high cost, as the event camera data typically contains more noise and has low resolution. Moreover, as event-based cameras can only capture the relative changes in brightness of a scene, event data do not contain usual visual information (like texture and color) as available in video data from normal cameras. So, moving object detection in event-based cameras becomes an extremely challenging task. In this paper, we present an unsupervised Graph Spectral Clustering technique for Moving Object Detection in Event-based data (GSCEventMOD). We additionally show how the optimum number of moving objects can be automatically determined. Experimental comparisons on publicly available datasets show that the proposed GSCEventMOD algorithm outperforms a number of state-of-the-art techniques by a maximum margin of 30%. △ Less

Submitted 2 December, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: Ten pages, five figures, Published in 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada

arXiv:2109.01879 [pdf]

doi 10.1109/UPCON52273.2021.9667636

Moving Object Detection for Event-based Vision using k-means Clustering

Authors: Anindya Mondal, Mayukhmali Das

Abstract: Moving object detection is important in computer vision. Event-based cameras are bio-inspired cameras that work by mimicking the working of the human eye. These cameras have multiple advantages over conventional frame-based cameras, like reduced latency, HDR, reduced motion blur during high motion, low power consumption, etc. In spite of these advantages, event-based cameras are noise-sensitive an… ▽ More Moving object detection is important in computer vision. Event-based cameras are bio-inspired cameras that work by mimicking the working of the human eye. These cameras have multiple advantages over conventional frame-based cameras, like reduced latency, HDR, reduced motion blur during high motion, low power consumption, etc. In spite of these advantages, event-based cameras are noise-sensitive and have low resolution. Moreover, the task of moving object detection in these cameras is difficult, as event-based sensors lack useful visual features like texture and color. In this paper, we investigate the application of the k-means clustering technique in detecting moving objects in event-based data. △ Less

Submitted 11 January, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

Comments: Nine pages, five figures, Published in 2021 IEEE 8th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)

arXiv:2109.00659 [pdf, other]

Semantic Slicing of Architectural Change Commits: Towards Semantic Design Review

Authors: Amit Kumar Mondal, Chanchal K. Roy, Kevin A. Schneider, Banani Roy, Sristy Sumana Nath

Abstract: Software architectural changes involve more than one module or component and are complex to analyze compared to local code changes. Development teams aiming to review architectural aspects (design) of a change commit consider many essential scenarios such as access rules and restrictions on usage of program entities across modules. Moreover, design review is essential when proper architectural for… ▽ More Software architectural changes involve more than one module or component and are complex to analyze compared to local code changes. Development teams aiming to review architectural aspects (design) of a change commit consider many essential scenarios such as access rules and restrictions on usage of program entities across modules. Moreover, design review is essential when proper architectural formulations are paramount for developing and deploying a system. Untangling architectural changes, recovering semantic design, and producing design notes are the crucial tasks of the design review process. To support these tasks, we construct a lightweight tool [4] that can detect and decompose semantic slices of a commit containing architectural instances. A semantic slice consists of a description of relational information of involved modules, their classes, methods and connected modules in a change instance, which is easy to understand to a reviewer. We extract various directory and naming structures (DANS) properties from the source code for developing our tool. Utilizing the DANS properties, our tool first detects architectural change instances based on our defined metric and then decomposes the slices (based on string processing). Our preliminary investigation with ten open-source projects (developed in Java and Kotlin) reveals that the DANS properties produce highly reliable precision and recall (93-100%) for detecting and generating architectural slices. Our proposed tool will serve as the preliminary approach for the semantic design recovery and design summary generation for the project releases. △ Less

Submitted 1 September, 2021; originally announced September 2021.

arXiv:2108.07793

Modeling Pedagogical Learning Environment with Hybrid Model based on ICT

Authors: Al Maruf Hassan, Istiak Ahmed Mondal

Abstract: Pedagogy is a method that handles the ethos and culture of instruction from educators and the learning of learners. Pedagogy of Information and Communications Technology (ICT) refers to the interactions among the teacher, children, and learning environment based on ICT. It is a discipline that deals with the theory and practice of teaching strategies, teaching actions, teaching judgments, and deci… ▽ More Pedagogy is a method that handles the ethos and culture of instruction from educators and the learning of learners. Pedagogy of Information and Communications Technology (ICT) refers to the interactions among the teacher, children, and learning environment based on ICT. It is a discipline that deals with the theory and practice of teaching strategies, teaching actions, teaching judgments, and decisions. It is also the understanding and needs of students as well as the background and interests of an individual one. In this paper, we have designed the pedagogical learning environment from the perspective of ICT education. In our methodology of the pedagogy for ICT, education includes the interaction among different elements. The methodology improves to propagate convenience differently into the educational environment. We are also building a hybrid model for the ICT development program. The hybrid model represents the combination of standards, stages, year level, and class level as well as brings it into one umbrella. We have constructed the pedagogical learning environment theoretically from the perspective of ICT education to the consideration of outcome-based ICT learning. Outcome-based education is a fundamental element for building any nation completely around the globe. △ Less

Submitted 27 August, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: Problem has been solved related to the problem has been solved related to a big error in the basic concept

arXiv:2107.07709 [pdf, other]

ScRAE: Deterministic Regularized Autoencoders with Flexible Priors for Clustering Single-cell Gene Expression Data

Authors: Arnab Kumar Mondal, Himanshu Asnani, Parag Singla, Prathosh AP

Abstract: Clustering single-cell RNA sequence (scRNA-seq) data poses statistical and computational challenges due to their high-dimensionality and data-sparsity, also known as `dropout' events. Recently, Regularized Auto-Encoder (RAE) based deep neural network models have achieved remarkable success in learning robust low-dimensional representations. The basic idea in RAEs is to learn a non-linear mapping f… ▽ More Clustering single-cell RNA sequence (scRNA-seq) data poses statistical and computational challenges due to their high-dimensionality and data-sparsity, also known as `dropout' events. Recently, Regularized Auto-Encoder (RAE) based deep neural network models have achieved remarkable success in learning robust low-dimensional representations. The basic idea in RAEs is to learn a non-linear mapping from the high-dimensional data space to a low-dimensional latent space and vice-versa, simultaneously imposing a distributional prior on the latent space, which brings in a regularization effect. This paper argues that RAEs suffer from the infamous problem of bias-variance trade-off in their naive formulation. While a simple AE without a latent regularization results in data over-fitting, a very strong prior leads to under-representation and thus bad clustering. To address the above issues, we propose a modified RAE framework (called the scRAE) for effective clustering of the single-cell RNA sequencing data. scRAE consists of deterministic AE with a flexibly learnable prior generator network, which is jointly trained with the AE. This facilitates scRAE to trade-off better between the bias and variance in the latent space. We demonstrate the efficacy of the proposed method through extensive experimentation on several real-world single-cell Gene expression datasets. △ Less

Submitted 16 July, 2021; originally announced July 2021.

Comments: IEEE/ACM Transactions on Computational Biology and Bioinformatics

arXiv:2105.03237 [pdf, other]

Mini-batch graphs for robust image classification

Authors: Arnab Kumar Mondal, Vineet Jain, Kaleem Siddiqi

Abstract: Current deep learning models for classification tasks in computer vision are trained using mini-batches. In the present article, we take advantage of the relationships between samples in a mini-batch, using graph neural networks to aggregate information from similar images. This helps mitigate the adverse effects of alterations to the input images on classification performance. Diverse experiments… ▽ More Current deep learning models for classification tasks in computer vision are trained using mini-batches. In the present article, we take advantage of the relationships between samples in a mini-batch, using graph neural networks to aggregate information from similar images. This helps mitigate the adverse effects of alterations to the input images on classification performance. Diverse experiments on image-based object and scene classification show that this approach not only improves a classifier's performance but also increases its robustness to image perturbations and adversarial attacks. Further, we also show that mini-batch graph neural networks can help to alleviate the problem of mode collapse in Generative Adversarial Networks. △ Less

Submitted 21 April, 2021; originally announced May 2021.

arXiv:2104.08524 [pdf, other]

Multilingual and Cross-Lingual Intent Detection from Spoken Data

Authors: Daniela Gerz, Pei-Hao Su, Razvan Kusztos, Avishek Mondal, Michał Lis, Eshan Singhal, Nikola Mrkšić, Tsung-Hsien Wen, Ivan Vulić

Abstract: We present a systematic study on multilingual and cross-lingual intent detection from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the intent detection task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language… ▽ More We present a systematic study on multilingual and cross-lingual intent detection from spoken data. The study leverages a new resource put forth in this work, termed MInDS-14, a first training and evaluation resource for the intent detection task with spoken data. It covers 14 intents extracted from a commercial system in the e-banking domain, associated with spoken examples in 14 diverse language varieties. Our key results indicate that combining machine translation models with state-of-the-art multilingual sentence encoders (e.g., LaBSE) can yield strong intent detectors in the majority of target languages covered in MInDS-14, and offer comparative analyses across different axes: e.g., zero-shot versus few-shot learning, translation direction, and impact of speech recognition. We see this work as an important step towards more inclusive development and evaluation of multilingual intent detectors from spoken data, in a much wider spectrum of languages compared to prior work. △ Less

Submitted 17 April, 2021; originally announced April 2021.

Showing 1–50 of 85 results for author: Mondal, A