Search | arXiv e-print repository

Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis

Authors: Sneha Swaroopa, Rijoy Mukherjee, Anushka Debnath, Rajat Subhra Chakraborty

Abstract: The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLMs to automate digital hardware design by producing superior-quality Register Transfer Logic (RTL) descriptions, particularly in Verilog. However, these tests have revealed that Verilog code production… ▽ More The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLMs to automate digital hardware design by producing superior-quality Register Transfer Logic (RTL) descriptions, particularly in Verilog. However, these tests have revealed that Verilog code production using LLMs at current state-of-the-art lack sufficient functional correctness to be practically viable, compared to automatic generation of programs in general-purpose programming languages such as C, C++, Python, etc. With this as the key insight, in this paper we assess the performance of a two-stage software pipeline for automated Verilog RTL generation: LLM based automatic generation of annotated C++ code suitable for high-level synthesis (HLS), followed by HLS to generate Verilog RTL. We have benchmarked the performance of our proposed scheme using the open-source VerilogEval dataset, for four different industry-scale LLMs, and the Vitis HLS tool. Our experimental results demonstrate that our two-step technique substantially outperforms previous proposed techniques of direct Verilog RTL generation by LLMs in terms of average functional correctness rates, reaching score of 0.86 in pass@1 metric. △ Less

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2406.12444 [pdf, other]

Who Checks the Checkers? Exploring Source Credibility in Twitter's Community Notes

Authors: Uku Kangur, Roshni Chakraborty, Rajesh Sharma

Abstract: In recent years, the proliferation of misinformation on social media platforms has become a significant concern. Initially designed for sharing information and fostering social connections, platforms like Twitter (now rebranded as X) have also unfortunately become conduits for spreading misinformation. To mitigate this, these platforms have implemented various mechanisms, including the recent sugg… ▽ More In recent years, the proliferation of misinformation on social media platforms has become a significant concern. Initially designed for sharing information and fostering social connections, platforms like Twitter (now rebranded as X) have also unfortunately become conduits for spreading misinformation. To mitigate this, these platforms have implemented various mechanisms, including the recent suggestion to use crowd-sourced non-expert fact-checkers to enhance the scalability and efficiency of content vetting. An example of this is the introduction of Community Notes on Twitter. While previous research has extensively explored various aspects of Twitter tweets, such as information diffusion, sentiment analytics and opinion summarization, there has been a limited focus on the specific feature of Twitter Community Notes, despite its potential role in crowd-sourced fact-checking. Prior research on Twitter Community Notes has involved empirical analysis of the feature's dataset and comparative studies that also include other methods like expert fact-checking. Distinguishing itself from prior works, our study covers a multi-faceted analysis of sources and audience perception within Community Notes. We find that the majority of cited sources are news outlets that are left-leaning and are of high factuality, pointing to a potential bias in the platform's community fact-checking. Left biased and low factuality sources validate tweets more, while Center sources are used more often to refute tweet content. Additionally, source factuality significantly influences public agreement and helpfulness of the notes, highlighting the effectiveness of the Community Notes Ranking algorithm. These findings showcase the impact and biases inherent in community-based fact-checking initiatives. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.09390 [pdf, other]

LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

Authors: Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, Francois Bremond, Srijan Das

Abstract: Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X,… ▽ More Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X, comprising 100K RGB video-instruction pairs, language descriptions, 3D skeletons, and action-conditioned object trajectories. We introduce LLAVIDAL, an LLVM capable of incorporating 3D poses and relevant object trajectories to understand the intricate spatiotemporal relationships within ADLs. Furthermore, we present a novel benchmark, ADLMCQ, for quantifying LLVM effectiveness in ADL scenarios. When trained on ADL-X, LLAVIDAL consistently achieves state-of-the-art performance across all ADL evaluation metrics. Qualitative analysis reveals LLAVIDAL's temporal reasoning capabilities in understanding ADL. The link to the dataset is provided at: https://adl-x.github.io/ △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.06551 [pdf, other]

ADSumm: Annotated Ground-truth Summary Datasets for Disaster Tweet Summarization

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: Online social media platforms, such as Twitter, provide valuable information during disaster events. Existing tweet disaster summarization approaches provide a summary of these events to aid government agencies, humanitarian organizations, etc., to ensure effective disaster response. In the literature, there are two types of approaches for disaster summarization, namely, supervised and unsupervise… ▽ More Online social media platforms, such as Twitter, provide valuable information during disaster events. Existing tweet disaster summarization approaches provide a summary of these events to aid government agencies, humanitarian organizations, etc., to ensure effective disaster response. In the literature, there are two types of approaches for disaster summarization, namely, supervised and unsupervised approaches. Although supervised approaches are typically more effective, they necessitate a sizable number of disaster event summaries for testing and training. However, there is a lack of good number of disaster summary datasets for training and evaluation. This motivates us to add more datasets to make supervised learning approaches more efficient. In this paper, we present ADSumm, which adds annotated ground-truth summaries for eight disaster events which consist of both natural and man-made disaster events belonging to seven different countries. Our experimental analysis shows that the newly added datasets improve the performance of the supervised summarization approaches by 8-28% in terms of ROUGE-N F1-score. Moreover, in newly annotated dataset, we have added a category label for each input tweet which helps to ensure good coverage from different categories in summary. Additionally, we have added two other features relevance label and key-phrase, which provide information about the quality of a tweet and explanation about the inclusion of the tweet into summary, respectively. For ground-truth summary creation, we provide the annotation procedure adapted in detail, which has not been described in existing literature. Experimental analysis shows the quality of ground-truth summary is very good with Coverage, Relevance and Diversity. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06541 [pdf, other]

ATSumm: Auxiliary information enhanced approach for abstractive disaster Tweet Summarization with sparse training data

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: The abundance of situational information on Twitter poses a challenge for users to manually discern vital and relevant information during disasters. A concise and human-interpretable overview of this information helps decision-makers in implementing efficient and quick disaster response. Existing abstractive summarization approaches can be categorized as sentence-based or key-phrase-based approach… ▽ More The abundance of situational information on Twitter poses a challenge for users to manually discern vital and relevant information during disasters. A concise and human-interpretable overview of this information helps decision-makers in implementing efficient and quick disaster response. Existing abstractive summarization approaches can be categorized as sentence-based or key-phrase-based approaches. This paper focuses on sentence-based approach, which is typically implemented as a dual-phase procedure in literature. The initial phase, known as the extractive phase, involves identifying the most relevant tweets. The subsequent phase, referred to as the abstractive phase, entails generating a more human-interpretable summary. In this study, we adopt the methodology from prior research for the extractive phase. For the abstractive phase of summarization, most existing approaches employ deep learning-based frameworks, which can either be pre-trained or require training from scratch. However, to achieve the appropriate level of performance, it is imperative to have substantial training data for both methods, which is not readily available. This work presents an Abstractive Tweet Summarizer (ATSumm) that effectively addresses the issue of data sparsity by using auxiliary information. We introduced the Auxiliary Pointer Generator Network (AuxPGN) model, which utilizes a unique attention mechanism called Key-phrase attention. This attention mechanism incorporates auxiliary information in the form of key-phrases and their corresponding importance scores from the input tweets. We evaluate the proposed approach by comparing it with 10 state-of-the-art approaches across 13 disaster datasets. The evaluation results indicate that ATSumm achieves superior performance compared to state-of-the-art approaches, with improvement of 4-80% in ROUGE-N F1-score. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2403.13870 [pdf, other]

ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations

Authors: Rwiddhi Chakraborty, Adrian Sletten, Michael Kampffmeyer

Abstract: Group robustness strategies aim to mitigate learned biases in deep learning models that arise from spurious correlations present in their training datasets. However, most existing methods rely on the access to the label distribution of the groups, which is time-consuming and expensive to obtain. As a result, unsupervised group robustness strategies are sought. Based on the insight that a trained m… ▽ More Group robustness strategies aim to mitigate learned biases in deep learning models that arise from spurious correlations present in their training datasets. However, most existing methods rely on the access to the label distribution of the groups, which is time-consuming and expensive to obtain. As a result, unsupervised group robustness strategies are sought. Based on the insight that a trained model's classification strategies can be inferred accurately based on explainability heatmaps, we introduce ExMap, an unsupervised two stage mechanism designed to enhance group robustness in traditional classifiers. ExMap utilizes a clustering module to infer pseudo-labels based on a model's explainability heatmaps, which are then used during training in lieu of actual labels. Our empirical studies validate the efficacy of ExMap - We demonstrate that it bridges the performance gap with its supervised counterparts and outperforms existing partially supervised and unsupervised methods. Additionally, ExMap can be seamlessly integrated with existing group robustness learning strategies. Finally, we demonstrate its potential in tackling the emerging issue of multiple shortcut mitigation\footnote{Code available at \url{https://github.com/rwchakra/exmap}}. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.11418 [pdf, other]

Variational Sampling of Temporal Trajectories

Authors: Jurijs Nazarovs, Zhichun Huang, Xingjian Zhen, Sourav Pal, Rudrasis Chakraborty, Vikas Singh

Abstract: A deterministic temporal process can be determined by its trajectory, an element in the product space of (a) initial condition $z_0 \in \mathcal{Z}$ and (b) transition function $f: (\mathcal{Z}, \mathcal{T}) \to \mathcal{Z}$ often influenced by the control of the underlying dynamical system. Existing methods often model the transition function as a differential equation or as a recurrent neural ne… ▽ More A deterministic temporal process can be determined by its trajectory, an element in the product space of (a) initial condition $z_0 \in \mathcal{Z}$ and (b) transition function $f: (\mathcal{Z}, \mathcal{T}) \to \mathcal{Z}$ often influenced by the control of the underlying dynamical system. Existing methods often model the transition function as a differential equation or as a recurrent neural network. Despite their effectiveness in predicting future measurements, few results have successfully established a method for sampling and statistical inference of trajectories using neural networks, partially due to constraints in the parameterization. In this work, we introduce a mechanism to learn the distribution of trajectories by parameterizing the transition function $f$ explicitly as an element in a function space. Our framework allows efficient synthesis of novel trajectories, while also directly providing a convenient tool for inference, i.e., uncertainty estimation, likelihood evaluations and out of distribution detection for abnormal trajectories. These capabilities can have implications for various downstream tasks, e.g., simulation and evaluation for reinforcement learning. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2312.04548 [pdf, other]

Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?

Authors: Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah

Abstract: Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, solar-zenith angle, and population density of different geographies influence the data diversity. These two factors… ▽ More Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, solar-zenith angle, and population density of different geographies influence the data diversity. These two factors conjointly render suboptimal aerial-visual perception of the deep neural network (DNN) models trained primarily on the ground-view data, including the open-world foundational models. To pave the way for a transformative era of aerial detection, we present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives -- ground camera and drone-mounted camera. MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets across all modalities and tasks. Through our extensive benchmarking on MAVREC, we recognize that augmenting object detectors with ground-view images from the corresponding geographical location is a superior pre-training strategy for aerial detection. Building on this strategy, we benchmark MAVREC with a curriculum-based semi-supervised object detection approach that leverages labeled (ground and aerial) and unlabeled (only aerial) images to enhance the aerial detection. We publicly release the MAVREC dataset: https://mavrec.github.io. △ Less

Submitted 7 December, 2023; originally announced December 2023.

ACM Class: I.4.0; I.4.8; I.5.1; I.5.4; I.2.10

arXiv:2310.18367 [pdf, other]

Unsupervised Learning of Molecular Embeddings for Enhanced Clustering and Emergent Properties for Chemical Compounds

Authors: Jaiveer Gill, Ratul Chakraborty, Reetham Gubba, Amy Liu, Shrey Jain, Chirag Iyer, Obaid Khwaja, Saurav Kumar

Abstract: The detailed analysis of molecular structures and properties holds great potential for drug development discovery through machine learning. Developing an emergent property in the model to understand molecules would broaden the horizons for development with a new computational tool. We introduce various methods to detect and cluster chemical compounds based on their SMILES data. Our first method, a… ▽ More The detailed analysis of molecular structures and properties holds great potential for drug development discovery through machine learning. Developing an emergent property in the model to understand molecules would broaden the horizons for development with a new computational tool. We introduce various methods to detect and cluster chemical compounds based on their SMILES data. Our first method, analyzing the graphical structures of chemical compounds using embedding data, employs vector search to meet our threshold value. The results yielded pronounced, concentrated clusters, and the method produced favorable results in querying and understanding the compounds. We also used natural language description embeddings stored in a vector database with GPT3.5, which outperforms the base model. Thus, we introduce a similarity search and clustering algorithm to aid in searching for and interacting with molecules, enhancing efficiency in chemical exploration and enabling future development of emergent properties in molecular property prediction models. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.15670 [pdf]

MONOVAB : An Annotated Corpus for Bangla Multi-label Emotion Detection

Authors: Sumit Kumar Banshal, Sajal Das, Shumaiya Akter Shammi, Narayan Ranjan Chakraborty

Abstract: In recent years, Sentiment Analysis (SA) and Emotion Recognition (ER) have been increasingly popular in the Bangla language, which is the seventh most spoken language throughout the entire world. However, the language is structurally complicated, which makes this field arduous to extract emotions in an accurate manner. Several distinct approaches such as the extraction of positive and negative sen… ▽ More In recent years, Sentiment Analysis (SA) and Emotion Recognition (ER) have been increasingly popular in the Bangla language, which is the seventh most spoken language throughout the entire world. However, the language is structurally complicated, which makes this field arduous to extract emotions in an accurate manner. Several distinct approaches such as the extraction of positive and negative sentiments as well as multiclass emotions, have been implemented in this field of study. Nevertheless, the extraction of multiple sentiments is an almost untouched area in this language. Which involves identifying several feelings based on a single piece of text. Therefore, this study demonstrates a thorough method for constructing an annotated corpus based on scrapped data from Facebook to bridge the gaps in this subject area to overcome the challenges. To make this annotation more fruitful, the context-based approach has been used. Bidirectional Encoder Representations from Transformers (BERT), a well-known methodology of transformers, have been shown the best results of all methods implemented. Finally, a web application has been developed to demonstrate the performance of the pre-trained top-performer model (BERT) for multi-label ER in Bangla. △ Less

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2305.11592 [pdf, ps, other]

IKDSumm: Incorporating Key-phrases into BERT for extractive Disaster Tweet Summarization

Authors: Piyush Kumar Garg, Roshni Chakraborty, Srishti Gupta, Sourav Kumar Dandapat

Abstract: Online social media platforms, such as Twitter, are one of the most valuable sources of information during disaster events. Therefore, humanitarian organizations, government agencies, and volunteers rely on a summary of this information, i.e., tweets, for effective disaster management. Although there are several existing supervised and unsupervised approaches for automated tweet summary approaches… ▽ More Online social media platforms, such as Twitter, are one of the most valuable sources of information during disaster events. Therefore, humanitarian organizations, government agencies, and volunteers rely on a summary of this information, i.e., tweets, for effective disaster management. Although there are several existing supervised and unsupervised approaches for automated tweet summary approaches, these approaches either require extensive labeled information or do not incorporate specific domain knowledge of disasters. Additionally, the most recent approaches to disaster summarization have proposed BERT-based models to enhance the summary quality. However, for further improved performance, we introduce the utilization of domain-specific knowledge without any human efforts to understand the importance (salience) of a tweet which further aids in summary creation and improves summary quality. In this paper, we propose a disaster-specific tweet summarization framework, IKDSumm, which initially identifies the crucial and important information from each tweet related to a disaster through key-phrases of that tweet. We identify these key-phrases by utilizing the domain knowledge (using existing ontology) of disasters without any human intervention. Further, we utilize these key-phrases to automatically generate a summary of the tweets. Therefore, given tweets related to a disaster, IKDSumm ensures fulfillment of the summarization key objectives, such as information coverage, relevance, and diversity in summary without any human intervention. We evaluate the performance of IKDSumm with 8 state-of-the-art techniques on 12 disaster datasets. The evaluation results show that IKDSumm outperforms existing techniques by approximately 2-79% in terms of ROUGE-N F1-score. △ Less

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.11536 [pdf, other]

PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: Disaster summarization approaches provide an overview of the important information posted during disaster events on social media platforms, such as, Twitter. However, the type of information posted significantly varies across disasters depending on several factors like the location, type, severity, etc. Verification of the effectiveness of disaster summarization approaches still suffer due to the… ▽ More Disaster summarization approaches provide an overview of the important information posted during disaster events on social media platforms, such as, Twitter. However, the type of information posted significantly varies across disasters depending on several factors like the location, type, severity, etc. Verification of the effectiveness of disaster summarization approaches still suffer due to the lack of availability of good spectrum of datasets along with the ground-truth summary. Existing approaches for ground-truth summary generation (ground-truth for extractive summarization) relies on the wisdom and intuition of the annotators. Annotators are provided with a complete set of input tweets from which a subset of tweets is selected by the annotators for the summary. This process requires immense human effort and significant time. Additionally, this intuition-based selection of the tweets might lead to a high variance in summaries generated across annotators. Therefore, to handle these challenges, we propose a hybrid (semi-automated) approach (PORTRAIT) where we partly automate the ground-truth summary generation procedure. This approach reduces the effort and time of the annotators while ensuring the quality of the created ground-truth summary. We validate the effectiveness of PORTRAIT on 5 disaster events through quantitative and qualitative comparisons of ground-truth summaries generated by existing intuitive approaches, a semi-automated approach, and PORTRAIT. We prepare and release the ground-truth summaries for 5 disaster events which consist of both natural and man-made disaster events belonging to 4 different countries. Finally, we provide a study about the performance of various state-of-the-art summarization approaches on the ground-truth summaries generated by PORTRAIT using ROUGE-N F1-scores. △ Less

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2303.09352 [pdf, other]

Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings

Authors: Daniel J. Trosten, Rwiddhi Chakraborty, Sigurd Løkse, Kristoffer Knutsen Wickstrøm, Robert Jenssen, Michael C. Kampffmeyer

Abstract: Distance-based classification is frequently used in transductive few-shot learning (FSL). However, due to the high-dimensionality of image representations, FSL classifiers are prone to suffer from the hubness problem, where a few points (hubs) occur frequently in multiple nearest neighbour lists of other points. Hubness negatively impacts distance-based classification when hubs from one class appe… ▽ More Distance-based classification is frequently used in transductive few-shot learning (FSL). However, due to the high-dimensionality of image representations, FSL classifiers are prone to suffer from the hubness problem, where a few points (hubs) occur frequently in multiple nearest neighbour lists of other points. Hubness negatively impacts distance-based classification when hubs from one class appear often among the nearest neighbors of points from another class, degrading the classifier's performance. To address the hubness problem in FSL, we first prove that hubness can be eliminated by distributing representations uniformly on the hypersphere. We then propose two new approaches to embed representations on the hypersphere, which we prove optimize a tradeoff between uniformity and local similarity preservation -- reducing hubness while retaining class structure. Our experiments show that the proposed methods reduce hubness, and significantly improves transductive FSL accuracy for a wide range of classifiers. △ Less

Submitted 16 March, 2023; originally announced March 2023.

Comments: CVPR 2023

arXiv:2210.06354 [pdf, other]

Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity

Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

Abstract: Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, source of the events and their relationships. Unlike NL text generation tasks, which rely on metrics like BLEU, ROUGE, METEOR based on lexical semantics for evaluation, the AAC evaluation metric requires an ability to map NL text (phrases) that corre… ▽ More Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, source of the events and their relationships. Unlike NL text generation tasks, which rely on metrics like BLEU, ROUGE, METEOR based on lexical semantics for evaluation, the AAC evaluation metric requires an ability to map NL text (phrases) that correspond to similar sounds in addition lexical semantics. Current metrics used for evaluation of AAC tasks lack an understanding of the perceived properties of sound represented by text. In this paper, wepropose a novel metric based on Text-to-Audio Grounding (TAG), which is, useful for evaluating cross modal tasks like AAC. Experiments on publicly available AAC data-set shows our evaluation metric to perform better compared to existing metrics used in NL text and image captioning literature. △ Less

Submitted 3 October, 2022; originally announced October 2022.

Comments: 9 pages, 8 figures,

arXiv:2207.09684 [pdf, other]

On the Versatile Uses of Partial Distance Correlation in Deep Learning

Authors: Xingjian Zhen, Zihang Meng, Rudrasis Chakraborty, Vikas Singh

Abstract: Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more networks) during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic com… ▽ More Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more networks) during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic comparison of function, especially across different networks, remains difficult and is often carried out layer by layer. Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been sparingly used so far. In this paper, we revisit a (less widely known) from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. We describe the steps necessary to carry out its deployment for large scale models -- this opens the door to a surprising array of applications ranging from conditioning one deep model w.r.t. another, learning disentangled representations as well as optimizing diverse models that would directly be more robust to adversarial attacks. Our experiments suggest a versatile regularizer (or constraint) with many advantages, which avoids some of the common difficulties one faces in such analyses. Code is at https://github.com/zhenxingjian/Partial_Distance_Correlation. △ Less

Submitted 8 November, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: This paper has been selected as best paper award for ECCV 2022!

Journal ref: ECCV 2022

arXiv:2203.15234 [pdf, other]

Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets

Authors: Vishnu Suresh Lokhande, Rudrasis Chakraborty, Sathya N. Ravi, Vikas Singh

Abstract: Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a {\em single} source of variability (e.g., different scanners), domain adaptation and matching the distributions of representations may suffice in many… ▽ More Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a {\em single} source of variability (e.g., different scanners), domain adaptation and matching the distributions of representations may suffice in many scenarios. But in the presence of {\em more than one} nuisance variable which concurrently influence the measurements, pooling datasets poses unique challenges, e.g., variations in the data can come from both the acquisition method as well as the demographics of participants (gender, age). Invariant representation learning, by itself, is ill-suited to fully model the data generation process. In this paper, we show how bringing recent results on equivariant representation learning (for studying symmetries in neural networks) instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. In particular, we demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted at 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2203.13812 [pdf, other]

Spatially Multi-conditional Image Generation

Authors: Ritika Chakraborty, Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool

Abstract: In most scenarios, conditional image generation can be thought of as an inversion of the image understanding process. Since generic image understanding involves solving multiple tasks, it is natural to aim at generating images via multi-conditioning. However, multi-conditional image generation is a very challenging problem due to the heterogeneity and the sparsity of the (in practice) available co… ▽ More In most scenarios, conditional image generation can be thought of as an inversion of the image understanding process. Since generic image understanding involves solving multiple tasks, it is natural to aim at generating images via multi-conditioning. However, multi-conditional image generation is a very challenging problem due to the heterogeneity and the sparsity of the (in practice) available conditioning labels. In this work, we propose a novel neural architecture to address the problem of heterogeneity and sparsity of the spatially multi-conditional labels. Our choice of spatial conditioning, such as by semantics and depth, is driven by the promise it holds for better control of the image generation process. The proposed method uses a transformer-like architecture operating pixel-wise, which receives the available labels as input tokens to merge them in a learned homogeneous space of labels. The merged labels are then used for image generation via conditional generative adversarial training. In this process, the sparsity of the labels is handled by simply dropping the input tokens corresponding to the missing labels at the desired locations, thanks to the proposed pixel-wise operating architecture. Our experiments on three benchmark datasets demonstrate the clear superiority of our method over the state-of-the-art and compared baselines. The source code will be made publicly available. △ Less

Submitted 14 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2203.01188 [pdf, ps, other]

EnDSUM: Entropy and Diversity based Disaster Tweet Summarization

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: The huge amount of information shared in Twitter during disaster events are utilized by government agencies and humanitarian organizations to ensure quick crisis response and provide situational updates. However, the huge number of tweets posted makes manual identification of the relevant tweets impossible. To address the information overload, there is a need to automatically generate summary of a… ▽ More The huge amount of information shared in Twitter during disaster events are utilized by government agencies and humanitarian organizations to ensure quick crisis response and provide situational updates. However, the huge number of tweets posted makes manual identification of the relevant tweets impossible. To address the information overload, there is a need to automatically generate summary of all the tweets which can highlight the important aspects of the disaster. In this paper, we propose an entropy and diversity based summarizer, termed as EnDSUM, specifically for disaster tweet summarization. Our comprehensive analysis on 6 datasets indicates the effectiveness of EnDSUM and additionally, highlights the scope of improvement of EnDSUM. △ Less

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2202.09463 [pdf, other]

Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data

Authors: Jurijs Nazarovs, Rudrasis Chakraborty, Songwong Tasneeyapant, Sathya N. Ravi, Vikas Singh

Abstract: Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in such applications. The task of modeling not just… ▽ More Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in such applications. The task of modeling not just the observations but the hidden dynamics that are captured by the measurements poses interesting statistical/computational questions. We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing such panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms using MC based sampling methods and numerical ODE solvers. We demonstrate ME-NODE's utility on tasks spanning the spectrum from simulations and toy data to real longitudinal 3D imaging data from an Alzheimer's disease (AD) study, and study its performance in terms of accuracy of reconstruction for interpolation, uncertainty estimates and personalized prediction. △ Less

Submitted 18 February, 2022; originally announced February 2022.

Journal ref: Proceedings of Machine Learning Research; PMLR 161:107-117, 2021

arXiv:2202.08504 [pdf, other]

Finding Representative Sampling Subsets in Sensor Graphs using Time Series Similarities

Authors: Roshni Chakraborty, Josefine Holm, Torben Bach Pedersen, Petar Popovski

Abstract: With the increasing use of IoT-enabled sensors, it is important to have effective methods for querying the sensors. For example, in a dense network of battery-driven temperature sensors, it is often possible to query (sample) just a subset of the sensors at any given time, since the values of the non-sampled sensors can be estimated from the sampled values. If we can divide the set of sensors into… ▽ More With the increasing use of IoT-enabled sensors, it is important to have effective methods for querying the sensors. For example, in a dense network of battery-driven temperature sensors, it is often possible to query (sample) just a subset of the sensors at any given time, since the values of the non-sampled sensors can be estimated from the sampled values. If we can divide the set of sensors into disjoint so-called representative sampling subsets that each represent the other sensors sufficiently well, we can alternate the sampling between the sampling subsets and thus, increase battery life significantly. In this paper, we formulate the problem of finding representative sampling subsets as a graph problem on a so-called sensor graph with the sensors as nodes. Our proposed solution, SubGraphSample, consists of two phases. In Phase-I, we create edges in the sensor graph based on the similarities between the time series of sensor values, analyzing six different techniques based on proven time series similarity metrics. In Phase-II, we propose two new techniques and extend four existing ones to find the maximal number of representative sampling subsets. Finally, we propose AutoSubGraphSample which auto-selects the best technique for Phase-I and Phase-II for a given dataset. Our extensive experimental evaluation shows that our approach can yield significant battery life improvements within realistic error bounds. △ Less

Submitted 18 February, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

arXiv:2202.03271 [pdf, ps, other]

Spectro Temporal EEG Biomarkers For Binary Emotion Classification

Authors: Upasana Tiwari, Rupayan Chakraborty, Sunil Kumar Kopparapu

Abstract: Electroencephalogram (EEG) is one of the most reliable physiological signal for emotion detection. Being non-stationary in nature, EEGs are better analysed by spectro temporal representations. Standard features like Discrete Wavelet Transformation (DWT) can represent temporal changes in spectral dynamics of an EEG, but is insufficient to extract information other way around, i.e. spectral changes… ▽ More Electroencephalogram (EEG) is one of the most reliable physiological signal for emotion detection. Being non-stationary in nature, EEGs are better analysed by spectro temporal representations. Standard features like Discrete Wavelet Transformation (DWT) can represent temporal changes in spectral dynamics of an EEG, but is insufficient to extract information other way around, i.e. spectral changes in temporal dynamics. On the other hand, Empirical mode decomposition (EMD) based features can be useful to bridge the above mentioned gap. Towards this direction, we extract two novel features on top of EMD, namely, (a) marginal hilbert spectrum (MHS) and (b) Holo-Hilbert spectral analysis (HHSA) based on EMD, to better represent emotions in 2D arousal-valence (A-V) space. The usefulness of these features for EEG emotion classification is investigated through extensive experiments using state-of-the-art classifiers. In addition, experiments conducted on DEAP dataset for binary emotion classification in both A-V space, reveal the efficacy of the proposed features over the standard set of temporal and spectral features. △ Less

Submitted 2 February, 2022; originally announced February 2022.

arXiv:2201.12352 [pdf, other]

Automatic Audio Captioning using Attention weighted Event based Embeddings

Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

Abstract: Automatic Audio Captioning (AAC) refers to the task of translating audio into a natural language that describes the audio events, source of the events and their relationships. The limited samples in AAC datasets at present, has set up a trend to incorporate transfer learning with Audio Event Detection (AED) as a parent task. Towards this direction, in this paper, we propose an encoder-decoder arch… ▽ More Automatic Audio Captioning (AAC) refers to the task of translating audio into a natural language that describes the audio events, source of the events and their relationships. The limited samples in AAC datasets at present, has set up a trend to incorporate transfer learning with Audio Event Detection (AED) as a parent task. Towards this direction, in this paper, we propose an encoder-decoder architecture with light-weight (i.e. with lesser learnable parameters) Bi-LSTM recurrent layers for AAC and compare the performance of two state-of-the-art pre-trained AED models as embedding extractors. Our results show that an efficient AED based embedding extractor combined with temporal attention and augmentation techniques is able to surpass existing literature with computationally intensive architectures. Further, we provide evidence of the ability of the non-uniform attention weighted encoding generated as a part of our model to facilitate the decoder glance over specific sections of the audio while generating each token. △ Less

Submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.07472 [pdf, other]

Detecting Stance in Tweets : A Signed Network based Approach

Authors: Roshni Chakraborty, Maitry Bhavsar, Sourav Kumar Dandapat, Joydeep Chandra

Abstract: Identifying user stance related to a political event has several applications, like determination of individual stance, shaping of public opinion, identifying popularity of government measures and many others. The huge volume of political discussions on social media platforms, like, Twitter, provide opportunities in developing automated mechanisms to identify individual stance and subsequently, sc… ▽ More Identifying user stance related to a political event has several applications, like determination of individual stance, shaping of public opinion, identifying popularity of government measures and many others. The huge volume of political discussions on social media platforms, like, Twitter, provide opportunities in developing automated mechanisms to identify individual stance and subsequently, scale to a large volume of users. However, issues like short text and huge variance in the vocabulary of the tweets make such exercise enormously difficult. Existing stance detection algorithms require either event specific training data or annotated twitter handles and therefore, are difficult to adapt to new events. In this paper, we propose a sign network based framework that use external information sources, like news articles to create a signed network of relevant entities with respect to a news event and subsequently use the same to detect stance of any tweet towards the event. Validation on 5,000 tweets related to 10 events indicates that the proposed approach can ensure over 6.5% increase in average F1 score compared to the existing stance detection approaches. △ Less

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2201.06545 [pdf, ps, other]

OntoDSumm : Ontology based Tweet Summarization for Disaster Events

Authors: Piyush Kumar Garg, Roshni Chakraborty, Sourav Kumar Dandapat

Abstract: The huge popularity of social media platforms like Twitter attracts a large fraction of users to share real-time information and short situational messages during disasters. A summary of these tweets is required by the government organizations, agencies, and volunteers for efficient and quick disaster response. However, the huge influx of tweets makes it difficult to manually get a precise overvie… ▽ More The huge popularity of social media platforms like Twitter attracts a large fraction of users to share real-time information and short situational messages during disasters. A summary of these tweets is required by the government organizations, agencies, and volunteers for efficient and quick disaster response. However, the huge influx of tweets makes it difficult to manually get a precise overview of ongoing events. To handle this challenge, several tweet summarization approaches have been proposed. In most of the existing literature, tweet summarization is broken into a two-step process where in the first step, it categorizes tweets, and in the second step, it chooses representative tweets from each category. There are both supervised as well as unsupervised approaches found in literature to solve the problem of first step. Supervised approaches requires huge amount of labelled data which incurs cost as well as time. On the other hand, unsupervised approaches could not clusters tweet properly due to the overlapping keywords, vocabulary size, lack of understanding of semantic meaning etc. While, for the second step of summarization, existing approaches applied different ranking methods where those ranking methods are very generic which fail to compute proper importance of a tweet respect to a disaster. Both the problems can be handled far better with proper domain knowledge. In this paper, we exploited already existing domain knowledge by the means of ontology in both the steps and proposed a novel disaster summarization method OntoDSumm. We evaluate this proposed method with 4 state-of-the-art methods using 10 disaster datasets. Evaluation results reveal that OntoDSumm outperforms existing methods by approximately 2-66% in terms of ROUGE-1 F1 score. △ Less

Submitted 19 November, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

ACM Class: H.0

arXiv:2201.06437 [pdf, ps, other]

SigGAN : Adversarial Model for Learning Signed Relationships in Networks

Authors: Roshni Chakraborty, Ritwika Das, Joydeep Chandra

Abstract: Signed link prediction in graphs is an important problem that has applications in diverse domains. It is a binary classification problem that predicts whether an edge between a pair of nodes is positive or negative. Existing approaches for link prediction in unsigned networks cannot be directly applied for signed link prediction due to their inherent differences. Further, additional structural con… ▽ More Signed link prediction in graphs is an important problem that has applications in diverse domains. It is a binary classification problem that predicts whether an edge between a pair of nodes is positive or negative. Existing approaches for link prediction in unsigned networks cannot be directly applied for signed link prediction due to their inherent differences. Further, additional structural constraints, like, the structural balance property of the signed networks must be considered for signed link prediction. Recent signed link prediction approaches generate node representations using either generative models or discriminative models. Inspired by the recent success of Generative Adversarial Network (GAN) based models which comprises of a discriminator and generator in several applications, we propose a Generative Adversarial Network (GAN) based model for signed networks, SigGAN. It considers the requirements of signed networks, such as, integration of information from negative edges, high imbalance in number of positive and negative edges and structural balance theory. Comparing the performance with state of the art techniques on several real-world datasets validates the effectiveness of SigGAN. △ Less

Submitted 17 January, 2022; originally announced January 2022.

arXiv:2112.00305 [pdf, other]

Forward Operator Estimation in Generative Models with Kernel Transfer Operators

Authors: Zhichun Huang, Rudrasis Chakraborty, Vikas Singh

Abstract: Generative models which use explicit density modeling (e.g., variational autoencoders, flow-based generative models) involve finding a mapping from a known distribution, e.g. Gaussian, to the unknown input distribution. This often requires searching over a class of non-linear functions (e.g., representable by a deep neural network). While effective in practice, the associated runtime/memory costs… ▽ More Generative models which use explicit density modeling (e.g., variational autoencoders, flow-based generative models) involve finding a mapping from a known distribution, e.g. Gaussian, to the unknown input distribution. This often requires searching over a class of non-linear functions (e.g., representable by a deep neural network). While effective in practice, the associated runtime/memory costs can increase rapidly, usually as a function of the performance desired in an application. We propose a much cheaper (and simpler) strategy to estimate this mapping based on adapting known results in kernel transfer operators. We show that our formulation enables highly efficient distribution approximation and sampling, and offers surprisingly good empirical performance that compares favorably with powerful baselines, but with significant runtime savings. We show that the algorithm also performs well in small sample size settings (in brain imaging). △ Less

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2109.10213 [pdf, other]

Blockchain-based Covid Vaccination Registration and Monitoring

Authors: Shirajus Salekin Nabil, Md. Sabbir Alam Pran, Ali Abrar Al Haque, Narayan Ranjan Chakraborty, Mohammad Jabed Morshed Chowdhury, Md Sadek Ferdous

Abstract: Covid-19 (SARS-CoV-2) has changed almost all the aspects of our living. Governments around the world have imposed lockdown to slow down the transmissions. In the meantime, researchers worked hard to find the vaccine. Fortunately, we have found the vaccine, in fact a good number of them. However, managing the testing and vaccination process of the total population is a mammoth job. There are multip… ▽ More Covid-19 (SARS-CoV-2) has changed almost all the aspects of our living. Governments around the world have imposed lockdown to slow down the transmissions. In the meantime, researchers worked hard to find the vaccine. Fortunately, we have found the vaccine, in fact a good number of them. However, managing the testing and vaccination process of the total population is a mammoth job. There are multiple government and private sector organisations that are working together to ensure proper testing and vaccination. However, there is always delay or data silo problems in multi-organisational works. Therefore, streamlining this process is vital to improve the efficiency and save more lives. It is already proved that technology has a significant impact on the health sector, including blockchain. Blockchain provides a distributed system along with greater privacy, transparency and authenticity. In this article, we have presented a blockchain-based system that seamlessly integrates testing and vaccination system, allowing the system to be transparent. The instant verification of any tamper-proof result and a transparent and efficient vaccination system have been exhibited and implemented in the research. We have also implemented the system as "Digital Vaccine Passport" (DVP) and analysed its performance. △ Less

Submitted 20 September, 2021; originally announced September 2021.

Comments: 12 pages

arXiv:2106.15301 [pdf, other]

VolterraNet: A higher order convolutional network with group equivariance for homogeneous manifolds

Authors: Monami Banerjee, Rudrasis Chakraborty, Jose Bouza, Baba C. Vemuri

Abstract: Convolutional neural networks have been highly successful in image-based learning tasks due to their translation equivariance property. Recent work has generalized the traditional convolutional layer of a convolutional neural network to non-Euclidean spaces and shown group equivariance of the generalized convolution operation. In this paper, we present a novel higher order Volterra convolutional n… ▽ More Convolutional neural networks have been highly successful in image-based learning tasks due to their translation equivariance property. Recent work has generalized the traditional convolutional layer of a convolutional neural network to non-Euclidean spaces and shown group equivariance of the generalized convolution operation. In this paper, we present a novel higher order Volterra convolutional neural network (VolterraNet) for data defined as samples of functions on Riemannian homogeneous spaces. Analagous to the result for traditional convolutions, we prove that the Volterra functional convolutions are equivariant to the action of the isometry group admitted by the Riemannian homogeneous spaces, and under some restrictions, any non-linear equivariant function can be expressed as our homogeneous space Volterra convolution, generalizing the non-linear shift equivariant characterization of Volterra expansions in Euclidean space. We also prove that second order functional convolution operations can be represented as cascaded convolutions which leads to an efficient implementation. Beyond this, we also propose a dilated VolterraNet model. These advances lead to large parameter reductions relative to baseline non-Euclidean CNNs. To demonstrate the efficacy of the VolterraNet performance, we present several real data experiments involving classification tasks on spherical-MNIST, atomic energy, Shrec17 data sets, and group testing on diffusion MRI data. Performance comparisons to the state-of-the-art are also presented. △ Less

Submitted 5 June, 2021; originally announced June 2021.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)

arXiv:2106.07479 [pdf, other]

An Online Riemannian PCA for Stochastic Canonical Correlation Analysis

Authors: Zihang Meng, Rudrasis Chakraborty, Vikas Singh

Abstract: We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifolds. Our developments nicely complement existing me… ▽ More We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifolds. Our developments nicely complement existing methods for this problem which either require $O(d^3)$ time complexity per iteration with $O(\frac{1}{\sqrt{t}})$ convergence rate (where $d$ is the dimensionality) or only extract the top $1$ component with $O(\frac{1}{t})$ convergence rate. In contrast, our algorithm offers a strict improvement for this classical problem: it achieves $O(d^2k)$ runtime complexity per iteration for extracting the top $k$ canonical components with $O(\frac{1}{t})$ convergence rate. While the paper primarily focuses on the formulation and technical analysis of its properties, our experiments show that the empirical behavior on common datasets is quite promising. We also explore a potential application in training fair models where the label of protected attribute is missing or otherwise unavailable. △ Less

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2105.06724 [pdf, other]

RC2020 Report: Learning De-biased Representations with Biased Representations

Authors: Rwiddhi Chakraborty, Shubhayu Das

Abstract: As part of the ML Reproducibility Challenge 2020, we investigated the ICML 2020 paper "Learning De-biased Representations with Biased Representations" by Bahng et al., where the authors formalize and attempt to tackle the so called "cross bias generalization" problem with a new approach they introduce called ReBias. This report contains results of our attempts at reproducing the work in the applic… ▽ More As part of the ML Reproducibility Challenge 2020, we investigated the ICML 2020 paper "Learning De-biased Representations with Biased Representations" by Bahng et al., where the authors formalize and attempt to tackle the so called "cross bias generalization" problem with a new approach they introduce called ReBias. This report contains results of our attempts at reproducing the work in the application area of Image Recognition, specifically on the datasets biased MNIST and ImageNet. We compare ReBias with other methods - Vanilla, Biased, RUBi (as implemented by the authors), and conclude with a discussion concerning the validity of the claims made by the paper. We were able to reproduce results reported for the biased MNIST dataset to within 1% of the original values reported in the paper. Like the authors, we report results averaged over 3 runs. However, in a later section, we provide some additional results that appear to weaken the central claim of the paper with regards to the biased MNIST dataset. We were not able to reproduce results for ImageNet as in the original paper, but based on communication with the authors, provide a discussion as to the reasons for the same. This work attempts to be useful to other researchers aiming to use ReBias for their own research purposes, advising on certain possible pitfalls that may be encountered in the process. △ Less

Submitted 14 May, 2021; originally announced May 2021.

Report number: 3742554

arXiv:2104.05888 [pdf, other]

Simpler Certified Radius Maximization by Propagating Covariances

Authors: Xingjian Zhen, Rudrasis Chakraborty, Vikas Singh

Abstract: One strategy for adversarially training a robust model is to maximize its certified radius -- the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a "smoothed" classifier where one estimates the prediction corresponding to Gaussian samples in the neighborhood of each sample in the mini-batch, accomplished in pra… ▽ More One strategy for adversarially training a robust model is to maximize its certified radius -- the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a "smoothed" classifier where one estimates the prediction corresponding to Gaussian samples in the neighborhood of each sample in the mini-batch, accomplished in practice by Monte Carlo sampling. In this paper, we investigate the hypothesis that this sampling bottleneck can potentially be mitigated by identifying ways to directly propagate the covariance matrix of the smoothed distribution through the network. To this end, we find that other than certain adjustments to the network, propagating the covariances must also be accompanied by additional accounting that keeps track of how the distributional moments transform and interact at each stage in the network. We show how satisfying these criteria yields an algorithm for maximizing the certified radius on datasets including Cifar-10, ImageNet, and Places365 while offering runtime savings on networks with moderate depth, with a small compromise in overall accuracy. We describe the details of the key modifications that enable practical use. Via various experiments, we evaluate when our simplifications are sensible, and what the key benefits and limitations are. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: This paper has been accepted by CVPR 2021 as an oral presentation. An introduction video can be found: https://youtu.be/m1ya2oNf5iE

arXiv:2103.13823 [pdf, ps, other]

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Authors: Ayush Tripathi, Rupayan Chakraborty, Sunil Kumar Kopparapu

Abstract: Imbalance in the proportion of training samples belonging to different classes often poses performance degradation of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority classes in the imbalanced dataset. In this paper, we propose a novel three step technique to address imbalanced data. As a first step we significantly oversample the… ▽ More Imbalance in the proportion of training samples belonging to different classes often poses performance degradation of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority classes in the imbalanced dataset. In this paper, we propose a novel three step technique to address imbalanced data. As a first step we significantly oversample the minority class distribution by employing the traditional Synthetic Minority OverSampling Technique (SMOTE) algorithm using the neighborhood of the minority class samples and in the next step we partition the generated samples using a Gaussian-Mixture Model based clustering algorithm. In the final step synthetic data samples are chosen based on the weight associated with the cluster, the weight itself being determined by the distribution of the majority class samples. Extensive experiments on several standard datasets from diverse domains shows the usefulness of the proposed technique in comparison with the original SMOTE and its state-of-the-art variants algorithms. △ Less

Submitted 26 March, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: 8 pages

Journal ref: ICPR 2020

arXiv:2103.05939 [pdf, other]

A Review and Refinement of Surprise Adequacy

Authors: Michael Weiss, Rwiddhi Chakraborty, Paolo Tonella

Abstract: Surprise Adequacy (SA) is one of the emerging and most promising adequacy criteria for Deep Learning (DL) testing. As an adequacy criterion, it has been used to assess the strength of DL test suites. In addition, it has also been used to find inputs to a Deep Neural Network (DNN) which were not sufficiently represented in the training data, or to select samples for DNN retraining. However, computa… ▽ More Surprise Adequacy (SA) is one of the emerging and most promising adequacy criteria for Deep Learning (DL) testing. As an adequacy criterion, it has been used to assess the strength of DL test suites. In addition, it has also been used to find inputs to a Deep Neural Network (DNN) which were not sufficiently represented in the training data, or to select samples for DNN retraining. However, computation of the SA metric for a test suite can be prohibitively expensive, as it involves a quadratic number of distance calculations. Hence, we developed and released a performance-optimized, but functionally equivalent, implementation of SA, reducing the evaluation time by up to 97\%. We also propose refined variants of the SA omputation algorithm, aiming to further increase the evaluation speed. We then performed an empirical study on MNIST, focused on the out-of-distribution detection capabilities of SA, which allowed us to reproduce parts of the results presented when SA was first released. The experiments show that our refined variants are substantially faster than plain SA, while producing comparable outcomes. Our experimental results exposed also an overlooked issue of SA: it can be highly sensitive to the non-determinism associated with the DNN training procedure. △ Less

Submitted 10 March, 2021; originally announced March 2021.

Comments: Accepted at DeepTest 2021 (ICSE Workshop)

arXiv:2102.08074 [pdf, other]

Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining

Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

Abstract: Few-shot learning aims to generalize unseen classes that appear during testing but are unavailable during training. Prototypical networks incorporate few-shot metric learning, by constructing a class prototype in the form of a mean vector of the embedded support points within a class. The performance of prototypical networks in extreme few-shot scenarios (like one-shot) degrades drastically, mainl… ▽ More Few-shot learning aims to generalize unseen classes that appear during testing but are unavailable during training. Prototypical networks incorporate few-shot metric learning, by constructing a class prototype in the form of a mean vector of the embedded support points within a class. The performance of prototypical networks in extreme few-shot scenarios (like one-shot) degrades drastically, mainly due to the desuetude of variations within the clusters while constructing prototypes. In this paper, we propose to replace the typical prototypical loss function with an Episodic Triplet Mining (ETM) technique. The conventional triplet selection leads to overfitting, because of all possible combinations being used during training. We incorporate episodic training for mining the semi hard positive and the semi hard negative triplets to overcome the overfitting. We also propose an adaptation to make use of unlabeled training samples for better modeling. Experimenting on two different audio processing tasks, namely speaker recognition and audio event detection; show improved performances and hence the efficacy of ETM over the prototypical loss function and other meta-learning frameworks. Further, we show improved performances when unlabeled training samples are used. △ Less

Submitted 16 February, 2021; originally announced February 2021.

Comments: 5 pages

arXiv:2102.03902 [pdf, other]

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

Authors: Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh

Abstract: Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the input sequence length has limited its applicati… ▽ More Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on the input sequence length has limited its application to longer sequences -- a topic being actively studied in the community. To address this limitation, we propose Nyströmformer -- a model that exhibits favorable scalability as a function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention with $O(n)$ complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard sequence length, and find that our Nyströmformer performs comparably, or in a few cases, even slightly better, than standard self-attention. On longer sequence tasks in the Long Range Arena (LRA) benchmark, Nyströmformer performs favorably relative to other efficient self-attention methods. Our code is available at https://github.com/mlpen/Nystromformer. △ Less

Submitted 31 March, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: AAAI 2021; Code and supplement available at https://github.com/mlpen/Nystromformer

arXiv:2012.10013 [pdf, other]

Flow-based Generative Models for Learning Manifold to Manifold Mappings

Authors: Xingjian Zhen, Rudrasis Chakraborty, Liu Yang, Vikas Singh

Abstract: Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there… ▽ More Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold data is quite sparse. Partly due to this gap, there are also no modality transfer/translation models for manifold-valued data whereas numerous such methods based on generative models are available for natural images. This paper addresses this gap, motivated by a need in brain imaging -- in doing so, we expand the operating range of certain generative models (as well as generative models for modality transfer) from natural images to images with manifold-valued measurements. Our main result is the design of a two-stream version of GLOW (flow-based invertible generative models) that can synthesize information of a field of one type of manifold-valued measurements given another. On the theoretical side, we introduce three kinds of invertible layers for manifold-valued data, which are not only analogous to their functionality in flow-based generative models (e.g., GLOW) but also preserve the key benefits (determinants of the Jacobian are easy to calculate). For experiments, on a large dataset from the Human Connectome Project (HCP), we show promising results where we can reliably and accurately reconstruct brain images of a field of orientation distribution functions (ODF) from diffusion tensor images (DTI), where the latter has a $5\times$ faster acquisition time but at the expense of worse angular resolution. △ Less

Submitted 1 March, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: This paper has been accepted by AAAI 2021. A video introduction is on YouTube: https://youtu.be/0r96U0vXsCM The official GitHub repo is: https://github.com/zhenxingjian/Dual_Manifold_GLOW

arXiv:2006.12590 [pdf, other]

C-SURE: Shrinkage Estimator and Prototype Classifier for Complex-Valued Deep Learning

Authors: Yifei Xing, Rudrasis Chakraborty, Minxuan Duan, Stella Yu

Abstract: The James-Stein (JS) shrinkage estimator is a biased estimator that captures the mean of Gaussian random vectors.While it has a desirable statistical property of dominance over the maximum likelihood estimator (MLE) in terms of mean squared error (MSE), not much progress has been made on extending the estimator onto manifold-valued data. We propose C-SURE, a novel Stein's unbiased risk estimate… ▽ More The James-Stein (JS) shrinkage estimator is a biased estimator that captures the mean of Gaussian random vectors.While it has a desirable statistical property of dominance over the maximum likelihood estimator (MLE) in terms of mean squared error (MSE), not much progress has been made on extending the estimator onto manifold-valued data. We propose C-SURE, a novel Stein's unbiased risk estimate (SURE) of the JS estimator on the manifold of complex-valued data with a theoretically proven optimum over MLE. Adapting the architecture of the complex-valued SurReal classifier, we further incorporate C-SURE into a prototype convolutional neural network (CNN) classifier. We compare C-SURE with SurReal and a real-valued baseline on complex-valued MSTAR and RadioML datasets. C-SURE is more accurate and robust than SurReal, and the shrinkage estimator is always better than MLE for the same prototype classifier. Like SurReal, C-SURE is much smaller, outperforming the real-valued baseline on MSTAR (RadioML) with less than 1 percent (3 percent) of the baseline size △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: Submitted to CVPR PBVS workshop

arXiv:2003.13869 [pdf, other]

ManifoldNorm: Extending normalizations on Riemannian Manifolds

Authors: Rudrasis Chakraborty

Abstract: Many measurements in computer vision and machine learning manifest as non-Euclidean data samples. Several researchers recently extended a number of deep neural network architectures for manifold valued data samples. Researchers have proposed models for manifold valued spatial data which are common in medical image processing including processing of diffusion tensor imaging (DTI) where images are f… ▽ More Many measurements in computer vision and machine learning manifest as non-Euclidean data samples. Several researchers recently extended a number of deep neural network architectures for manifold valued data samples. Researchers have proposed models for manifold valued spatial data which are common in medical image processing including processing of diffusion tensor imaging (DTI) where images are fields of $3\times 3$ symmetric positive definite matrices or representation in terms of orientation distribution field (ODF) where the identification is in terms of field on hypersphere. There are other sequential models for manifold valued data that recently researchers have shown to be effective for group difference analysis in study for neuro-degenerative diseases. Although, several of these methods are effective to deal with manifold valued data, the bottleneck includes the instability in optimization for deeper networks. In order to deal with these instabilities, researchers have proposed residual connections for manifold valued data. One of the other remedies to deal with the instabilities including gradient explosion is to use normalization techniques including {\it batch norm} and {\it group norm} etc.. But, so far there is no normalization techniques applicable for manifold valued data. In this work, we propose a general normalization techniques for manifold valued data. We show that our proposed manifold normalization technique have special cases including popular batch norm and group norm techniques. On the experimental side, we focus on two types of manifold valued data including manifold of symmetric positive definite matrices and hypersphere. We show the performance gain in one synthetic experiment for moving MNIST dataset and one real brain image dataset where the representation is in terms of orientation distribution field (ODF). △ Less

Submitted 4 April, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:2002.12788 [pdf, other]

Identification of Dementia Using Audio Biomarkers

Authors: Rupayan Chakraborty, Meghna Pandharipande, Chitralekha Bhat, Sunil Kumar Kopparapu

Abstract: Dementia is a syndrome, generally of a chronic nature characterized by a deterioration in cognitive function, especially in the geriatric population and is severe enough to impact their daily activities. Early diagnosis of dementia is essential to provide timely treatment to alleviate the effects and sometimes to slow the progression of dementia. Speech has been known to provide an indication of a… ▽ More Dementia is a syndrome, generally of a chronic nature characterized by a deterioration in cognitive function, especially in the geriatric population and is severe enough to impact their daily activities. Early diagnosis of dementia is essential to provide timely treatment to alleviate the effects and sometimes to slow the progression of dementia. Speech has been known to provide an indication of a person's cognitive state. The objective of this work is to use speech processing and machine learning techniques to automatically identify the stage of dementia such as mild cognitive impairment (MCI) or Alzheimers disease (AD). Non-linguistic acoustic parameters are used for this purpose, making this a language independent approach. We analyze the patients audio excerpts from a clinician-participant conversations taken from the Pitt corpus of DementiaBank database, to identify the speech parameters that best distinguish between MCI, AD and healthy (HC) speech. We analyze the contribution of various types of acoustic features such as spectral, temporal, cepstral their feature-level fusion and selection towards the identification of dementia stage. Additionally, we compare the performance of using feature-level fusion and score-level fusion. An accuracy of 82% is achieved using score-level fusion with an absolute improvement of 5% over feature-level fusion. △ Less

Submitted 27 February, 2020; originally announced February 2020.

Comments: 5 pages, 3 figures

arXiv:1912.11151 [pdf, other]

A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications

Authors: Sri Harsha Dumpala, Imran Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu

Abstract: Naturally introduced perturbations in audio signal, caused by emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness… ▽ More Naturally introduced perturbations in audio signal, caused by emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness of an ASR system. The CycleGAN model is trained on non-parallel examples of perturbed and normal speech. Experiments on spontaneous laughter-speech and creaky-speech datasets show that the performance of four different ASR systems improve by using speech obtained from CycleGAN based front-end, as compared to directly using the original perturbed speech. Visualization of the features of the laughter perturbed speech and those generated by the proposed front-end further demonstrates the effectiveness of our approach. △ Less

Submitted 18 December, 2019; originally announced December 2019.

Comments: 7 pages, 3 figures, ICASSP-2019

arXiv:1911.12207 [pdf, other]

Orthogonal Convolutional Neural Networks

Authors: Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu

Abstract: Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel instead of usi… ▽ More Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel instead of using the common kernel orthogonality approach, which we show is only necessary but not sufficient for ensuring orthogonal convolutions. Our proposed orthogonal convolution requires no additional parameters and little computational overhead. This method consistently outperforms the kernel orthogonality alternative on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings. Further, it learns more diverse and expressive features with better training stability, robustness, and generalization. Our code is publicly available at https://github.com/samaonline/Orthogonal-Convolutional-Neural-Networks. △ Less

Submitted 8 April, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: To appear in CVPR 2020, project page: http://pwang.pw/ocnn.html

arXiv:1911.03443 [pdf, other]

An "augmentation-free" rotation invariant classification scheme on point-cloud and its application to neuroimaging

Authors: Liu Yang, Rudrasis Chakraborty

Abstract: Recent years have witnessed the emergence and increasing popularity of 3D medical imaging techniques with the development of 3D sensors and technology. However, achieving geometric invariance in the processing of 3D medical images is computationally expensive but nonetheless essential due to the presence of possible errors caused by rigid registration techniques. An alternative way to analyze medi… ▽ More Recent years have witnessed the emergence and increasing popularity of 3D medical imaging techniques with the development of 3D sensors and technology. However, achieving geometric invariance in the processing of 3D medical images is computationally expensive but nonetheless essential due to the presence of possible errors caused by rigid registration techniques. An alternative way to analyze medical imaging is by understanding the 3D shapes represented in terms of point-cloud. Though in the medical imaging community, 3D point-cloud processing is not a "go-to" choice, it is a canonical way to preserve rotation invariance. Unfortunately, due to the presence of discrete topology, one can not use the standard convolution operator on point-cloud. To the best of our knowledge, the existing ways to do "convolution" can not preserve the rotation invariance without explicit data augmentation. Therefore, we propose a rotation invariant convolution operator by inducing topology from hypersphere. Experimental validation has been performed on publicly available OASIS dataset in terms of classification accuracy between subjects with (without) dementia, demonstrating the usefulness of our proposed method in terms of model complexity, classification accuracy, and last but most important invariance to rotations. △ Less

Submitted 5 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1910.13050 and arXiv:1911.01705

arXiv:1911.01705 [pdf, other]

A GMM based algorithm to generate point-cloud and its application to neuroimaging

Authors: Liu Yang, Rudrasis Chakraborty

Abstract: Recent years have witnessed the emergence of 3D medical imaging techniques with the development of 3D sensors and technology. Due to the presence of noise in image acquisition, registration researchers focused on an alternative way to represent medical images. An alternative way to analyze medical imaging is by understanding the 3D shapes represented in terms of point-cloud. Though in the medical… ▽ More Recent years have witnessed the emergence of 3D medical imaging techniques with the development of 3D sensors and technology. Due to the presence of noise in image acquisition, registration researchers focused on an alternative way to represent medical images. An alternative way to analyze medical imaging is by understanding the 3D shapes represented in terms of point-cloud. Though in the medical imaging community, 3D point-cloud processing is not a ``go-to'' choice, it is a ``natural'' way to capture 3D shapes. However, as the number of samples for medical images are small, researchers have used pre-trained models to fine-tune on medical images. Furthermore, due to different modality in medical images, standard generative models can not be used to generate new samples of medical images. In this work, we use the advantage of point-cloud representation of 3D structures of medical images and propose a Gaussian mixture model-based generation scheme. Our proposed method is robust to outliers. Experimental validation has been performed to show that the proposed scheme can generate new 3D structures using interpolation techniques, i.e., given two 3D structures represented as point-clouds, we can generate point-clouds in between. We have also generated new point-clouds for subjects with and without dementia and show that the generated samples are indeed closely matched to the respective training samples from the same class. △ Less

Submitted 5 November, 2019; originally announced November 2019.

arXiv:1910.13050 [pdf, other]

POIRot: A rotation invariant omni-directional pointnet

Authors: Liu Yang, Rudrasis Chakraborty, Stella X. Yu

Abstract: Point-cloud is an efficient way to represent 3D world. Analysis of point-cloud deals with understanding the underlying 3D geometric structure. But due to the lack of smooth topology, and hence the lack of neighborhood structure, standard correlation can not be directly applied on point-cloud. One of the popular approaches to do point correlation is to partition the point-cloud into voxels and extr… ▽ More Point-cloud is an efficient way to represent 3D world. Analysis of point-cloud deals with understanding the underlying 3D geometric structure. But due to the lack of smooth topology, and hence the lack of neighborhood structure, standard correlation can not be directly applied on point-cloud. One of the popular approaches to do point correlation is to partition the point-cloud into voxels and extract features using standard 3D correlation. But this approach suffers from sparsity of point-cloud and hence results in multiple empty voxels. One possible solution to deal with this problem is to learn a MLP to map a point or its local neighborhood to a high dimensional feature space. All these methods suffer from a large number of parameters requirement and are susceptible to random rotations. A popular way to make the model "invariant" to rotations is to use data augmentation techniques with small rotations but the potential drawback includes \item more training samples \item susceptible to large rotations. In this work, we develop a rotation invariant point-cloud segmentation and classification scheme based on the omni-directional camera model (dubbed as {\bf POIRot$^1$}). Our proposed model is rotationally invariant and can preserve geometric shape of a 3D point-cloud. Because of the inherent rotation invariant property, our proposed framework requires fewer number of parameters (please see \cite{Iandola2017SqueezeNetAA} and the references therein for motivation of lean models). Several experiments have been performed to show that our proposed method can beat the state-of-the-art algorithms in classification and part segmentation applications. △ Less

Submitted 29 October, 2019; v1 submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.11334 [pdf, ps, other]

SurReal: Complex-Valued Learning as Principled Transformations on a Scaling and Rotation Manifold

Authors: Rudrasis Chakraborty, Yifei Xing, Stella Yu

Abstract: Complex-valued data is ubiquitous in signal and image processing applications, and complex-valued representations in deep learning have appealing theoretical properties. While these aspects have long been recognized, complex-valued deep learning continues to lag far behind its real-valued counterpart. We propose a principled geometric approach to complex-valued deep learning. Complex-valued data… ▽ More Complex-valued data is ubiquitous in signal and image processing applications, and complex-valued representations in deep learning have appealing theoretical properties. While these aspects have long been recognized, complex-valued deep learning continues to lag far behind its real-valued counterpart. We propose a principled geometric approach to complex-valued deep learning. Complex-valued data could often be subject to arbitrary complex-valued scaling; as a result, real and imaginary components could co-vary. Instead of treating complex values as two independent channels of real values, we recognize their underlying geometry: We model the space of complex numbers as a product manifold of non-zero scaling and planar rotations. Arbitrary complex-valued scaling naturally becomes a group of transitive actions on this manifold. We propose to extend the property instead of the form of real-valued functions to the complex domain. We define convolution as weighted Fréchet mean on the manifold that is equivariant to the group of scaling/rotation actions, and define distance transform on the manifold that is invariant to the action group. The manifold perspective also allows us to define nonlinear activation functions such as tangent ReLU and G-transport, as well as residual connections on the manifold-valued data. We dub our model SurReal, as our experiments on MSTAR and RadioML deliver high performance with only a fractional size of real-valued and complex-valued baseline models. △ Less

Submitted 6 November, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 12 pages, accepted to TNNLS journal

arXiv:1910.02206 [pdf, other]

Dilated Convolutional Neural Networks for Sequential Manifold-valued Data

Authors: Xingjian Zhen, Rudrasis Chakraborty, Nicholas Vogt, Barbara B. Bendlin, Vikas Singh

Abstract: Efforts are underway to study ways via which the power of deep neural networks can be extended to non-standard data types such as structured data (e.g., graphs) or manifold-valued data (e.g., unit vectors or special matrices). Often, sizable empirical improvements are possible when the geometry of such data spaces are incorporated into the design of the model, architecture, and the algorithms. Mot… ▽ More Efforts are underway to study ways via which the power of deep neural networks can be extended to non-standard data types such as structured data (e.g., graphs) or manifold-valued data (e.g., unit vectors or special matrices). Often, sizable empirical improvements are possible when the geometry of such data spaces are incorporated into the design of the model, architecture, and the algorithms. Motivated by neuroimaging applications, we study formulations where the data are {\em sequential manifold-valued measurements}. This case is common in brain imaging, where the samples correspond to symmetric positive definite matrices or orientation distribution functions. Instead of a recurrent model which poses computational/technical issues, and inspired by recent results showing the viability of dilated convolutional models for sequence prediction, we develop a dilated convolutional neural network architecture for this task. On the technical side, we show how the modules needed in our network can be derived while explicitly taking the Riemannian manifold structure into account. We show how the operations needed can leverage known results for calculating the weighted Fréchet Mean (wFM). Finally, we present scientific results for group difference analysis in Alzheimer's disease (AD) where the groups are derived using AD pathology load: here the model finds several brain fiber bundles that are related to AD even when the subjects are all still cognitively healthy. △ Less

Submitted 5 October, 2019; originally announced October 2019.

Journal ref: ICCV 2019

arXiv:1906.10887 [pdf, other]

doi 10.1109/TPAMI.2021.3070341

Spatial Transformer for 3D Point Clouds

Authors: Jiayun Wang, Rudrasis Chakraborty, Stella X. Yu

Abstract: Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. Existing methods adopt the same individual point neighborhoods throughout the network layers, defined by the same metric on the fixed input point coordina… ▽ More Deep neural networks are widely used for understanding 3D point clouds. At each point convolution layer, features are computed from local neighborhoods of 3D points and combined for subsequent processing in order to extract semantic information. Existing methods adopt the same individual point neighborhoods throughout the network layers, defined by the same metric on the fixed input point coordinates. This common practice is easy to implement but not necessarily optimal. Ideally, local neighborhoods should be different at different layers, as more latent information is extracted at deeper layers. We propose a novel end-to-end approach to learn different non-rigid transformations of the input point cloud so that optimal local neighborhoods can be adopted at each layer. We propose both linear (affine) and non-linear (projective and deformable) spatial transformers for 3D point clouds. With spatial transformers on the ShapeNet part segmentation dataset, the network achieves higher accuracy for all categories, with 8\% gain on earphones and rockets in particular. Our method also outperforms the state-of-the-art on other point cloud tasks such as classification, detection, and semantic segmentation. Visualizations show that spatial transformers can learn features more efficiently by dynamically altering local neighborhoods according to the geometry and semantics of 3D shapes in spite of their within-category variations. Our code is publicly available at https://github.com/samaonline/spatial-transformer-for-3d-point-clouds. △ Less

Submitted 29 March, 2021; v1 submitted 26 June, 2019; originally announced June 2019.

Comments: To appear in IEEE Transactions on PAMI, 2021

arXiv:1906.10048 [pdf, ps, other]

SurReal: Fréchet Mean and Distance Transform for Complex-Valued Deep Learning

Authors: Rudrasis Chakraborty, Jiayun Wang, Stella X. Yu

Abstract: We develop a novel deep learning architecture for naturally complex-valued data, which is often subject to complex scaling ambiguity. We treat each sample as a field in the space of complex numbers. With the polar form of a complex-valued number, the general group that acts in this space is the product of planar rotation and non-zero scaling. This perspective allows us to develop not only a novel… ▽ More We develop a novel deep learning architecture for naturally complex-valued data, which is often subject to complex scaling ambiguity. We treat each sample as a field in the space of complex numbers. With the polar form of a complex-valued number, the general group that acts in this space is the product of planar rotation and non-zero scaling. This perspective allows us to develop not only a novel convolution operator using weighted Fréchet mean (wFM) on a Riemannian manifold, but also a novel fully connected layer operator using the distance to the wFM, with natural equivariant properties to non-zero scaling and planar rotation for the former and invariance properties for the latter. Compared to the baseline approach of learning real-valued neural network models on the two-channel real-valued representation of complex-valued data, our method achieves surreal performance on two publicly available complex-valued datasets: MSTAR on SAR images and RadioML on radio frequency signals. On MSTAR, at 8% of the baseline model size and with fewer than 45,000 parameters, our model improves the target classification accuracy from 94% to 98% on this highly imbalanced dataset. On RadioML, our model achieves comparable RF modulation classification accuracy at 10% of the baseline model size. △ Less

Submitted 24 June, 2019; originally announced June 2019.

Comments: IEEE Computer Vision and Pattern Recognition Workshop on Perception Beyond the Visible Spectrum, Long Beach, California, 16 June 2019 Best Paper Award

arXiv:1901.09334 [pdf]

Predicting Tomorrow's Headline using Today's Twitter Deliberations

Authors: Roshni Chakraborty, Abhijeet Kharat, Apalak Khatua, Sourav Kumar Dandapat, Joydeep Chandra

Abstract: Predicting the popularity of news article is a challenging task. Existing literature mostly focused on article contents and polarity to predict popularity. However, existing research has not considered the users' preference towards a particular article. Understanding users' preference is an important aspect for predicting the popularity of news articles. Hence, we consider the social media data, f… ▽ More Predicting the popularity of news article is a challenging task. Existing literature mostly focused on article contents and polarity to predict popularity. However, existing research has not considered the users' preference towards a particular article. Understanding users' preference is an important aspect for predicting the popularity of news articles. Hence, we consider the social media data, from the Twitter platform, to address this research gap. In our proposed model, we have considered the users' involvement as well as the users' reaction towards an article to predict the popularity of the article. In short, we are predicting tomorrow's headline by probing today's Twitter discussion. We have considered 300 political news article from the New York Post, and our proposed approach has outperformed other baseline models. △ Less

Submitted 27 January, 2019; originally announced January 2019.

Comments: This paper was accepted in CIKM Workshop on News Recommendation and Analytics (INRA), 2018, Turin, Italy

arXiv:1809.06211 [pdf, other]

ManifoldNet: A Deep Network Framework for Manifold-valued Data

Authors: Rudrasis Chakraborty, Jose Bouza, Jonathan Manton, Baba C. Vemuri

Abstract: Deep neural networks have become the main work horse for many tasks involving learning from data in a variety of applications in Science and Engineering. Traditionally, the input to these networks lie in a vector space and the operations employed within the network are well defined on vector-spaces. In the recent past, due to technological advances in sensing, it has become possible to acquire man… ▽ More Deep neural networks have become the main work horse for many tasks involving learning from data in a variety of applications in Science and Engineering. Traditionally, the input to these networks lie in a vector space and the operations employed within the network are well defined on vector-spaces. In the recent past, due to technological advances in sensing, it has become possible to acquire manifold-valued data sets either directly or indirectly. Examples include but are not limited to data from omnidirectional cameras on automobiles, drones etc., synthetic aperture radar imaging, diffusion magnetic resonance imaging, elastography and conductance imaging in the Medical Imaging domain and others. Thus, there is need to generalize the deep neural networks to cope with input data that reside on curved manifolds where vector space operations are not naturally admissible. In this paper, we present a novel theoretical framework to generalize the widely popular convolutional neural networks (CNNs) to high dimensional manifold-valued data inputs. We call these networks, ManifoldNets. In ManifoldNets, convolution operation on data residing on Riemannian manifolds is achieved via a provably convergent recursive computation of the weighted Fréchet Mean (wFM) of the given data, where the weights makeup the convolution mask, to be learned. Further, we prove that the proposed wFM layer achieves a contraction mapping and hence ManifoldNet does not need the non-linear ReLU unit used in standard CNNs. We present experiments, using the ManifoldNet framework, to achieve dimensionality reduction by computing the principal linear subspaces that naturally reside on a Grassmannian. The experimental results demonstrate the efficacy of ManifoldNets in the context of classification and reconstruction accuracy. △ Less

Submitted 20 September, 2018; v1 submitted 10 September, 2018; originally announced September 2018.

Showing 1–50 of 65 results for author: Chakraborty, R