Search | arXiv e-print repository

HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction

Authors: Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, Dhagash Mehta

Abstract: Extraction and interpretation of intricate information from unstructured text data arising in financial applications, such as earnings call transcripts, present substantial challenges to large language models (LLMs) even using the current best practices to use Retrieval Augmented Generation (RAG) (referred to as VectorRAG techniques which utilize vector databases for information retrieval) due to… ▽ More Extraction and interpretation of intricate information from unstructured text data arising in financial applications, such as earnings call transcripts, present substantial challenges to large language models (LLMs) even using the current best practices to use Retrieval Augmented Generation (RAG) (referred to as VectorRAG techniques which utilize vector databases for information retrieval) due to challenges such as domain specific terminology and complex formats of the documents. We introduce a novel approach based on a combination, called HybridRAG, of the Knowledge Graphs (KGs) based RAG techniques (called GraphRAG) and VectorRAG techniques to enhance question-answer (Q&A) systems for information extraction from financial documents that is shown to be capable of generating accurate and contextually relevant answers. Using experiments on a set of financial earning call transcripts documents which come in the form of Q&A format, and hence provide a natural set of pairs of ground-truth Q&As, we show that HybridRAG which retrieves context from both vector database and KG outperforms both traditional VectorRAG and GraphRAG individually when evaluated at both the retrieval and generation stages in terms of retrieval accuracy and answer generation. The proposed technique has applications beyond the financial domain △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: 9 pages, 2 figures, 5 tables

arXiv:2407.11149 [pdf]

BMR and BWR: Two simple metaphor-free optimization algorithms for solving real-life non-convex constrained and unconstrained problems

Authors: Ravipudi Venkata Rao, Ravikumar shah

Abstract: This paper presents two simple yet powerful optimization algorithms named Best-Mean-Random (BMR) and Best-Worst-Randam (BWR) algorithms to handle both constrained and unconstrained optimization problems. These algorithms are free of metaphors and algorithm-specific parameters. The BMR algorithm is based on the best, mean, and random solutions of the population generated for solving a given problem… ▽ More This paper presents two simple yet powerful optimization algorithms named Best-Mean-Random (BMR) and Best-Worst-Randam (BWR) algorithms to handle both constrained and unconstrained optimization problems. These algorithms are free of metaphors and algorithm-specific parameters. The BMR algorithm is based on the best, mean, and random solutions of the population generated for solving a given problem; and the BWR algorithm is based on the best, worst, and random solutions. The performances of the proposed two algorithms are investigated by implementing them on 26 real-life non-convex constrained optimization problems given in the Congress on Evolutionary Computation (CEC) 2020 competition and comparisons are made with those of the other prominent optimization algorithms. Furthermore, computational experiments are conducted on 30 unconstrained standard benchmark optimization problems including 5 recently developed benchmark problems having distinct characteristics. The results proved the better competitiveness and superiority of the proposed simple algorithms. The optimization research community may gain an advantage by adapting these algorithms to solve various constrained and unconstrained real-life optimization problems across various scientific and engineering disciplines. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 28 pages, 5 figures, original paper

ACM Class: C.1.3; I.2.6; I.5

arXiv:2407.04208 [pdf, other]

AMD: Automatic Multi-step Distillation of Large-scale Vision Models

Authors: Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu

Abstract: Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy… ▽ More Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy when confronted with a large capacity gap between the teacher and the student, e.g, 10x compression rate. In this paper, we present a novel approach named Automatic Multi-step Distillation (AMD) for large-scale vision model compression. In particular, our distillation process unfolds across multiple steps. Initially, the teacher undergoes distillation to form an intermediate teacher-assistant model, which is subsequently distilled further to the student. An efficient and effective optimization framework is introduced to automatically identify the optimal teacher-assistant that leads to the maximal student performance. We conduct extensive experiments on multiple image classification datasets, including CIFAR-10, CIFAR-100, and ImageNet. The findings consistently reveal that our approach outperforms several established baselines, paving a path for future knowledge distillation methods on large-scale vision models. △ Less

Submitted 4 July, 2024; originally announced July 2024.

Comments: 19 pages, 5 figures

arXiv:2406.17576 [pdf, other]

Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations

Authors: Cheng Wang, Christopher Redino, Ryan Clark, Abdul Rahman, Sal Aguinaga, Sathvik Murli, Dhruv Nandakumar, Roland Rao, Lanxiao Huang, Daniel Radke, Edward Bowen

Abstract: Ransomware presents a significant and increasing threat to individuals and organizations by encrypting their systems and not releasing them until a large fee has been extracted. To bolster preparedness against potential attacks, organizations commonly conduct red teaming exercises, which involve simulated attacks to assess existing security measures. This paper proposes a novel approach utilizing… ▽ More Ransomware presents a significant and increasing threat to individuals and organizations by encrypting their systems and not releasing them until a large fee has been extracted. To bolster preparedness against potential attacks, organizations commonly conduct red teaming exercises, which involve simulated attacks to assess existing security measures. This paper proposes a novel approach utilizing reinforcement learning (RL) to simulate ransomware attacks. By training an RL agent in a simulated environment mirroring real-world networks, effective attack strategies can be learned quickly, significantly streamlining traditional, manual penetration testing processes. The attack pathways revealed by the RL agent can provide valuable insights to the defense team, helping them identify network weak points and develop more resilient defensive measures. Experimental results on a 152-host example network confirm the effectiveness of the proposed approach, demonstrating the RL agent's capability to discover and orchestrate attacks on high-value targets while evading honeyfiles (decoy files strategically placed to detect unauthorized access). △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.01559 [pdf, other]

Prototypical Transformer as Unified Motion Learners

Authors: Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

Abstract: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature moti… ▽ More In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature motion patterns, providing transparency in understanding motion scenes. Second, Latent Synchronization guides feature representation learning via prototypes, effectively mitigating the problem of motion uncertainty. Empirical results demonstrate that our approach achieves competitive performance on popular motion tasks such as optical flow and scene depth. Furthermore, it exhibits generality across various downstream tasks, including object tracking and video stabilization. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: 21 pages, 10 figures

arXiv:2405.18322 [pdf, other]

SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation

Authors: Kejia Yin, Varshanth R. Rao, Ruowei Jiang, Xudong Liu, Parham Aarabi, David B. Lindell

Abstract: Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOTA) methods (1) extract coarse features from backbones that are trained with instance-level self-supervised learning (SSL) paradigms, which neglect the… ▽ More Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOTA) methods (1) extract coarse features from backbones that are trained with instance-level self-supervised learning (SSL) paradigms, which neglect the dense prediction nature of the task, (2) aggregate them into memory-intensive hypercolumn formations, and (3) supervise lightweight projector networks to naively establish full local correspondences among all pairs of spatial features. In this paper, we introduce SCE-MAE, a framework that (1) leverages the MAE, a region-level SSL method that naturally better suits the landmark prediction task, (2) operates on the vanilla feature map instead of on expensive hypercolumns, and (3) employs a Correspondence Approximation and Refinement Block (CARB) that utilizes a simple density peak clustering algorithm and our proposed Locality-Constrained Repellence Loss to directly hone only select local correspondences. We demonstrate through extensive experiments that SCE-MAE is highly effective and robust, outperforming existing SOTA methods by large margins of approximately 20%-44% on the landmark matching and approximately 9%-15% on the landmark detection tasks. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024

arXiv:2405.08019 [pdf, other]

AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting

Authors: Shreyan Ganguly, Roshan Nayak, Rakshith Rao, Ujan Deb, Prathosh AP

Abstract: Knowledge distillation, a widely used model compression technique, works on the basis of transferring knowledge from a cumbersome teacher model to a lightweight student model. The technique involves jointly optimizing the task specific and knowledge distillation losses with a weight assigned to them. Despite these weights playing a crucial role in the performance of the distillation process, curre… ▽ More Knowledge distillation, a widely used model compression technique, works on the basis of transferring knowledge from a cumbersome teacher model to a lightweight student model. The technique involves jointly optimizing the task specific and knowledge distillation losses with a weight assigned to them. Despite these weights playing a crucial role in the performance of the distillation process, current methods provide equal weight to both losses, leading to suboptimal performance. In this paper, we propose Adaptive Knowledge Distillation, a novel technique inspired by curriculum learning to adaptively weigh the losses at instance level. This technique goes by the notion that sample difficulty increases with teacher loss. Our method follows a plug-and-play paradigm that can be applied on top of any task-specific and distillation objectives. Experiments show that our method performs better than conventional knowledge distillation method and existing instance-level loss functions. △ Less

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2404.17212 [pdf]

Scrutinizing Data from Sky: An Examination of Its Veracity in Area Based Traffic Contexts

Authors: Yawar Ali, Krishnan K N, Debashis Ray Sarkar, K. Ramachandra Rao, Niladri Chatterjee, Ashish Bhaskar

Abstract: Traffic data collection has been an overwhelming task for researchers as well as authorities over the years. With the advancement in technology and introduction of various tools for processing and extracting traffic data the task has been made significantly convenient. Data from Sky (DFS) is one such tool, based on image processing and artificial intelligence (AI), that provides output for macrosc… ▽ More Traffic data collection has been an overwhelming task for researchers as well as authorities over the years. With the advancement in technology and introduction of various tools for processing and extracting traffic data the task has been made significantly convenient. Data from Sky (DFS) is one such tool, based on image processing and artificial intelligence (AI), that provides output for macroscopic as well as microscopic variables of the traffic streams. The company claims to provide 98 to 100 percent accuracy on the data exported using DFS tool. The tool is widely used in developed countries where the traffic is homogenous and has lane-based movements. In this study, authors have checked the veracity of DFS tool in heterogenous and area-based traffic movement that is prevailing in most developing countries. The validation is done using various methods using Classified Volume Count (CVC), Space Mean Speeds (SMS) of individual vehicle classes and microscopic trajectory of probe vehicle to verify DFS claim. The error for CVCs for each vehicle class present in the traffic stream is estimated. Mean Absolute Percentage Error (MAPE) values are calculated for average speeds of each vehicle class between manually and DFS extracted space mean speeds (SMSs), and the microscopic trajectories are validated using a GPS based tracker put on probe vehicles. The results are fairly accurate in the case of data taken from a bird eye view with least errors. The other configurations of data collection have some significant errors, that are majorly caused by the varied traffic composition, the view of camera angle, and the direction of traffic. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.10274 [pdf]

Sparse Attention Regression Network Based Soil Fertility Prediction With Ummaso

Authors: R V Raghavendra Rao, U Srinivasulu Reddy

Abstract: The challenge of imbalanced soil nutrient datasets significantly hampers accurate predictions of soil fertility. To tackle this, a new method is suggested in this research, combining Uniform Manifold Approximation and Projection (UMAP) with Least Absolute Shrinkage and Selection Operator (LASSO). The main aim is to counter the impact of uneven data distribution and improve soil fertility models' p… ▽ More The challenge of imbalanced soil nutrient datasets significantly hampers accurate predictions of soil fertility. To tackle this, a new method is suggested in this research, combining Uniform Manifold Approximation and Projection (UMAP) with Least Absolute Shrinkage and Selection Operator (LASSO). The main aim is to counter the impact of uneven data distribution and improve soil fertility models' predictive precision. The model introduced uses Sparse Attention Regression, effectively incorporating pertinent features from the imbalanced dataset. UMAP is utilized initially to reduce data complexity, unveiling hidden structures and important patterns. Following this, LASSO is applied to refine features and enhance the model's interpretability. The experimental outcomes highlight the effectiveness of the UMAP and LASSO hybrid approach. The proposed model achieves outstanding performance metrics, reaching a predictive accuracy of 98%, demonstrating its capability in accurate soil fertility predictions. Additionally, it showcases a Precision of 91.25%, indicating its adeptness in identifying fertile soil instances accurately. The Recall metric stands at 90.90%, emphasizing the model's ability to capture true positive cases effectively. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.17998 [pdf, other]

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

Authors: Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao

Abstract: The increasing prevalence of video clips has sparked growing interest in text-video retrieval. Recent advances focus on establishing a joint embedding space for text and video, relying on consistent embedding representations to compute similarity. However, the text content in existing datasets is generally short and concise, making it hard to fully describe the redundant semantics of a video. Corr… ▽ More The increasing prevalence of video clips has sparked growing interest in text-video retrieval. Recent advances focus on establishing a joint embedding space for text and video, relying on consistent embedding representations to compute similarity. However, the text content in existing datasets is generally short and concise, making it hard to fully describe the redundant semantics of a video. Correspondingly, a single text embedding may be less expressive to capture the video embedding and empower the retrieval. In this study, we propose a new stochastic text modeling method T-MASS, i.e., text is modeled as a stochastic embedding, to enrich text embedding with a flexible and resilient semantic range, yielding a text mass. To be specific, we introduce a similarity-aware radius module to adapt the scale of the text mass upon the given text-video pairs. Plus, we design and develop a support text regularization to further control the text mass during the training. The inference pipeline is also tailored to fully exploit the text mass for accurate retrieval. Empirical evidence suggests that T-MASS not only effectively attracts relevant text-video pairs while distancing irrelevant ones, but also enables the determination of precise text embeddings for relevant pairs. Our experimental results show a substantial improvement of T-MASS over baseline (3% to 6.3% by R@1). Also, T-MASS achieves state-of-the-art performance on five benchmark datasets, including MSRVTT, LSMDC, DiDeMo, VATEX, and Charades. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024, code and model are available at https://github.com/Jiamian-Wang/T-MASS-text-video-retrieval

arXiv:2403.00975 [pdf, other]

Equipment Health Assessment: Time Series Analysis for Wind Turbine Performance

Authors: Jana Backhus, Aniruddha Rajendra Rao, Chandrasekar Venkatraman, Abhishek Padmanabhan, A. Vinoth Kumar, Chetan Gupta

Abstract: In this study, we leverage SCADA data from diverse wind turbines to predict power output, employing advanced time series methods, specifically Functional Neural Networks (FNN) and Long Short-Term Memory (LSTM) networks. A key innovation lies in the ensemble of FNN and LSTM models, capitalizing on their collective learning. This ensemble approach outperforms individual models, ensuring stable and a… ▽ More In this study, we leverage SCADA data from diverse wind turbines to predict power output, employing advanced time series methods, specifically Functional Neural Networks (FNN) and Long Short-Term Memory (LSTM) networks. A key innovation lies in the ensemble of FNN and LSTM models, capitalizing on their collective learning. This ensemble approach outperforms individual models, ensuring stable and accurate power output predictions. Additionally, machine learning techniques are applied to detect wind turbine performance deterioration, enabling proactive maintenance strategies and health assessment. Crucially, our analysis reveals the uniqueness of each wind turbine, necessitating tailored models for optimal predictions. These insight underscores the importance of providing automatized customization for different turbines to keep human modeling effort low. Importantly, the methodologies developed in this analysis are not limited to wind turbines; they can be extended to predict and optimize performance in various machinery, highlighting the versatility and applicability of our research across diverse industrial contexts. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: 19 Pages, 17 Figures, 3 Tables, Submitted at Applied Sciences (MDPI)

arXiv:2401.14498 [pdf, other]

Predictive Analysis for Optimizing Port Operations

Authors: Aniruddha Rajendra Rao, Haiyan Wang, Chetan Gupta

Abstract: Maritime transport is a pivotal logistics mode for the long-distance and bulk transportation of goods. However, the intricate planning involved in this mode is often hindered by uncertainties, including weather conditions, cargo diversity, and port dynamics, leading to increased costs. Consequently, accurately estimating vessel total (stay) time at port and potential delays becomes imperative for… ▽ More Maritime transport is a pivotal logistics mode for the long-distance and bulk transportation of goods. However, the intricate planning involved in this mode is often hindered by uncertainties, including weather conditions, cargo diversity, and port dynamics, leading to increased costs. Consequently, accurately estimating vessel total (stay) time at port and potential delays becomes imperative for effective planning and scheduling in port operations. This study aims to develop a port operation solution with competitive prediction and classification capabilities for estimating vessel Total and Delay times. This research addresses a significant gap in port analysis models for vessel Stay and Delay times, offering a valuable contribution to the field of maritime logistics. The proposed solution is designed to assist decision-making in port environments and predict service delays. This is demonstrated through a case study on Brazil ports. Additionally, feature analysis is used to understand the key factors impacting maritime logistics, enhancing the overall understanding of the complexities involved in port operations. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 13 pages, 9 figures, 4 Tables. Submitted at IEEE IJCNN 2024

arXiv:2401.12340 [pdf, other]

doi 10.1109/TAES.2023.3337768

Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target Annotation

Authors: Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao

Abstract: Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation n… ▽ More Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation network has been previously proposed in the literature for effective ATR annotation. Although this method demonstrates great potential for ATR, it severely suffers from lower annotation performance, higher Fréchet Inception Distance (FID) score, and the presence of visual artifacts in the synthetic images. To address these issues, we propose a hybrid contrastive learning base unpaired domain translation (H-CUT) network that achieves a significantly lower FID score. It incorporates both attention and entropy to emphasize the domain-specific region, a noisy feature mixup module to generate high variational synthetic negative patches, and a modulated noise contrastive estimation (MoNCE) loss to reweight all negative patches using optimal transport for better performance. Our proposed contrastive learning and cycle-consistency-based TTL (C3TTL) framework consists of two H-CUT networks and two classifiers. It simultaneously optimizes cycle-consistency, MoNCE, and identity losses. In C3TTL, two H-CUT networks have been employed through a bijection mapping to feed the reconstructed source domain images into a pretrained classifier to guide the optimal target domain classifier. Extensive experimental analysis conducted on three ATR datasets demonstrates that the proposed C3TTL method is effective in annotating civilian and military vehicles, as well as ship targets. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: This Paper is Accepted in IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS. This Arxiv version is an older version than the reviewed version

arXiv:2401.09742 [pdf, other]

Image Translation as Diffusion Visual Programmers

Authors: Cheng Han, James C. Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu

Abstract: We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facil… ▽ More We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facilitating transparent and controllable image translation processes. Extensive experiments demonstrate DVP's remarkable performance, surpassing concurrent arts. This success can be attributed to several key features of DVP: First, DVP achieves condition-flexible translation via instance normalization, enabling the model to eliminate sensitivity caused by the manual guidance and optimally focus on textual descriptions for high-quality content generation. Second, the framework enhances in-context reasoning by deciphering intricate high-dimensional concepts in feature spaces into more accessible low-dimensional symbols (e.g., [Prompt], [RoI object]), allowing for localized, context-free editing while maintaining overall coherence. Last but not least, DVP improves systemic controllability and explainability by offering explicit symbolic representations at each programming stage, empowering users to intuitively interpret and modify results. Our research marks a substantial step towards harmonizing artificial image translation processes with cognitive intelligence, promising broader applications. △ Less

Submitted 30 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 25 pages, 20 figures

arXiv:2312.17479 [pdf, other]

Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning

Authors: Nigini Oliveira, Jasmine Li, Koosha Khalvati, Rodolfo Cortes Barragan, Katharina Reinecke, Andrew N. Meltzoff, Rajesh P. N. Rao

Abstract: Constructing a universal moral code for artificial intelligence (AI) is difficult or even impossible, given that different human cultures have different definitions of morality and different societal norms. We therefore argue that the value system of an AI should be culturally attuned: just as a child raised in a particular culture learns the specific values and norms of that culture, we propose t… ▽ More Constructing a universal moral code for artificial intelligence (AI) is difficult or even impossible, given that different human cultures have different definitions of morality and different societal norms. We therefore argue that the value system of an AI should be culturally attuned: just as a child raised in a particular culture learns the specific values and norms of that culture, we propose that an AI agent operating in a particular human community should acquire that community's moral, ethical, and cultural codes. How AI systems might acquire such codes from human observation and interaction has remained an open question. Here, we propose using inverse reinforcement learning (IRL) as a method for AI agents to acquire a culturally-attuned value system implicitly. We test our approach using an experimental paradigm in which AI agents use IRL to learn different reward functions, which govern the agents' moral values, by observing the behavior of different cultural groups in an online virtual world requiring real-time decision making. We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior, and this learned value system can generalize to new scenarios requiring altruistic judgments. Our results provide, to our knowledge, the first demonstration that AI agents could potentially be endowed with the ability to continually learn their values and norms from observing and interacting with humans, thereby becoming attuned to the culture they are operating in. △ Less

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2311.02535

TokenMotion: Motion-Guided Vision Transformer for Video Camouflaged Object Detection Via Learnable Token Selection

Authors: Zifan Yu, Erfan Bank Tavakoli, Meida Chen, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

Abstract: The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects and camera movement. In this paper, we introduce TokenMotion (TMNet), which employs a transformer-based model to enhance VCOD by extracting motion-guide… ▽ More The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects and camera movement. In this paper, we introduce TokenMotion (TMNet), which employs a transformer-based model to enhance VCOD by extracting motion-guided features using a learnable token selection. Evaluated on the challenging MoCA-Mask dataset, TMNet achieves state-of-the-art performance in VCOD. It outperforms the existing state-of-the-art method by a 12.8% improvement in weighted F-measure, an 8.4% enhancement in S-measure, and a 10.7% boost in mean IoU. The results demonstrate the benefits of utilizing motion-guided features via learnable token selection within a transformer-based framework to tackle the intricate task of VCOD. △ Less

Submitted 1 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

Comments: Revising Needed

arXiv:2308.11809 [pdf, other]

Expressive probabilistic sampling in recurrent neural networks

Authors: Shirui Chen, Linxing Preston Jiang, Rajesh P. N. Rao, Eric Shea-Brown

Abstract: In sampling-based Bayesian models of brain function, neural activities are assumed to be samples from probability distributions that the brain uses for probabilistic computation. However, a comprehensive understanding of how mechanistic models of neural dynamics can sample from arbitrary distributions is still lacking. We use tools from functional analysis and stochastic differential equations to… ▽ More In sampling-based Bayesian models of brain function, neural activities are assumed to be samples from probability distributions that the brain uses for probabilistic computation. However, a comprehensive understanding of how mechanistic models of neural dynamics can sample from arbitrary distributions is still lacking. We use tools from functional analysis and stochastic differential equations to explore the minimum architectural requirements for $\textit{recurrent}$ neural circuits to sample from complex distributions. We first consider the traditional sampling model consisting of a network of neurons whose outputs directly represent the samples (sampler-only network). We argue that synaptic current and firing-rate dynamics in the traditional model have limited capacity to sample from a complex probability distribution. We show that the firing rate dynamics of a recurrent neural circuit with a separate set of output units can sample from an arbitrary probability distribution. We call such circuits reservoir-sampler networks (RSNs). We propose an efficient training procedure based on denoising score matching that finds recurrent and output weights such that the RSN implements Langevin sampling. We empirically demonstrate our model's ability to sample from several complex data distributions using the proposed neural dynamics and discuss its applicability to developing the next generation of sampling-based brain models. △ Less

Submitted 14 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.08803 [pdf]

doi 10.32985/ijeces.14.4.6

An Effective Deep Learning Based Multi-Class Classification of DoS and DDoS Attack Detection

Authors: Arun Kumar Silivery, Kovvur Ram Mohan Rao, L K Suresh Kumar

Abstract: In the past few years, cybersecurity is becoming very important due to the rise in internet users. The internet attacks such as Denial of service (DoS) and Distributed Denial of Service (DDoS) attacks severely harm a website or server and make them unavailable to other users. Network Monitoring and control systems have found it challenging to identify the many classes of DoS and DDoS attacks since… ▽ More In the past few years, cybersecurity is becoming very important due to the rise in internet users. The internet attacks such as Denial of service (DoS) and Distributed Denial of Service (DDoS) attacks severely harm a website or server and make them unavailable to other users. Network Monitoring and control systems have found it challenging to identify the many classes of DoS and DDoS attacks since each operates uniquely. Hence a powerful technique is required for attack detection. Traditional machine learning techniques are inefficient in handling extensive network data and cannot extract high-level features for attack detection. Therefore, an effective deep learning-based intrusion detection system is developed in this paper for DoS and DDoS attack classification. This model includes various phases and starts with the Deep Convolutional Generative Adversarial Networks (DCGAN) based technique to address the class imbalance issue in the dataset. Then a deep learning algorithm based on ResNet-50 extracts the critical features for each class in the dataset. After that, an optimized AlexNet-based classifier is implemented for detecting the attacks separately, and the essential parameters of the classifier are optimized using the Atom search optimization algorithm. The proposed approach was evaluated on benchmark datasets, CCIDS2019 and UNSW-NB15, using key classification metrics and achieved 99.37% accuracy for the UNSW-NB15 dataset and 99.33% for the CICIDS2019 dataset. The investigational results demonstrate that the suggested approach performs superior to other competitive techniques in identifying DoS and DDoS attacks. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.07870 [pdf, other]

Brain-Inspired Computational Intelligence via Predictive Coding

Authors: Tommaso Salvatori, Ankur Mali, Christopher L. Buckley, Thomas Lukasiewicz, Rajesh P. N. Rao, Karl Friston, Alexander Ororbia

Abstract: Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying unc… ▽ More Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large. △ Less

Submitted 15 August, 2023; originally announced August 2023.

Comments: 37 Pages, 9 Figures

arXiv:2306.17316 [pdf, other]

Triangle Fees

Authors: Rithvik Rao, Nihar Shah

Abstract: Triangle fees are a novel fee structure for AMMs, in which marginal fees are decreasing in a trade's size. That decline is proportional to the movement in the AMM's implied price, i.e. for every basis point the trade moves the ratio of assets, the marginal fee declines by a basis point. These fees create incentives that protect against price staleness, while still allowing the AMM to earn meaningf… ▽ More Triangle fees are a novel fee structure for AMMs, in which marginal fees are decreasing in a trade's size. That decline is proportional to the movement in the AMM's implied price, i.e. for every basis point the trade moves the ratio of assets, the marginal fee declines by a basis point. These fees create incentives that protect against price staleness, while still allowing the AMM to earn meaningful fee revenue. Triangle fees can strictly improve the Pareto frontier of price accuracy versus losses generated by the status quo of constant fee mechanisms. △ Less

Submitted 29 June, 2023; originally announced June 2023.

Comments: 7 pages

arXiv:2306.09239 [pdf, ps, other]

Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects

Authors: Soumyabrata Dey, Ravishankar Rao, Mubarak Shah

Abstract: Attention Deficit Hyperactive Disorder (ADHD) is a common behavioral problem affecting children. In this work, we investigate the automatic classification of ADHD subjects using the resting state Functional Magnetic Resonance Imaging (fMRI) sequences of the brain. We show that the brain can be modeled as a functional network, and certain properties of the networks differ in ADHD subjects from cont… ▽ More Attention Deficit Hyperactive Disorder (ADHD) is a common behavioral problem affecting children. In this work, we investigate the automatic classification of ADHD subjects using the resting state Functional Magnetic Resonance Imaging (fMRI) sequences of the brain. We show that the brain can be modeled as a functional network, and certain properties of the networks differ in ADHD subjects from control subjects. We compute the pairwise correlation of brain voxels' activity over the time frame of the experimental protocol which helps to model the function of a brain as a network. Different network features are computed for each of the voxels constructing the network. The concatenation of the network features of all the voxels in a brain serves as the feature vector. Feature vectors from a set of subjects are then used to train a PCA-LDA (principal component analysis-linear discriminant analysis) based classifier. We hypothesized that ADHD-related differences lie in some specific regions of the brain and using features only from those regions is sufficient to discriminate ADHD and control subjects. We propose a method to create a brain mask that includes the useful regions only and demonstrate that using the feature from the masked regions improves classification accuracy on the test data set. We train our classifier with 776 subjects and test on 171 subjects provided by The Neuro Bureau for the ADHD-200 challenge. We demonstrate the utility of graph-motif features, specifically the maps that represent the frequency of participation of voxels in network cycles of length 3. The best classification performance (69.59%) is achieved using 3-cycle map features with masking. Our proposed approach holds promise in being able to diagnose and understand the disorder. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.13886 [pdf, other]

Deep Transductive Transfer Learning for Automatic Target Recognition

Authors: Shoaib M. Sami, Nasser M. Nasrabadi, Raghuveer Rao

Abstract: One of the major obstacles in designing an automatic target recognition (ATR) algorithm, is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDAR). Therefore, automatically annotating these images is essential to build a robust classifier in the target domain based on the labeled images of the… ▽ More One of the major obstacles in designing an automatic target recognition (ATR) algorithm, is that there are often labeled images in one domain (i.e., infrared source domain) but no annotated images in the other target domains (i.e., visible, SAR, LIDAR). Therefore, automatically annotating these images is essential to build a robust classifier in the target domain based on the labeled images of the source domain. Transductive transfer learning is an effective way to adapt a network to a new target domain by utilizing a pretrained ATR network in the source domain. We propose an unpaired transductive transfer learning framework where a CycleGAN model and a well-trained ATR classifier in the source domain are used to construct an ATR classifier in the target domain without having any labeled data in the target domain. We employ a CycleGAN model to transfer the mid-wave infrared (MWIR) images to visible (VIS) domain images (or visible to MWIR domain). To train the transductive CycleGAN, we optimize a cost function consisting of the adversarial, identity, cycle-consistency, and categorical cross-entropy loss for both the source and target classifiers. In this paper, we perform a detailed experimental analysis on the challenging DSIAC ATR dataset. The dataset consists of ten classes of vehicles at different poses and distances ranging from 1-5 kilometers on both the MWIR and VIS domains. In our experiment, we assume that the images in the VIS domain are the unlabeled target dataset. We first detect and crop the vehicles from the raw images and then project them into a common distance of 2 kilometers. Our proposed transductive CycleGAN achieves 71.56% accuracy in classifying the visible domain vehicles in the DSIAC ATR dataset. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 10 pages, 5 figures

Journal ref: SPIE Defense & Commercial Sensing 2023, Conference 12521, Automatic target recognition XXXIII, Orlando, Florida

arXiv:2305.05532 [pdf, other]

doi 10.1109/ICPHM57936.2023.10194112

An ensemble of convolution-based methods for fault detection using vibration signals

Authors: Xian Yeow Lee, Aman Kumar, Lasitha Vidyaratne, Aniruddha Rajendra Rao, Ahmed Farahat, Chetan Gupta

Abstract: This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent st… ▽ More This paper focuses on solving a fault detection problem using multivariate time series of vibration signals collected from planetary gearboxes in a test rig. Various traditional machine learning and deep learning methods have been proposed for multivariate time-series classification, including distance-based, functional data-oriented, feature-driven, and convolution kernel-based methods. Recent studies have shown using convolution kernel-based methods like ROCKET, and 1D convolutional neural networks with ResNet and FCN, have robust performance for multivariate time-series data classification. We propose an ensemble of three convolution kernel-based methods and show its efficacy on this fault detection problem by outperforming other approaches and achieving an accuracy of more than 98.8\%. △ Less

Submitted 4 May, 2023; originally announced May 2023.

Comments: 12 Pages, 9 Figures, 2 Tables. Accepted at ICPHM 2023

Journal ref: 2023 IEEE International Conference on Prognostics and Health Management (ICPHM)

arXiv:2304.02208 [pdf]

doi 10.1007/s42979-021-00871-7

PIKS: A Technique to Identify Actionable Trends for Policy-Makers Through Open Healthcare Data

Authors: A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng

Abstract: With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly a… ▽ More With calls for increasing transparency, governments are releasing greater amounts of data in multiple domains including finance, education and healthcare. The efficient exploratory analysis of healthcare data constitutes a significant challenge. Key concerns in public health include the quick identification and analysis of trends, and the detection of outliers. This allows policies to be rapidly adapted to changing circumstances. We present an efficient outlier detection technique, termed PIKS (Pruned iterative-k means searchlight), which combines an iterative k-means algorithm with a pruned searchlight based scan. We apply this technique to identify outliers in two publicly available healthcare datasets from the New York Statewide Planning and Research Cooperative System, and California's Office of Statewide Health Planning and Development. We provide a comparison of our technique with three other existing outlier detection techniques, consisting of auto-encoders, isolation forests and feature bagging. We identified outliers in conditions including suicide rates, immunity disorders, social admissions, cardiomyopathies, and pregnancy in the third trimester. We demonstrate that the PIKS technique produces results consistent with other techniques such as the auto-encoder. However, the auto-encoder needs to be trained, which requires several parameters to be tuned. In comparison, the PIKS technique has far fewer parameters to tune. This makes it advantageous for fast, "out-of-the-box" data exploration. The PIKS technique is scalable and can readily ingest new datasets. Hence, it can provide valuable, up-to-date insights to citizens, patients and policy-makers. We have made our code open source, and with the availability of open data, other researchers can easily reproduce and extend our work. This will help promote a deeper understanding of healthcare policies and public health issues. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Journal ref: SN COMPUT. SCI. 2, 477 (2021)

arXiv:2304.02191 [pdf]

doi 10.1109/ICHI48887.2020.9374348

Building predictive models of healthcare costs with open healthcare data

Authors: A. Ravishankar Rao, Subrata Garai, Soumyabrata Dey, Hang Peng

Abstract: Due to rapidly rising healthcare costs worldwide, there is significant interest in controlling them. An important aspect concerns price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires the data to be made available, and models that can predict healthcare costs for a wide range of patient demographics and conditions. W… ▽ More Due to rapidly rising healthcare costs worldwide, there is significant interest in controlling them. An important aspect concerns price transparency, as preliminary efforts have demonstrated that patients will shop for lower costs, driving efficiency. This requires the data to be made available, and models that can predict healthcare costs for a wide range of patient demographics and conditions. We present an approach to this problem by developing a predictive model using machine-learning techniques. We analyzed de-identified patient data from New York State SPARCS (statewide planning and research cooperative system), consisting of 2.3 million records in 2016. We built models to predict costs from patient diagnoses and demographics. We investigated two model classes consisting of sparse regression and decision trees. We obtained the best performance by using a decision tree with depth 10. We obtained an R-square value of 0.76 which is better than the values reported in the literature for similar problems. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Comments: 2020 IEEE International Conference on Healthcare Informatics (ICHI)

arXiv:2304.02189 [pdf]

doi 10.1109/IJCNN.2018.8489448

A system for exploring big data: an iterative k-means searchlight for outlier detection on open health data

Authors: A. Ravishankar Rao, Daniel Clarke, Subrata Garai, Soumyabrata Dey

Abstract: The interactive exploration of large and evolving datasets is challenging as relationships between underlying variables may not be fully understood. There may be hidden trends and patterns in the data that are worthy of further exploration and analysis. We present a system that methodically explores multiple combinations of variables using a searchlight technique and identifies outliers. An iterat… ▽ More The interactive exploration of large and evolving datasets is challenging as relationships between underlying variables may not be fully understood. There may be hidden trends and patterns in the data that are worthy of further exploration and analysis. We present a system that methodically explores multiple combinations of variables using a searchlight technique and identifies outliers. An iterative k-means clustering algorithm is applied to features derived through a split-apply-combine paradigm used in the database literature. Outliers are identified as singleton or small clusters. This algorithm is swept across the dataset in a searchlight manner. The dimensions that contain outliers are combined in pairs with other dimensions using a susbset scan technique to gain further insight into the outliers. We illustrate this system by anaylzing open health care data released by New York State. We apply our iterative k-means searchlight followed by subset scanning. Several anomalous trends in the data are identified, including cost overruns at specific hospitals, and increases in diagnoses such as suicides. These constitute novel findings in the literature, and are of potential use to regulatory agencies, policy makers and concerned citizens. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Comments: 2018 International Joint Conference on Neural Networks (IJCNN)

arXiv:2303.00915 [pdf, other]

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

Authors: Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon

Abstract: Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel d… ▽ More Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR, and spans a diverse range of biomedical image types. PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles. Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks from retrieval to classification to visual question-answering (VQA). BiomedCLIP achieved new state-of-the-art results in a wide range of standard datasets, substantially outperforming prior approaches. Intriguingly, by large-scale pretraining on diverse biomedical image types, BiomedCLIP even outperforms state-of-the-art radiology-specific models such as BioViL in radiology-specific tasks such as RSNA pneumonia detection. In summary, BiomedCLIP is a fully open-access foundation model that achieves state-of-the-art performance on various biomedical tasks, paving the way for transformative multimodal biomedical discovery and applications. We release our models at https://aka.ms/biomedclip to facilitate future research in multimodal biomedical AI. △ Less

Submitted 16 January, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: The models are released at https://aka.ms/biomedclip

arXiv:2302.08594 [pdf, other]

TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point Cloud Semantic Segmentation

Authors: Zifan Yu, Meida Chen, Zhikang Zhang, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

Abstract: Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS) approaches have bottlenecks resulting from the boundary-blurring problem of convolution neural networks (CNNs) and quantitation loss of spherical projection. In this work, we propose a transformer-based plug-and-play uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner, which leads… ▽ More Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS) approaches have bottlenecks resulting from the boundary-blurring problem of convolution neural networks (CNNs) and quantitation loss of spherical projection. In this work, we propose a transformer-based plug-and-play uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner, which leads to an improved segmentation performance. Uncertain points are sampled from coarse semantic segmentation results of 2D image segmentation where uncertain points are located close to the object boundaries in the 2D range image representation and 3D spherical projection background points. Following that, the geometry and coarse semantic features of uncertain points are aggregated by neighbor points in 3D space without adding expensive computation and memory footprint. Finally, the transformer-based refiner, which contains four stacked self-attention layers, along with an MLP module, is utilized for uncertain point classification on the concatenated features of self-attention layers. As the proposed refiner is independent of 2D CNNs, our TransUPR can be easily integrated into any existing image-based LiDAR PCSS approaches, e.g., CENet. Our TransUPR with the CENet achieves state-of-the-art performance, i.e., 68.2% mean Intersection over Union (mIoU) on the Semantic KITTI benchmark, which provides a performance improvement of 0.6% on the mIoU compared to the original CENet. △ Less

Submitted 12 October, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: 6 pages; Accepted by 2023 IROS

arXiv:2301.00357 [pdf, other]

doi 10.1109/BigData55660.2022.10020482

A Functional approach for Two Way Dimension Reduction in Time Series

Authors: Aniruddha Rajendra Rao, Haiyan Wang, Chetan Gupta

Abstract: The rise in data has led to the need for dimension reduction techniques, especially in the area of non-scalar variables, including time series, natural language processing, and computer vision. In this paper, we specifically investigate dimension reduction for time series through functional data analysis. Current methods for dimension reduction in functional data are functional principal component… ▽ More The rise in data has led to the need for dimension reduction techniques, especially in the area of non-scalar variables, including time series, natural language processing, and computer vision. In this paper, we specifically investigate dimension reduction for time series through functional data analysis. Current methods for dimension reduction in functional data are functional principal component analysis and functional autoencoders, which are limited to linear mappings or scalar representations for the time series, which is inefficient. In real data applications, the nature of the data is much more complex. We propose a non-linear function-on-function approach, which consists of a functional encoder and a functional decoder, that uses continuous hidden layers consisting of continuous neurons to learn the structure inherent in functional data, which addresses the aforementioned concerns in the existing approaches. Our approach gives a low dimension latent representation by reducing the number of functional features as well as the timepoints at which the functions are observed. The effectiveness of the proposed model is demonstrated through multiple simulations and real data examples. △ Less

Submitted 1 January, 2023; originally announced January 2023.

Comments: 10 pages, 4 figures, 4 tables

Journal ref: IEEE BigData 2022

arXiv:2212.08762 [pdf, other]

doi 10.1109/TVT.2023.3347891

Iterative RNDOP-Optimal Anchor Placement for Beyond Convex Hull ToA-based Localization: Performance Bounds and Heuristic Algorithms

Authors: Raghunandan M. Rao, Don-Roberts Emenonye

Abstract: Localizing targets outside the anchors' convex hull is an understudied but prevalent scenario in vehicle-centric, UAV-based, and self-localization applications. Considering such scenarios, this paper studies the optimal anchor placement problem for Time-of-Arrival (ToA)-based localization schemes such that the worst-case Dilution of Precision (DOP) is minimized. Building on prior results on DOP sc… ▽ More Localizing targets outside the anchors' convex hull is an understudied but prevalent scenario in vehicle-centric, UAV-based, and self-localization applications. Considering such scenarios, this paper studies the optimal anchor placement problem for Time-of-Arrival (ToA)-based localization schemes such that the worst-case Dilution of Precision (DOP) is minimized. Building on prior results on DOP scaling laws for beyond convex hull ToA-based localization, we propose a novel metric termed the Range-Normalized DOP (RNDOP). We show that the worst-case DOP-optimal anchor placement problem simplifies to a min-max RNDOP-optimal anchor placement problem. Unfortunately, this formulation results in a non-convex and intractable problem under realistic constraints. To overcome this, we propose iterative anchor addition schemes, which result in a tractable albeit non-convex problem. By exploiting the structure arising from the resultant rank-1 update, we devise three heuristic schemes with varying performance-complexity tradeoffs. In addition, we also derive the upper and lower bounds for scenarios where we are placing anchors to optimize the worst-case (a) 3D positioning error and (b) 2D positioning error. We build on these results to design a cohesive iterative algorithmic framework for robust anchor placement, characterize the impact of anchor position uncertainty, and then discuss the computational complexity of the proposed schemes. Using numerical results, we validate the accuracy of our theoretical results. We also present comprehensive Monte-Carlo simulation results to compare the positioning error and execution time performance of each iterative scheme, discuss the tradeoffs, and provide valuable system design insights for beyond convex hull localization scenarios. △ Less

Submitted 17 February, 2024; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: 16 pages. To appear in a future issue of the IEEE Transactions on Vehicular Technology

arXiv:2211.03932 [pdf, other]

Enhanced Low-resolution LiDAR-Camera Calibration Via Depth Interpolation and Supervised Contrastive Learning

Authors: Zhikang Zhang, Zifan Yu, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

Abstract: Motivated by the increasing application of low-resolution LiDAR recently, we target the problem of low-resolution LiDAR-camera calibration in this work. The main challenges are two-fold: sparsity and noise in point clouds. To address the problem, we propose to apply depth interpolation to increase the point density and supervised contrastive learning to learn noise-resistant features. The experime… ▽ More Motivated by the increasing application of low-resolution LiDAR recently, we target the problem of low-resolution LiDAR-camera calibration in this work. The main challenges are two-fold: sparsity and noise in point clouds. To address the problem, we propose to apply depth interpolation to increase the point density and supervised contrastive learning to learn noise-resistant features. The experiments on RELLIS-3D demonstrate that our approach achieves an average mean absolute rotation/translation errors of 0.15cm/0.33\textdegree on 32-channel LiDAR point cloud data, which significantly outperforms all reference methods. △ Less

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.03365 [pdf, other]

Polynomial Kernels for Generalized Domination Problems

Authors: Pradeesha Ashok, Rajath Rao, Avi Tomar

Abstract: In this paper, we study the parameterized complexity of a generalized domination problem called the [$σ, ρ$] Dominating Set problem. This problem generalizes a large number of problems including the Minimum Dominating Set problem and its many variants. The parameterized complexity of the [$σ, ρ$] Dominating Set problem parameterized by treewidth is well studied. Here the properties of the sets… ▽ More In this paper, we study the parameterized complexity of a generalized domination problem called the [$σ, ρ$] Dominating Set problem. This problem generalizes a large number of problems including the Minimum Dominating Set problem and its many variants. The parameterized complexity of the [$σ, ρ$] Dominating Set problem parameterized by treewidth is well studied. Here the properties of the sets $σ$ and $ρ$ that make the problem tractable are identified [1]. We consider a larger parameter and investigate the existence of polynomial sized kernels. When $σ$ and $ρ$ are finite, we identify the exact condition when the [$σ, ρ$] Dominating Set problem parameterized by vertex cover admits polynomial kernels. Our lower and upper bound results can also be extended to more general conditions and provably smaller parameters as well. △ Less

Submitted 9 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: 19 pages, 6 figures

arXiv:2211.00519 [pdf, other]

Learning Neural Implicit Representations with Surface Signal Parameterizations

Authors: Yanran Guan, Andrei Chubarau, Ruby Rao, Derek Nowrouzezahrai

Abstract: Neural implicit surface representations have recently emerged as popular alternative to explicit 3D object encodings, such as polygonal meshes, tabulated points, or voxels. While significant work has improved the geometric fidelity of these representations, much less attention is given to their final appearance. Traditional explicit object representations commonly couple the 3D shape data with aux… ▽ More Neural implicit surface representations have recently emerged as popular alternative to explicit 3D object encodings, such as polygonal meshes, tabulated points, or voxels. While significant work has improved the geometric fidelity of these representations, much less attention is given to their final appearance. Traditional explicit object representations commonly couple the 3D shape data with auxiliary surface-mapped image data, such as diffuse color textures and fine-scale geometric details in normal maps that typically require a mapping of the 3D surface onto a plane, i.e., a surface parameterization; implicit representations, on the other hand, cannot be easily textured due to lack of configurable surface parameterization. Inspired by this digital content authoring methodology, we design a neural network architecture that implicitly encodes the underlying surface parameterization suitable for appearance data. As such, our model remains compatible with existing mesh-based digital content with appearance data. Motivated by recent work that overfits compact networks to individual 3D objects, we present a new weight-encoded neural implicit representation that extends the capability of neural implicit surfaces to enable various common and important applications of texture mapping. Our method outperforms reasonable baselines and state-of-the-art alternatives. △ Less

Submitted 25 June, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.13461 [pdf, other]

Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning

Authors: Rajesh P. N. Rao, Dimitrios C. Gklezakos, Vishwas Sathish

Abstract: Predictive coding has emerged as a prominent model of how the brain learns through predictions, anticipating the importance accorded to predictive learning in recent AI architectures such as transformers. Here we propose a new framework for predictive coding called active predictive coding which can learn hierarchical world models and solve two radically different open problems in AI: (1) how do w… ▽ More Predictive coding has emerged as a prominent model of how the brain learns through predictions, anticipating the importance accorded to predictive learning in recent AI architectures such as transformers. Here we propose a new framework for predictive coding called active predictive coding which can learn hierarchical world models and solve two radically different open problems in AI: (1) how do we learn compositional representations, e.g., part-whole hierarchies, for equivariant vision? and (2) how do we solve large-scale planning problems, which are hard for traditional reinforcement learning, by composing complex action sequences from primitive policies? Our approach exploits hypernetworks, self-supervised learning and reinforcement learning to learn hierarchical world models that combine task-invariant state transition networks and task-dependent policy networks at multiple abstraction levels. We demonstrate the viability of our approach on a variety of vision datasets (MNIST, FashionMNIST, Omniglot) as well as on a scalable hierarchical planning problem. Our results represent, to our knowledge, the first demonstration of a unified solution to the part-whole learning problem posed by Hinton, the nested reference frames problem posed by Hawkins, and the integrated state-action hierarchy learning problem in reinforcement learning. △ Less

Submitted 23 October, 2022; originally announced October 2022.

Comments: 15 pages, 10 figures, 2 supplementary figures

arXiv:2210.11478 [pdf, other]

Neural Co-Processors for Restoring Brain Function: Results from a Cortical Model of Grasping

Authors: Matthew J. Bryan, Linxing Preston Jiang, Rajesh P N Rao

Abstract: Objective: A major challenge in designing closed-loop brain-computer interfaces is finding optimal stimulation patterns as a function of ongoing neural activity for different subjects and objectives. Approach: To achieve goal-directed closed-loop neurostimulation, we propose "neural co-processors" which use artificial neural networks and deep learning to learn optimal closed-loop stimulation polic… ▽ More Objective: A major challenge in designing closed-loop brain-computer interfaces is finding optimal stimulation patterns as a function of ongoing neural activity for different subjects and objectives. Approach: To achieve goal-directed closed-loop neurostimulation, we propose "neural co-processors" which use artificial neural networks and deep learning to learn optimal closed-loop stimulation policies, shaping neural activity and bridging injured neural circuits for targeted repair and rehabilitation. The co-processor adapts the stimulation policy as the biological circuit itself adapts to the stimulation, achieving a form of brain-device co-adaptation. Here we use simulations to lay the groundwork for future in vivo tests of neural co-processors. We leverage a cortical model of grasping, to which we applied various forms of simulated lesions, allowing us to develop the critical learning algorithms and study adaptations to non-stationarity. Main results: Our simulations show the ability of a neural co-processor to learn a stimulation policy using a supervised learning approach, and to adapt that policy as the underlying brain and sensors change. Our co-processor successfully co-adapted with the simulated brain to accomplish the reach-and-grasp task after a variety of lesions were applied, achieving recovery towards healthy function. Significance: Our results provide the first proof-of-concept demonstration of a co-processor for adaptive activity-dependent closed-loop neurostimulation, optimizing for a rehabilitation goal. While a gap remains between simulations and applications, our results provide insights on how co-processors may be developed for learning complex adaptive stimulation policies for a variety of neural rehabilitation and neuroprosthetic applications. △ Less

Submitted 20 March, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 45 pages, 19 figures. Submitted the IOP Journal of Neural Engineering

arXiv:2209.02930 [pdf, other]

doi 10.1145/3540250.3560879

Reflections on Software Failure Analysis

Authors: Paschal C. Amusuo, Aishwarya Sharma, Siddharth R. Rao, Abbey Vincent, James C. Davis

Abstract: Failure studies are important in revealing the root causes, behaviors, and life cycle of defects in software systems. These studies either focus on understanding the characteristics of defects in specific classes of systems or the characteristics of a specific type of defect in the systems it manifests in. Failure studies have influenced various software engineering research directions, especially… ▽ More Failure studies are important in revealing the root causes, behaviors, and life cycle of defects in software systems. These studies either focus on understanding the characteristics of defects in specific classes of systems or the characteristics of a specific type of defect in the systems it manifests in. Failure studies have influenced various software engineering research directions, especially in the area of software evolution, defect detection, and program repair. In this paper, we reflect on the conduct of failure studies in software engineering. We reviewed a sample of 52 failure study papers. We identified several recurring problems in these studies, some of which hinder the ability of the engineering community to trust or replicate the results. Based on our findings, we suggest future research directions, including identifying and analyzing failure causal chains, standardizing the conduct of failure studies, and tool support for faster defect analysis. △ Less

Submitted 21 September, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 6 pages, 4 figures To be published in: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '22)

arXiv:2207.03593 [pdf, other]

Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets

Authors: Dimitrios C. Gklezakos, Rishi Jha, Rajesh P. N. Rao

Abstract: Inspired by Gibson's notion of object affordances in human vision, we ask the question: how can an agent learn to predict an entire action policy for a novel object or environment given only a single glimpse? To tackle this problem, we introduce the concept of Universal Policy Functions (UPFs) which are state-to-action mappings that generalize not only to new goals but most importantly to novel, u… ▽ More Inspired by Gibson's notion of object affordances in human vision, we ask the question: how can an agent learn to predict an entire action policy for a novel object or environment given only a single glimpse? To tackle this problem, we introduce the concept of Universal Policy Functions (UPFs) which are state-to-action mappings that generalize not only to new goals but most importantly to novel, unseen environments. Specifically, we consider the problem of efficiently learning such policies for agents with limited computational and communication capacity, constraints that are frequently encountered in edge devices. We propose the Hyper-Universal Policy Approximator (HUPA), a hypernetwork-based model to generate small task- and environment-conditional policy networks from a single image, with good generalization properties. Our results show that HUPAs significantly outperform an embedding-based alternative for generated policies that are size-constrained. Although this work is restricted to a simple map-based navigation task, future work includes applying the principles behind HUPAs to learning more general affordances for objects and environments. △ Less

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2207.00559 [pdf, other]

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Authors: Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao, Sioni Summers, Caterina Vernieri, Aaron Wang

Abstract: Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neura… ▽ More Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: 12 pages, 6 figures, 5 tables

arXiv:2206.08462 [pdf, other]

Recursive Neural Programs: Variational Learning of Image Grammars and Part-Whole Hierarchies

Authors: Ares Fisher, Rajesh P. N. Rao

Abstract: Human vision involves parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using capsule networks, reference frames and active predictive coding, but a generative model formulation has been lacking. We introduce Recursive Neural Programs (RNPs),… ▽ More Human vision involves parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using capsule networks, reference frames and active predictive coding, but a generative model formulation has been lacking. We introduce Recursive Neural Programs (RNPs), which, to our knowledge, is the first neural generative model to address the part-whole hierarchy learning problem. RNPs model images as hierarchical trees of probabilistic sensory-motor programs that recursively reuse learned sensory-motor primitives to model an image within different reference frames, forming recursive image grammars. We express RNPs as structured variational autoencoders (sVAEs) for inference and sampling, and demonstrate parts-based parsing, sampling and one-shot transfer learning for MNIST, Omniglot and Fashion-MNIST datasets, demonstrating the model's expressive power. Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes, allowing rich compositionality and intuitive interpretations of objects in terms of part-whole hierarchies. △ Less

Submitted 25 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Comments: 9 pages, 6 figures. fixed LaTeX typo for algorithm reference

arXiv:2204.03055 [pdf]

To Participate Or Not To Participate: An Investigation Of Strategic Participation In Standards

Authors: Paras Bhatt, Claire Vishik, Govind Hariharan, H. Raghav Rao

Abstract: Essential functionality in the ICT (Information and Communication Technology) space draws from standards such as HTTP (IETF RFC 2616, Bluetooth (IEEE 802.15) and various telecommunication standards (4G, 5G). They have fuelled rapid growth of ICT sector in the last decades by ensuring interoperability and consistency in computing environment. Research shows that firms that backed ICT standards and… ▽ More Essential functionality in the ICT (Information and Communication Technology) space draws from standards such as HTTP (IETF RFC 2616, Bluetooth (IEEE 802.15) and various telecommunication standards (4G, 5G). They have fuelled rapid growth of ICT sector in the last decades by ensuring interoperability and consistency in computing environment. Research shows that firms that backed ICT standards and participated in standards development, have emerged as industry innovators. Standards development thus clearly has benefits for participating companies as well as technology development and innovation in general. However, significant costs are also associated with development of standards and need to be better understood to support investment in standardization necessary for todays ICT environment. We present a conceptual model that considers the potential for market innovation across a standards lifecycle and efficiency from standardization work, to build a forward-looking decision model that can guide an organizations standards development activities. We investigate and formalize motivations that drive firms to participate in standardization, specifically, changes in market innovation. Our model can serve as a strategic decision framework to drive assessments of a firms participation in standards development. We test our model with a use case on an established access control approach that was standardized more than two decades ago, Role Based Access Control (RBAC) using historical data. The investigation of the case study shows that change in market innovation is a significant indicator of success in standards development and are viable criteria to model a firms decision to participate (or not to participate) in a specific area of standardization. △ Less

Submitted 6 April, 2022; originally announced April 2022.

Comments: https://www.researchgate.net/publication/359415466_To_Participate_Or_Not_To_Participate_An_Investigation_Of_Strategic_Participation_In_Standards

Journal ref: 11th International Conference on Standardisation and Innovation in Information Technology (SIIT) - The Past, Present and Future of ICT Standardisation, Sep, 2021

arXiv:2203.10677 [pdf, other]

doi 10.1145/3510003.3512764

Repairing Brain-Computer Interfaces with Fault-Based Data Acquisition

Authors: Cailin Winston, Caleb Winston, Chloe N Winston, Claris Winston, Cleah Winston, Rajesh PN Rao, René Just

Abstract: Brain-computer interfaces (BCIs) decode recorded neural signals from the brain and/or stimulate the brain with encoded neural signals. BCIs span both hardware and software and have a wide range of applications in restorative medicine, from restoring movement through prostheses and robotic limbs to restoring sensation and communication through spellers. BCIs also have applications in diagnostic med… ▽ More Brain-computer interfaces (BCIs) decode recorded neural signals from the brain and/or stimulate the brain with encoded neural signals. BCIs span both hardware and software and have a wide range of applications in restorative medicine, from restoring movement through prostheses and robotic limbs to restoring sensation and communication through spellers. BCIs also have applications in diagnostic medicine, e.g., providing clinicians with data for detecting seizures, sleep patterns, or emotions. Despite their promise, BCIs have not yet been adopted for long-term, day-to-day use because of challenges related to reliability and robustness, which are needed for safe operation in all scenarios. Ensuring safe operation currently requires hours of manual data collection and recalibration, involving both patients and clinicians. However, data collection is not targeted at eliminating specific faults in a BCI. This paper presents a new methodology for characterizing, detecting, and localizing faults in BCIs. Specifically, it proposes partial test oracles as a method for detecting faults and slice functions as a method for localizing faults to characteristic patterns in the input data or relevant tasks performed by the user. Through targeted data acquisition and retraining, the proposed methodology improves the correctness of BCIs. We evaluated the proposed methodology on five BCI applications. The results show that the proposed methodology (1) precisely localizes faults and (2) can significantly reduce the frequency of faults through retraining based on targeted, fault-based data acquisition. These results suggest that the proposed methodology is a promising step towards repairing faulty BCIs. △ Less

Submitted 20 March, 2022; originally announced March 2022.

Comments: Accepted at International Conference on Software Engineering (ICSE-2022)

arXiv:2203.10442 [pdf, other]

Towards Structuring Real-World Data at Scale: Deep Learning for Extracting Key Oncology Information from Clinical Text with Patient-Level Supervision

Authors: Sam Preston, Mu Wei, Rajesh Rao, Robert Tinn, Naoto Usuyama, Michael Lucas, Roshanthi Weerasinghe, Soohee Lee, Brian Piening, Paul Tittel, Naveen Valluri, Tristan Naumann, Carlo Bifulco, Hoifung Poon

Abstract: Objective: The majority of detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time-consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. Materials and Methods: Traditional rule-based systems are vulnerable… ▽ More Objective: The majority of detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time-consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. Materials and Methods: Traditional rule-based systems are vulnerable to the prevalent linguistic variations and ambiguities in clinical text, and prior applications of machine-learning methods typically require sentence-level or report-level labeled examples that are hard to produce at scale. We propose leveraging patient-level supervision from medical registries, which are often readily available and capture key patient information, for general RWD applications. To combat the lack of sentence-level or report-level annotations, we explore advanced deep-learning methods by combining domain-specific pretraining, recurrent neural networks, and hierarchical attention. Results: We conduct an extensive study on 135,107 patients from the cancer registry of a large integrated delivery network (IDN) comprising healthcare systems in five western US states. Our deep learning methods attain test AUROC of 94-99% for key tumor attributes and comparable performance on held-out data from separate health systems and states. Discussion and Conclusion: Ablation results demonstrate clear superiority of these advanced deep-learning methods over prior approaches. Error analysis shows that our NLP system sometimes even corrects errors in registrar labels. We also conduct a preliminary investigation in accelerating registry curation and general RWD structuring via assisted curation for over 1.2 million cancer patients in this healthcare network. △ Less

Submitted 19 March, 2022; originally announced March 2022.

arXiv:2203.07664 [pdf, other]

Can you even tell left from right? Presenting a new challenge for VQA

Authors: Sai Raam Venkatraman, Rishi Rao, S. Balasubramanian, Chandra Sekhar Vorugunti, R. Raghunatha Sarma

Abstract: Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is the evaluation of compositional generalisation, or the ability of a model to answer well on scenes whose scene-setups are different from the training set. Therefore, for this purpose, we need datasets whose train and test sets differ significantly in composition.… ▽ More Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is the evaluation of compositional generalisation, or the ability of a model to answer well on scenes whose scene-setups are different from the training set. Therefore, for this purpose, we need datasets whose train and test sets differ significantly in composition. In this work, we present several quantitative measures of compositional separation and find that popular datasets for VQA are not good evaluators. To solve this, we present Uncommon Objects in Unseen Configurations (UOUC), a synthetic dataset for VQA. UOUC is at once fairly complex while also being well-separated, compositionally. The object-class of UOUC consists of 380 clasess taken from 528 characters from the Dungeons and Dragons game. The train set of UOUC consists of 200,000 scenes; whereas the test set consists of 30,000 scenes. In order to study compositional generalisation, simple reasoning and memorisation, each scene of UOUC is annotated with up to 10 novel questions. These deal with spatial relationships, hypothetical changes to scenes, counting, comparison, memorisation and memory-based reasoning. In total, UOUC presents over 2 million questions. UOUC also finds itself as a strong challenge to well-performing models for VQA. Our evaluation of recent models for VQA shows poor compositional generalisation, and comparatively lower ability towards simple reasoning. These results suggest that UOUC could lead to advances in research by being a strong benchmark for VQA. △ Less

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2201.08813 [pdf, other]

Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies

Authors: Dimitrios C. Gklezakos, Rajesh P. N. Rao

Abstract: We introduce Active Predictive Coding Networks (APCNs), a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling: how can neural networks learn intrinsic reference frames for objects and parse visual scenes into part-whole hierarchies by dynamically allocating nodes in a parse tree? APCNs address this problem b… ▽ More We introduce Active Predictive Coding Networks (APCNs), a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling: how can neural networks learn intrinsic reference frames for objects and parse visual scenes into part-whole hierarchies by dynamically allocating nodes in a parse tree? APCNs address this problem by using a novel combination of ideas: (1) hypernetworks are used for dynamically generating recurrent neural networks that predict parts and their locations within intrinsic reference frames conditioned on higher object-level embedding vectors, and (2) reinforcement learning is used in conjunction with backpropagation for end-to-end learning of model parameters. The APCN architecture lends itself naturally to multi-level hierarchical learning and is closely related to predictive coding models of cortical function. Using the MNIST, Fashion-MNIST and Omniglot datasets, we demonstrate that APCNs can (a) learn to parse images into part-whole hierarchies, (b) learn compositional representations, and (c) transfer their knowledge to unseen classes of objects. With their ability to dynamically generate parse trees with part locations for objects, APCNs offer a new framework for explainable AI that leverages advances in deep learning while retaining interpretability and compositionality. △ Less

Submitted 14 January, 2022; originally announced January 2022.

arXiv:2112.11547 [pdf, other]

Decompose the Sounds and Pixels, Recompose the Events

Authors: Varshanth R. Rao, Md Ibrahim Khalil, Haoda Li, Peng Dai, Juwei Lu

Abstract: In this paper, we propose a framework centering around a novel architecture called the Event Decomposition Recomposition Network (EDRNet) to tackle the Audio-Visual Event (AVE) localization problem in the supervised and weakly supervised settings. AVEs in the real world exhibit common unravelling patterns (termed as Event Progress Checkpoints (EPC)), which humans can perceive through the cooperati… ▽ More In this paper, we propose a framework centering around a novel architecture called the Event Decomposition Recomposition Network (EDRNet) to tackle the Audio-Visual Event (AVE) localization problem in the supervised and weakly supervised settings. AVEs in the real world exhibit common unravelling patterns (termed as Event Progress Checkpoints (EPC)), which humans can perceive through the cooperation of their auditory and visual senses. Unlike earlier methods which attempt to recognize entire event sequences, the EDRNet models EPCs and inter-EPC relationships using stacked temporal convolutions. Based on the postulation that EPC representations are theoretically consistent for an event category, we introduce the State Machine Based Video Fusion, a novel augmentation technique that blends source videos using different EPC template sequences. Additionally, we design a new loss function called the Land-Shore-Sea loss to compactify continuous foreground and background representations. Lastly, to alleviate the issue of confusing events during weak supervision, we propose a prediction stabilization method called Bag to Instance Label Correction. Experiments on the AVE dataset show that our collective framework outperforms the state-of-the-art by a sizable margin. △ Less

Submitted 21 December, 2021; originally announced December 2021.

Comments: Accepted at AAAI 2022

arXiv:2110.08429 [pdf, other]

TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models

Authors: Soumick Chatterjee, Arnab Das, Chirag Mandal, Budhaditya Mukhopadhyay, Manish Vipinraj, Aniruddh Shukla, Rajatha Nagaraja Rao, Chompunuch Sarasaen, Oliver Speck, Andreas Nürnberger

Abstract: Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and… ▽ More Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas which influence the decision of the algorithm most. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques for deep learning models and generate visual interpretations and explanations for clinicians to corroborate their clinical findings. In addition, this will aid in gaining confidence in such methods. The framework builds on existing interpretability and explainability techniques that are currently focusing on classification models, extending them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. This framework can be used by data scientists to perform post-hoc interpretations and explanations of their models, develop more explainable tools and present the findings to clinicians to increase their faith in such models. The proposed framework was evaluated based on a use case scenario of vessel segmentation models trained on Time-of-fight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods. △ Less

Submitted 7 February, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

arXiv:2109.12434 [pdf, other]

Emergent behavior and neural dynamics in artificial agents tracking turbulent plumes

Authors: Satpreet Harcharan Singh, Floris van Breugel, Rajesh P. N. Rao, Bingni Wen Brunton

Abstract: Tracking a turbulent plume to locate its source is a complex control problem because it requires multi-sensory integration and must be robust to intermittent odors, changing wind direction, and variable plume statistics. This task is routinely performed by flying insects, often over long distances, in pursuit of food or mates. Several aspects of this remarkable behavior have been studied in detail… ▽ More Tracking a turbulent plume to locate its source is a complex control problem because it requires multi-sensory integration and must be robust to intermittent odors, changing wind direction, and variable plume statistics. This task is routinely performed by flying insects, often over long distances, in pursuit of food or mates. Several aspects of this remarkable behavior have been studied in detail in many experimental studies. Here, we take a complementary in silico approach, using artificial agents trained with reinforcement learning to develop an integrated understanding of the behaviors and neural computations that support plume tracking. Specifically, we use deep reinforcement learning (DRL) to train recurrent neural network (RNN) agents to locate the source of simulated turbulent plumes. Interestingly, the agents' emergent behaviors resemble those of flying insects, and the RNNs learn to represent task-relevant variables, such as head direction and time since last odor encounter. Our analyses suggest an intriguing experimentally testable hypothesis for tracking plumes in changing wind direction -- that agents follow local plume shape rather than the current wind direction. While reflexive short-memory behaviors are sufficient for tracking plumes in constant wind, longer timescales of memory are essential for tracking plumes that switch direction. At the level of neural dynamics, the RNNs' population activity is low-dimensional and organized into distinct dynamical structures, with some correspondence to behavioral modules. Our in silico approach provides key intuitions for turbulent plume tracking strategies and motivates future targeted experimental and theoretical developments. △ Less

Submitted 17 December, 2021; v1 submitted 25 September, 2021; originally announced September 2021.

ACM Class: I.2.6; I.2.0; I.5.1

arXiv:2108.03936 [pdf, other]

3D Human Reconstruction in the Wild with Collaborative Aerial Cameras

Authors: Cherie Ho, Andrew Jong, Harry Freeman, Rohan Rao, Rogerio Bonatti, Sebastian Scherer

Abstract: Aerial vehicles are revolutionizing applications that require capturing the 3D structure of dynamic targets in the wild, such as sports, medicine, and entertainment. The core challenges in developing a motion-capture system that operates in outdoors environments are: (1) 3D inference requires multiple simultaneous viewpoints of the target, (2) occlusion caused by obstacles is frequent when trackin… ▽ More Aerial vehicles are revolutionizing applications that require capturing the 3D structure of dynamic targets in the wild, such as sports, medicine, and entertainment. The core challenges in developing a motion-capture system that operates in outdoors environments are: (1) 3D inference requires multiple simultaneous viewpoints of the target, (2) occlusion caused by obstacles is frequent when tracking moving targets, and (3) the camera and vehicle state estimation is noisy. We present a real-time aerial system for multi-camera control that can reconstruct human motions in natural environments without the use of special-purpose markers. We develop a multi-robot coordination scheme that maintains the optimal flight formation for target reconstruction quality amongst obstacles. We provide studies evaluating system performance in simulation, and validate real-world performance using two drones while a target performs activities such as jogging and playing soccer. Supplementary video: https://youtu.be/jxt91vx0cns △ Less

Submitted 9 August, 2021; originally announced August 2021.

Comments: 7 pages, 11 figures, IROS 2021

arXiv:2107.14151 [pdf, other]

doi 10.1007/s11222-023-10299-z

Modern Non-Linear Function-on-Function Regression

Authors: Aniruddha Rajendra Rao, Matthew Reimherr

Abstract: We introduce a new class of non-linear function-on-function regression models for functional data using neural networks. We propose a framework using a hidden layer consisting of continuous neurons, called a continuous hidden layer, for functional response modeling and give two model fitting strategies, Functional Direct Neural Network (FDNN) and Functional Basis Neural Network (FBNN). Both are de… ▽ More We introduce a new class of non-linear function-on-function regression models for functional data using neural networks. We propose a framework using a hidden layer consisting of continuous neurons, called a continuous hidden layer, for functional response modeling and give two model fitting strategies, Functional Direct Neural Network (FDNN) and Functional Basis Neural Network (FBNN). Both are designed explicitly to exploit the structure inherent in functional data and capture the complex relations existing between the functional predictors and the functional response. We fit these models by deriving functional gradients and implement regularization techniques for more parsimonious results. We demonstrate the power and flexibility of our proposed method in handling complex functional models through extensive simulation studies as well as real data examples. △ Less

Submitted 7 October, 2023; v1 submitted 29 July, 2021; originally announced July 2021.

Comments: 6 figures, 6 tables (including supplementary material), 16 pages (including supplementary material). arXiv admin note: text overlap with arXiv:2104.09371

Journal ref: Statistics and Computing 2023

arXiv:2106.12033 [pdf, other]

Strategic Liquidity Provision in Uniswap v3

Authors: Zhou Fan, Francisco Marmolejo-Cossío, Daniel J. Moroz, Michael Neuder, Rithvik Rao, David C. Parkes

Abstract: Uniswap v3 is the largest decentralized exchange for digital currencies. A novelty of its design is that it allows a liquidity provider (LP) to allocate liquidity to one or more closed intervals of the price of an asset instead of the full range of possible prices. An LP earns fee rewards proportional to the amount of its liquidity allocation when prices move in this interval. This induces the pro… ▽ More Uniswap v3 is the largest decentralized exchange for digital currencies. A novelty of its design is that it allows a liquidity provider (LP) to allocate liquidity to one or more closed intervals of the price of an asset instead of the full range of possible prices. An LP earns fee rewards proportional to the amount of its liquidity allocation when prices move in this interval. This induces the problem of {\em strategic liquidity provision}: smaller intervals result in higher concentration of liquidity and correspondingly larger fees when the price remains in the interval, but with higher risk as prices may exit the interval leaving the LP with no fee rewards. Although reallocating liquidity to new intervals can mitigate this loss, it comes at a cost, as LPs must expend gas fees to do so. We formalize the dynamic liquidity provision problem and focus on a general class of strategies for which we provide a neural network-based optimization framework for maximizing LP earnings. We model a single LP that faces an exogenous sequence of price changes that arise from arbitrage and non-arbitrage trades in the decentralized exchange. We present experimental results informed by historical price data that demonstrate large improvements in LP earnings over existing allocation strategy baselines. Moreover we provide insight into qualitative differences in optimal LP behaviour in different economic environments. △ Less

Submitted 16 August, 2024; v1 submitted 22 June, 2021; originally announced June 2021.

Comments: In Proceedings of AFT '23

Showing 1–50 of 129 results for author: Rao, R