
CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Authors: Shubham Bharti, Shiyun Cheng, Jihyun Rho, Martina Rao, Xiaojin Zhu

Abstract: We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualizing charts. Given a chart, a language model needs to not only correctly comprehend the chart (the FACT question) but also judge if the chart will be misleading to a human reader (the MIND question). Both questions have significant societal benefits. We detail the construction of the CHARTOM benchmark including its calibration on human performance.

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2407.04208 [pdf, other]

AMD: Automatic Multi-step Distillation of Large-scale Vision Models

Authors: Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu

Abstract: Transformer-based architectures have become the de facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy when confronted with a large capacity gap between the teacher and the student, e.g., a 10x compression rate. In this paper, we present a novel approach named Automatic Multi-step Distillation (AMD) for large-scale vision model compression. In particular, our distillation process unfolds across multiple steps. Initially, the teacher undergoes distillation to form an intermediate teacher-assistant model, which is subsequently distilled further to the student. An efficient and effective optimization framework is introduced to automatically identify the optimal teacher-assistant that leads to the maximal student performance. We conduct extensive experiments on multiple image classification datasets, including CIFAR-10, CIFAR-100, and ImageNet. The findings consistently reveal that our approach outperforms several established baselines, paving a path for future knowledge distillation methods on large-scale vision models.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 19 pages, 5 figures

arXiv:2406.01559 [pdf, other]

Prototypical Transformer as Unified Motion Learners

Authors: Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

Abstract: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature motion patterns, providing transparency in understanding motion scenes. Second, Latent Synchronization guides feature representation learning via prototypes, effectively mitigating the problem of motion uncertainty. Empirical results demonstrate that our approach achieves competitive performance on popular motion tasks such as optical flow and scene depth. Furthermore, it exhibits generality across various downstream tasks, including object tracking and video stabilization.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 21 pages, 10 figures

arXiv:2406.00314 [pdf, other]

CASE: Efficient Curricular Data Pre-training for Building Assistive Psychology Expert Models

Authors: Sarthak Harne, Monjoy Narayan Choudhury, Madhav Rao, TK Srikanth, Seema Mehrotra, Apoorva Vashisht, Aarushi Basu, Manjit Sodhi

Abstract: The limited availability of psychologists necessitates efficient identification of individuals requiring urgent mental healthcare. This study explores the use of Natural Language Processing (NLP) pipelines to analyze text data from online mental health forums used for consultations. By analyzing forum posts, these pipelines can flag users who may require immediate professional attention. A crucial challenge in this domain is data privacy and scarcity. To address this, we propose utilizing readily available curricular texts used in institutes specializing in mental health for pre-training the NLP pipelines. This helps us mimic the training process of a psychologist. Our work presents CASE-BERT, which flags potential mental health disorders based on forum text. CASE-BERT demonstrates superior performance compared to existing methods, achieving an F1 score of 0.91 for Depression and 0.88 for Anxiety, two of the most commonly reported mental health disorders. Our code is publicly available.

Submitted 16 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

arXiv:2403.19786 [pdf, other]

Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition

Authors: Mingxing Rao, Yinhong Qin, Soheil Kolouri, Jie Ying Wu, Daniel Moyer

Abstract: Purpose: In order to produce a surgical gesture recognition system that can support a wide variety of procedures, either a very large annotated dataset must be acquired, or fitted models must generalize to new labels (so-called "zero-shot" capability). In this paper we investigate the feasibility of the latter option. Methods: Leveraging the Bridge-Prompt framework, we prompt-tune a pre-trained vision-text model (CLIP) for gesture recognition in surgical videos. This can utilize extensive outside video data such as text, but also make use of label meta-data and weakly supervised contrastive losses. Results: Our experiments show that the prompt-based video encoder outperforms standard encoders in surgical gesture recognition tasks. Notably, it displays strong performance in zero-shot scenarios, where gestures/tasks that were not provided during the encoder training phase are included in the prediction phase. Additionally, we measure the benefit of including text descriptions in the feature extractor training schema. Conclusion: Bridge-Prompt and similar pre-trained+prompt-tuned video encoder models present significant visual representation for surgical robotics, especially in gesture recognition tasks. Given the diverse range of surgical tasks (gestures), the ability of these models to zero-shot transfer without the need for any task (gesture) specific retraining makes them invaluable.

Submitted 21 August, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures, 7 tables, IPCAI 2024 & IJCARS

arXiv:2312.08267 [pdf, other]

TABSurfer: a Hybrid Deep Learning Architecture for Subcortical Segmentation

Authors: Aaron Cao, Vishwanatha M. Rao, Kejia Liu, Xinru Liu, Andrew F. Laine, Jia Guo

Abstract: Subcortical segmentation remains challenging despite its important applications in quantitative structural analysis of brain MRI scans. The most accurate method, manual segmentation, is highly labor intensive, so automated tools like FreeSurfer have been adopted to handle this task. However, these traditional pipelines are slow and inefficient for processing large datasets. In this study, we propose TABSurfer, a novel 3D patch-based CNN-Transformer hybrid deep learning model designed for superior subcortical segmentation compared to existing state-of-the-art tools. To evaluate, we first demonstrate TABSurfer's consistent performance across various T1w MRI datasets with significantly shorter processing times compared to FreeSurfer. Then, we validate against manual segmentations, where TABSurfer outperforms FreeSurfer based on the manual ground truth. In each test, we also establish TABSurfer's advantage over a leading deep learning benchmark, FastSurferVINN. Together, these studies highlight TABSurfer's utility as a powerful tool for fully automated subcortical segmentation with high fidelity.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2311.17705 [pdf, other]

Q-PAC: Automated Detection of Quantum Bug-Fix Patterns

Authors: Pranav K. Nayak, Krishn V. Kher, M. Bharat Chandra, M. V. Panduranga Rao, Lei Zhang

Abstract: Context: Bug-fix pattern detection has been investigated in the past in the context of classical software. However, while quantum software is developing rapidly, the literature still lacks automated methods and tools to identify, analyze, and detect bug-fix patterns. To the best of our knowledge, our work previously published in SEKE'23 was the first to leverage classical techniques to detect bug-fix patterns in quantum code. Objective: To extend our previous effort, we present a research agenda (Q-Repair), including a series of testing and debugging methodologies, to improve the quality of quantum software. The ultimate goal is to utilize machine learning techniques to automatically predict fix patterns for existing quantum bugs. Method: As part of the first stage of the agenda, we extend our initial study and propose a more comprehensive automated framework, called Q-PAC, for detecting bug-fix patterns in IBM Qiskit quantum code. In the framework, we develop seven bug-fix pattern detectors using abstract syntax trees, syntactic filters, and semantic checks. Results: To demonstrate our method, we run Q-PAC on a variety of quantum bug-fix patterns using both real-world and handcrafted examples of bugs and fixes. The experimental results show that Q-PAC can effectively identify bug-fix patterns in IBM Qiskit. Conclusion: We hope our initial study on quantum bug-fix detection can bring awareness of quantum software engineering to both researchers and practitioners. Thus, we also publish Q-PAC as open-source software on GitHub. We would like to encourage other researchers to work on research directions (such as Q-Repair) to improve the quality of quantum programming.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 16 pages, 2 figures

arXiv:2311.15072 [pdf, other]

Introducing SSBD+ Dataset with a Convolutional Pipeline for detecting Self-Stimulatory Behaviours in Children using raw videos

Authors: Vaibhavi Lokegaonkar, Vijay Jaisankar, Pon Deepika, Madhav Rao, T K Srikanth, Sarbani Mallick, Manjit Sodhi

Abstract: Conventionally, evaluation for the diagnosis of Autism spectrum disorder is done by a trained specialist through questionnaire-based formal assessments and by observation of behavioral cues under various settings to capture the early warning signs of autism. These evaluation techniques are highly subjective and their accuracy relies on the experience of the specialist. In this regard, machine learning-based methods for automated capturing of early signs of autism from the recorded videos of the children are a promising alternative. In this paper, the authors propose a novel pipelined deep learning architecture to detect certain self-stimulatory behaviors that help in the diagnosis of autism spectrum disorder (ASD). The authors also supplement their tool with an augmented version of the Self Stimulatory Behavior Dataset (SSBD) and also propose a new label in SSBD Action detection: no-class. The deep learning model with the new dataset is made freely available for easy adoption by the researcher and developer community. An overall accuracy of around 81% was achieved from the proposed pipeline model that is targeted for real-time and hands-free automated diagnosis. All of the source code, data, licenses of use, and other relevant material are made freely available at https://github.com/sarl-iiitb/

Submitted 25 November, 2023; originally announced November 2023.

Comments: Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2311.12310 [pdf, other]

IEKM: A Model Incorporating External Keyword Matrices

Authors: Cheng Luo, Qin Li, Zhao Yan, Mengliang Rao, Yunbo Cao

Abstract: A customer service platform system with a core text semantic similarity (STS) task faces two urgent challenges: Firstly, one platform system needs to adapt to different domains of customers, i.e., different domains adaptation (DDA). Secondly, it is difficult for the model of the platform system to distinguish sentence pairs that are literally close but semantically different, i.e., hard negative samples. In this paper, we propose an incorporation external keywords matrices model (IEKM) to address these challenges. The model uses external tools or dictionaries to construct external matrices and fuses them to the self-attention layers of the Transformer structure through gating units, thus enabling flexible corrections to the model results. We evaluate the method on multiple datasets and the results show that our method has improved performance on all datasets. To demonstrate that our method can effectively solve all the above challenges, we conduct a flexible correction experiment, which results in an increase in the F1 value from 56.61 to 73.53. Our code will be publicly available.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.06261 [pdf, other]

With ChatGPT, do we have to rewrite our learning objectives -- CASE study in Cybersecurity

Authors: Peter Jamieson, Suman Bhunia, Dhananjai M. Rao

Abstract: With the emergence of Artificial Intelligence chatbot tools such as ChatGPT and code writing AI tools such as GitHub Copilot, educators need to question what and how we should teach our courses and curricula in the future. In reality, automated tools may result in certain academic fields being deeply reduced in the number of employable people. In this work, we make a case study of cybersecurity undergrad education by using the lens of "Understanding by Design" (UbD). First, we provide a broad understanding of learning objectives (LOs) in cybersecurity from a computer science perspective. Next, we dig a little deeper into a curriculum with an undergraduate emphasis on cybersecurity and examine the major courses and their LOs for our cybersecurity program at Miami University. With these details, we perform a thought experiment on how attainable the LOs are with the above-described tools, asking the key question "what needs to be enduring concepts?" learned in this process. If an LO becomes something that the existence of automation tools might be able to do, we then ask "what level is attainable for the LO that is not a simple query to the tools?". With this exercise, we hope to establish an example of how to prompt ChatGPT to accelerate students in their achievements of LOs given the existence of these new AI tools, and our goal is to push all of us to leverage and teach these tools as powerful allies in our quest to improve human existence and knowledge.

Submitted 26 September, 2023; originally announced November 2023.

arXiv:2308.08803 [pdf]

doi 10.32985/ijeces.14.4.6

An Effective Deep Learning Based Multi-Class Classification of DoS and DDoS Attack Detection

Authors: Arun Kumar Silivery, Kovvur Ram Mohan Rao, L K Suresh Kumar

Abstract: In the past few years, cybersecurity has become very important due to the rise in internet users. Internet attacks such as Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks severely harm a website or server and make them unavailable to other users. Network monitoring and control systems have found it challenging to identify the many classes of DoS and DDoS attacks since each operates uniquely. Hence a powerful technique is required for attack detection. Traditional machine learning techniques are inefficient in handling extensive network data and cannot extract high-level features for attack detection. Therefore, an effective deep learning-based intrusion detection system is developed in this paper for DoS and DDoS attack classification. This model includes various phases and starts with the Deep Convolutional Generative Adversarial Networks (DCGAN) based technique to address the class imbalance issue in the dataset. Then a deep learning algorithm based on ResNet-50 extracts the critical features for each class in the dataset. After that, an optimized AlexNet-based classifier is implemented for detecting the attacks separately, and the essential parameters of the classifier are optimized using the Atom search optimization algorithm. The proposed approach was evaluated on benchmark datasets, CICIDS2019 and UNSW-NB15, using key classification metrics and achieved 99.37% accuracy for the UNSW-NB15 dataset and 99.33% for the CICIDS2019 dataset. The experimental results demonstrate that the suggested approach outperforms other competitive techniques in identifying DoS and DDoS attacks.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.02013 [pdf, other]

Federated Representation Learning for Automatic Speech Recognition

Authors: Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo

Abstract: Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the pre-trained ASR encoder in FL performs as well as a centrally pre-trained model and produces an improvement of 12-15% (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% (WER) improvement over no pre-training.

Submitted 7 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted at ISCA SPSC Symposium 3rd Symposium on Security and Privacy in Speech Communication, 2023
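
Note on the WER figures quoted in the abstract above: word error rate is conventionally the word-level edit distance between reference and hypothesis, divided by the reference length. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code. The function names are hypothetical, and reading the "12-15%" figure as a relative reduction is an assumption, since the abstract does not say whether it is relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative improvement (assumed interpretation): (baseline - improved) / baseline."""
    return (baseline_wer - improved_wer) / baseline_wer
```

Under the relative interpretation, going from a baseline WER of 0.20 to 0.17 would count as a 15% improvement.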

arXiv:2307.03968 [pdf, other]

Multi-Level Power Series Solution for Large Surface and Volume Electric Field Integral Equation

Authors: Y. K. Negi, N. Balakrishnan, S. M. Rao

Abstract: In this paper, we propose a new multilevel power series solution method for solving a large surface and volume electric field integral equation based H-Matrix. The proposed solution method converges in a fixed number of iterations and is solved at each level of the H-Matrix computation. The solution method avoids the computation of a full matrix, as it can be solved independently at each level, starting from the leaf level. The solution at each level can be used as the final solution, thus saving the matrix computation time for the full H-Matrix. The paper shows that the leaf level matrix computation and solution with power series gives results as accurate as the full H-Matrix iterative solver method. The method results in considerable time and memory savings compared to the H-Matrix iterative solver. Further, the proposed method retains the O(N log N) solution complexity.

Submitted 8 July, 2023; originally announced July 2023.

Comments: 8 pages. The Applied Computational Electromagnetics Society Journal (ACES) 2023

arXiv:2306.12015 [pdf, other]

Federated Self-Learning with Weak Supervision for Speech Recognition

Authors: Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Abstract: Automatic speech recognition (ASR) models with a low footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy. We study the problem of federated continual incremental learning for recurrent neural network-transducer (RNN-T) ASR models in the privacy-enhancing scheme of learning on-device, without access to ground truth human transcripts or machine transcriptions from a stronger ASR model. In particular, we study the performance of a self-learning based scheme, with a paired teacher model updated through an exponential moving average of ASR models. Further, we propose using possibly noisy weak-supervision signals such as feedback scores and natural language understanding semantics determined from user behavior across multiple turns in a session of interactions with the conversational agent. These signals are leveraged in a multi-task policy-gradient training approach to improve the performance of self-learning for ASR. Finally, we show how catastrophic forgetting can be mitigated by combining on-device learning with a memory-replay approach using selected historical datasets. These innovations allow for a 10% relative improvement in WER on new use cases with minimal degradation on other test sets in the absence of strong-supervision signals such as ground-truth transcriptions.

Submitted 21 June, 2023; originally announced June 2023.

Comments: Proceedings of ICASSP 2023
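
Note on the "teacher model updated through an exponential moving average" mentioned in the abstract above: a common form of this update blends each teacher parameter with the matching student parameter after every training step. The sketch below is a generic illustration under assumptions. The parameter layout (a dict of float lists), the function name `ema_update`, and the decay value are all hypothetical, and the abstract does not give the authors' actual update schedule.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> Params:
    """Return new teacher parameters: decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    updated: Params = {}
    for name, teacher_values in teacher.items():
        student_values = student[name]
        # Blend element by element; a higher decay makes the teacher change more slowly.
        updated[name] = [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_values, student_values)
        ]
    return updated
```

With a decay close to 1 the teacher lags the student and averages out step-to-step noise, which is the usual motivation for pairing a self-learning student with an EMA teacher.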

arXiv:2306.12012 [pdf, other]

doi 10.21437/Interspeech.2023-2205

Learning When to Trust Which Teacher for Weakly Supervised ASR

Authors: Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke

Abstract: Automatic speech recognition (ASR) training can utilize multiple experts as teacher models, each trained on a specific domain or accent. Teacher models may be opaque in nature since their architecture may not be known or their training cadence is different from that of the student ASR model. Still, the student models are updated incrementally using the pseudo-labels generated independently by the expert teachers. In this paper, we exploit supervision from multiple domain experts in training student ASR models. This training strategy is especially useful in scenarios where few or no human transcriptions are available. To that end, we propose a Smart-Weighter mechanism that selects an appropriate expert based on the input audio, and then trains the student model in an unsupervised setting. We show the efficacy of our approach using LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all the experts, use a single expert model, or combine experts using ROVER.

Submitted 21 June, 2023; originally announced June 2023.

Comments: Proceedings of INTERSPEECH 2023

Journal ref: Proc. Interspeech, Aug. 2023, pp. 381-385

arXiv:2305.19444 [pdf]

Pixelated Interactions: Exploring Pixel Art for Graphical Primitives on a Tactile Display

Authors: Tigmanshu Bhatnagar, Vikas Upadhyay, Anchal Sharma, P V Madhusudhan Rao, Mark Miodownik, Nicolai Marquardt, Catherine Holloway

Abstract: Two-dimensional pin array tactile displays enable access to tactile graphics that are important for the education of students with visual impairments. Due to their prohibitive cost, limited access, and limited research within HCI, the rules to design graphical primitives on these low-resolution tactile displays are unclear. In this paper, eight tactile readers with visual impairments qualitatively evaluate the implementation of Pixel Art to create tactile graphical primitives on a pin array display. Every pin of the pin array is assumed to be a pixel on a pixel grid. Our findings suggest that Pixel Art tactile graphics on a pin array are clear and comprehensible to tactile readers, positively confirming its use to design basic tactile shapes and line segments. The guidelines provide a framework to create tactile media which implies that the guidelines can be used to downsize basic shapes for refreshable pin-array displays.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 25 pages, 10 figures. To appear in DIS'23 Designing Interactive Systems Conference, July 10 to 14, 2023, Pittsburgh, PA, USA

arXiv:2303.02043 [pdf, other]

An Integrated Real-time UAV Trajectory Optimization with Potential Field Approach for Dynamic Collision Avoidance

Authors: D. M. K. K. Venkateswara Rao, Hamed Habibi, Jose Luis Sanchez-Lopez, Holger Voos

Abstract: This paper presents an integrated approach that combines trajectory optimization and the Artificial Potential Field (APF) method for real-time optimal Unmanned Aerial Vehicle (UAV) trajectory planning and dynamic collision avoidance. A minimum-time trajectory optimization problem is formulated with initial and final positions as boundary conditions and collision avoidance as constraints. It is transcribed into a nonlinear programming problem using the Chebyshev pseudospectral method. The state and control histories are approximated by using Lagrange polynomials and the collocation points are used to satisfy constraints. A novel sigmoid-type collision avoidance constraint is proposed to overcome the drawback of Lagrange polynomial approximation in pseudospectral methods, which guarantees inequality constraint satisfaction only at nodal points. Automatic differentiation of cost function and constraints is used to quickly determine their gradient and Jacobian, respectively. An APF method is used to update the optimal control inputs for guaranteeing collision avoidance. The trajectory optimization and APF method are implemented in a closed-loop fashion continuously, but in parallel at moderate and high frequencies, respectively. The initial guess for the optimization is provided based on the previous solution. The proposed approach is tested and validated through indoor experiments.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2212.08762 [pdf, other]

doi 10.1109/TVT.2023.3347891

Iterative RNDOP-Optimal Anchor Placement for Beyond Convex Hull ToA-based Localization: Performance Bounds and Heuristic Algorithms

Authors: Raghunandan M. Rao, Don-Roberts Emenonye

Abstract: Localizing targets outside the anchors' convex hull is an understudied but prevalent scenario in vehicle-centric, UAV-based, and self-localization applications. Considering such scenarios, this paper studies the optimal anchor placement problem for Time-of-Arrival (ToA)-based localization schemes such that the worst-case Dilution of Precision (DOP) is minimized. Building on prior results on DOP scaling laws for beyond convex hull ToA-based localization, we propose a novel metric termed the Range-Normalized DOP (RNDOP). We show that the worst-case DOP-optimal anchor placement problem simplifies to a min-max RNDOP-optimal anchor placement problem. Unfortunately, this formulation results in a non-convex and intractable problem under realistic constraints. To overcome this, we propose iterative anchor addition schemes, which result in a tractable albeit non-convex problem. By exploiting the structure arising from the resultant rank-1 update, we devise three heuristic schemes with varying performance-complexity tradeoffs. In addition, we also derive the upper and lower bounds for scenarios where we are placing anchors to optimize the worst-case (a) 3D positioning error and (b) 2D positioning error. We build on these results to design a cohesive iterative algorithmic framework for robust anchor placement, characterize the impact of anchor position uncertainty, and then discuss the computational complexity of the proposed schemes. Using numerical results, we validate the accuracy of our theoretical results. We also present comprehensive Monte-Carlo simulation results to compare the positioning error and execution time performance of each iterative scheme, discuss the tradeoffs, and provide valuable system design insights for beyond convex hull localization scenarios.

Submitted 17 February, 2024; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: 16 pages. To appear in a future issue of the IEEE Transactions on Vehicular Technology

arXiv:2212.07112 [pdf, other]

DialogQAE: N-to-N Question Answer Pair Extraction from Customer Service Chatlog

Authors: Xin Zheng, Tianyu Liu, Haoran Meng, Xu Wang, Yufan Jiang, Mengliang Rao, Binghuai Lin, Zhifang Sui, Yunbo Cao

Abstract: Harvesting question-answer (QA) pairs from customer service chatlog in the wild is an efficient way to enrich the knowledge base for customer service chatbots in the cold start or continuous integration scenarios. Prior work attempts to obtain 1-to-1 QA pairs from growing customer service chatlog, which fails to integrate the incomplete utterances from the dialog context for composite QA retrieval. In this paper, we propose the N-to-N QA extraction task, in which the derived questions and corresponding answers might be separated across different utterances. We introduce a suite of generative/discriminative tagging based methods with end-to-end and two-stage variants that perform well on 5 customer service datasets and, for the first time, set up a benchmark for N-to-N DialogQAE with utterance and session level evaluation metrics. With a deep dive into extracted QA pairs, we find that the relations between and inside the QA pairs can be indicators to analyze the dialogue structure, e.g. information seeking, clarification, barge-in and elaboration. We also show that the proposed models can adapt to different domains and languages, and reduce the labor cost of knowledge accumulation in the real-world product dialogue platform.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Preprint version; The first three authors contribute equally

arXiv:2210.12689 [pdf, other]

Face Emotion Recognization Using Dataset Augmentation Based on Neural Network

Authors: Mengyu Rao, Ruyi Bao, Liangshun Dong

Abstract: Facial expression is one of the most external indications of a person's feelings and emotions. In daily conversation, according to psychologists, only 7% and 38% of information is communicated through words and sounds, respectively, while up to 55% is through facial expression. It plays an important role in coordinating interpersonal relationships. Ekman and Friesen recognized six essential emotions in the twentieth century based on a cross-cultural study, which indicated that people feel each basic emotion in the same fashion despite culture. As a branch of the field of analyzing sentiment, facial expression recognition offers broad application prospects in a variety of domains, including the interaction between humans and computers, healthcare, and behavior monitoring. Therefore, many researchers have devoted themselves to facial expression recognition. In this paper, an effective hybrid data augmentation method is used. This approach is operated on two public datasets, and four benchmark models see some remarkable results.

Submitted 21 November, 2022; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: 5 pages, 8 figures, 3 tables

arXiv:2207.09078 [pdf, other]

doi 10.1145/3534678.3539174

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

Authors: Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure

Abstract: Incremental learning is one paradigm to enable model building and updating at scale with streaming data. For end-to-end automatic speech recognition (ASR) tasks, the absence of human annotated labels along with the need for privacy preserving policies for model building makes it a daunting challenge. Motivated by these challenges, in this paper we use a cloud based framework for production systems to demonstrate insights from privacy preserving incremental learning for automatic speech recognition (ILASR). By privacy preserving, we mean the use of ephemeral data which are not human annotated. This system is a step forward for production-level ASR models for incremental/continual learning that offers a near real-time test-bed for experimentation in the cloud for end-to-end ASR, while adhering to privacy-preserving policies. We show that the proposed system can improve the production models significantly (3%) over a new time period of six months even in the absence of human annotated labels with varying levels of weak supervision and large batch sizes in incremental learning. This improvement is 20% over test sets with new words and phrases in the new time period. We demonstrate the effectiveness of model building in a privacy-preserving incremental fashion for ASR while further exploring the utility of having an effective teacher model and use of large batch sizes.

Submitted 22 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: 9 pages

arXiv:2207.07033 [pdf, other]

Developing a Series of AI Challenges for the United States Department of the Air Force

Authors: Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, et al. (17 additional authors not shown)

Abstract: Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are developing public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.

Submitted 14 July, 2022; originally announced July 2022.

arXiv:2206.12980 [pdf]

Detecting Schizophrenia with 3D Structural Brain MRI Using Deep Learning

Authors: Junhao Zhang, Vishwanatha M. Rao, Ye Tian, Yanting Yang, Nicolas Acosta, Zihan Wan, Pin-Yu Lee, Chloe Zhang, Lawrence S. Kegeles, Scott A. Small, Jia Guo

Abstract: Schizophrenia is a chronic neuropsychiatric disorder that causes distinct structural alterations within the brain. We hypothesize that deep learning applied to a structural neuroimaging dataset could detect disease-related alteration and improve classification and diagnostic accuracy. We tested this hypothesis using a single, widely available, and conventional T1-weighted MRI scan, from which we extracted the 3D whole-brain structure using standard post-processing methods. A deep learning model was then developed, optimized, and evaluated on three open datasets with T1-weighted MRI scans of patients with schizophrenia. Our proposed model outperformed the benchmark model, which was also trained with structural MR images using a 3D CNN architecture. Our model is capable of almost perfectly (area under the ROC curve = 0.987) distinguishing schizophrenia patients from healthy controls on unseen structural MRI scans. Regional analysis localized subcortical regions and ventricles as the most predictive brain regions. Subcortical structures serve a pivotal role in cognitive, affective, and social functions in humans, and structural abnormalities of these regions have been associated with schizophrenia. Our finding corroborates that schizophrenia is associated with widespread alterations in subcortical brain structure and the subcortical structural information provides prominent features in diagnostic classification. Together, these results further demonstrate the potential of deep learning to improve schizophrenia diagnosis and identify its structural neuroimaging signatures from a single, standard T1-weighted brain MRI.

Submitted 7 July, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

Comments: 13 pages, 6 figures

arXiv:2204.08811 [pdf, other]

SmartSales: Sales Script Extraction and Analysis from Sales Chatlog

Authors: Hua Liang, Tianyu Liu, Peiyi Wang, Mengliang Rao, Yunbo Cao

Abstract: In modern sales applications, automatic script extraction and management greatly decrease the need for human labor to collect the winning sales scripts, which largely boost the success rate for sales and can be shared across the sales teams. In this work, we present the SmartSales system to serve both the sales representatives and managers to attain the sales insights from the large-scale sales chatlog. SmartSales consists of three modules: 1) Customer frequently asked questions (FAQ) extraction aims to enrich the FAQ knowledge base by harvesting high quality customer question-answer pairs from the chatlog. 2) Customer objection response assists the salespeople to figure out the typical customer objections and corresponding winning sales scripts, as well as search for proper sales responses for a certain customer objection. 3) Sales manager dashboard helps sales managers to monitor whether a specific sales representative or team follows the sales standard operating procedures (SOP). The proposed prototype system is empowered by the state-of-the-art conversational intelligence techniques and has been running on the Tencent Cloud to serve the sales teams from several different areas.

Submitted 19 April, 2022; originally announced April 2022.

Comments: Work in progress. The first two authors contribute equally

arXiv:2203.06583 [pdf]

Bi-Sampling Approach to Classify Music Mood leveraging Raga-Rasa Association in Indian Classical Music

Authors: Mohan Rao B C, Vinayak Arkachaari, Harsha M N, Sushmitha M N, Gayathri Ramesh K K, Ullas M S, Pathi Mohan Rao, Sudha G, Narayana Darapaneni

Abstract: The impact of music on the mood or emotion of the listener is a well-researched area in human psychology and behavioral science. In Indian classical music, ragas are the melodic structures that define the various styles and forms of the music. Each raga has been found to evoke a specific emotion in the listener. With the advent of advanced capabilities of audio signal processing and the application of machine learning, the demand for intelligent music classifiers and recommenders has received increased attention, especially in the 'Music as a service' cloud applications. This paper explores a novel framework to leverage the raga-rasa association in Indian classical music to build an intelligent classifier and its application in a music recommendation system based on the user's current mood and the mood they aspire to be in.

Submitted 13 March, 2022; originally announced March 2022.

arXiv:2201.08741 [pdf]

doi 10.3389/fnimg.2022.1023481

Improving Across-Dataset Brain Tissue Segmentation Using Transformer

Authors: Vishwanatha M. Rao, Zihan Wan, Soroush Arabshahi, David J. Ma, Pin-Yu Lee, Ye Tian, Xuzhe Zhang, Andrew F. Laine, Jia Guo

Abstract: Brain tissue segmentation has demonstrated great utility in quantifying MRI data through Voxel-Based Morphometry and highlighting subtle structural changes associated with various conditions within the brain. However, manual segmentation is highly labor-intensive, and automated approaches have struggled due to properties inherent to MRI acquisition, leaving a great need for an effective segmentation tool. Despite the recent success of deep convolutional neural networks (CNNs) for brain tissue segmentation, many such solutions do not generalize well to new datasets, which is critical for a reliable solution. Transformers have demonstrated success in natural image segmentation and have recently been applied to 3D medical image segmentation tasks due to their ability to capture long-distance relationships in the input where the local receptive fields of CNNs struggle. This study introduces a novel CNN-Transformer hybrid architecture designed for brain tissue segmentation. We validate our model's performance across four multi-site T1w MRI datasets, covering different vendors, field strengths, scan parameters, time points, and neuropsychiatric conditions. In all situations, our model achieved the greatest generality and reliability. Our method is inherently robust and can serve as a valuable tool for brain-related T1w MRI studies. The code for the TABS network is available at: https://github.com/raovish6/TABS.

Submitted 31 January, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

ACM Class: I.4.6

arXiv:2112.03259 [pdf]

Novel Local Radiomic Bayesian Classifiers for Non-Invasive Prediction of MGMT Methylation Status in Glioblastoma

Authors: Mihir Rao

Abstract: Glioblastoma, an aggressive brain cancer, is amongst the most lethal of all cancers. Expression of the O6-methylguanine-DNA-methyltransferase (MGMT) gene in glioblastoma tumor tissue is of clinical importance as it has a significant effect on the efficacy of Temozolomide, the primary chemotherapy treatment administered to glioblastoma patients. Currently, MGMT methylation is determined through an invasive brain biopsy and subsequent genetic analysis of the extracted tumor tissue. In this work, we present novel Bayesian classifiers that make probabilistic predictions of MGMT methylation status based on radiomic features extracted from FLAIR-sequence magnetic resonance imagery (MRIs). We implement local radiomic techniques to produce radiomic activation maps and analyze MRIs for the MGMT biomarker based on statistical features of raw voxel-intensities. We demonstrate the ability for simple Bayesian classifiers to provide a boost in predictive performance when modelling local radiomic data rather than global features. The presented techniques provide a non-invasive MRI-based approach to determining MGMT methylation status in glioblastoma patients.

Submitted 29 November, 2021; originally announced December 2021.

arXiv:2106.15919 [pdf, other]

On joint training with interfaces for spoken language understanding

Authors: Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow

Abstract: Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances. SLU systems usually consist of (1) an automatic speech recognition (ASR) module, (2) an interface module that exposes relevant outputs from ASR, and (3) a natural language understanding (NLU) module. Interfaces in SLU systems carry information on text transcriptions or richer information like neural embeddings from ASR to NLU. In this paper, we study how interfaces affect joint-training for spoken language understanding. Most notably, we obtain the state-of-the-art results on the publicly available 50-hr SLURP dataset. We first leverage large-size pretrained ASR and NLU models that are connected by a text interface, and then jointly train both models via a sequence loss function. For scenarios where pretrained models are not utilized, the best results are obtained through a joint sequence loss training using richer neural interfaces. Finally, we show the overall diminishing impact of leveraging pretrained models with increased training data size.

Submitted 25 July, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

Comments: Proc. Interspeech 2022

arXiv:2106.02357 [pdf, ps, other]

Adiabatic Quantum Feature Selection for Sparse Linear Regression

Authors: Surya Sai Teja Desu, P. K. Srijith, M. V. Panduranga Rao, Naveen Sivadasan

Abstract: Linear regression is a popular machine learning approach to learn and predict real valued outputs or dependent variables from independent variables or features. In many real world problems, it is beneficial to perform sparse linear regression to identify important features helpful in predicting the dependent variable. It not only helps in getting interpretable results but also avoids overfitting when the number of features is large, and the amount of data is small. The most natural way to achieve this is by using 'best subset selection', which penalizes non-zero model parameters by adding the $\ell_0$ norm over parameters to the least squares loss. However, this makes the objective function non-convex and intractable even for a small number of features. This paper aims to address the intractability of sparse linear regression with $\ell_0$ norm using adiabatic quantum computing, a quantum computing paradigm that is particularly useful for solving optimization problems faster. We formulate the $\ell_0$ optimization problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solve it using the D-Wave adiabatic quantum computer. We study and compare the quality of QUBO solution on synthetic and real world datasets. The results demonstrate the effectiveness of the proposed adiabatic quantum computing approach in finding the optimal solution. The QUBO solution matches the optimal solution for a wide range of sparsity penalty values across the datasets.

Submitted 4 June, 2021; originally announced June 2021.

Comments: 8 pages, 2 tables

arXiv:2105.07071 [pdf, other]

doi 10.21437/Interspeech.2021-836

Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End

Authors: Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

Abstract: Comprehending the overall intent of an utterance helps a listener recognize the individual words spoken. Inspired by this fact, we perform a novel study of the impact of explicitly incorporating intent representations as additional information to improve a recurrent neural network-transducer (RNN-T) based automatic speech recognition (ASR) system. An audio-to-intent (A2I) model encodes the intent of the utterance in the form of embeddings or posteriors, and these are used as auxiliary inputs for RNN-T training and inference. Experimenting with a 50k-hour far-field English speech corpus, this study shows that when running the system in non-streaming mode, where intent representation is extracted from the entire utterance and then used to bias streaming RNN-T search from the start, it provides a 5.56% relative word error rate reduction (WERR). On the other hand, a streaming system using per-frame intent posteriors as extra inputs for the RNN-T ASR system yields a 3.33% relative WERR. A further detailed analysis of the streaming system indicates that our proposed method brings especially good gain on media-playing related intents (e.g. 9.12% relative WERR on PlayMusicIntent).

Submitted 16 June, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: To appear in Interspeech 2021

Journal ref: Proc. Interspeech, Sept. 2021, pp. 3455-3459

arXiv:2103.11890 [pdf, other]

Coexistence of Communications and Cognitive MIMO Radar: Waveform Design and Prototype

Authors: Mohammad Alaee-Kerahroodi, Ehsan Raei, Sumit Kumar, Bhavani Shankar Mysore Rama Rao

Abstract: The new generation of radar systems will need to coexist with other radio frequency (RF) systems, anticipating their behavior and reacting appropriately to avoid interference. In light of this requirement, this paper designs, implements, and evaluates the performance of phase-only sequences (with constant power) for intelligent spectrum utilization using the custom built cognitive Multiple Input Multiple Output (MIMO) radar prototype. The proposed transmit waveforms avoid the frequency bands occupied by narrowband interferers or communication links, while simultaneously having a small cross-correlation among each other to enable their separability at the MIMO radar receiver. The performance of the optimized set of sequences, obtained through solving a non-convex bi-objective optimization problem, is compared with the state-of-the-art counterparts, and its applicability is illustrated by the developed prototype. A realistic Long Term Evolution (LTE) downlink is used for the communications, and the real-time system implementation is validated and evaluated through the throughput calculations for communications and the detection performance measurement for the radar system.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 13 pages, 17 figures

arXiv:2102.06750 [pdf, other]

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

Authors: Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Abstract: Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems. SLU models, which either directly extract semantics from audio or are composed of pipelined automatic speech recognition (ASR) and natural language understanding (NLU) models, are typically trained via differentiable cross-entropy losses, even when the relevant performance metrics of interest are word or semantic error rates. In this work, we propose non-differentiable sequence losses based on SLU metrics as a proxy for semantic error and use the REINFORCE trick to train ASR and SLU models with this loss. We show that custom sequence loss training is the state-of-the-art on open SLU datasets and leads to 6% relative improvement in both ASR and NLU performance metrics on large proprietary datasets. We also demonstrate how the semantic sequence loss training paradigm can be used to update ASR and SLU models without transcripts, using semantic feedback alone.

Submitted 12 February, 2021; originally announced February 2021.

Comments: Proc. IEEE ICASSP 2021

arXiv:2101.02573 [pdf, other]

RANK: AI-assisted End-to-End Architecture for Detecting Persistent Attacks in Enterprise Networks

Authors: Hazem M. Soliman, Geoff Salmon, Dušan Sovilj, Mohan Rao

Abstract: Advanced Persistent Threats (APTs) are sophisticated multi-step attacks, planned and executed by skilled adversaries targeting modern government and enterprise networks. Intrusion Detection Systems (IDSs) and User and Entity Behavior Analytics (UEBA) are commonly employed to aid a security analyst in the detection of APTs. The prolonged nature of APTs, combined with the granular focus of UEBA and IDS, results in overwhelming the analyst with an increasingly impractical number of alerts. Consequent to this abundance of data, and together with the crucial importance of the problem as well as the high cost of the skilled personnel involved, the problem of APT detection becomes a perfect candidate for automation through Artificial Intelligence (AI). In this paper, we provide, to the best of our knowledge, the first study and implementation of an end-to-end AI-assisted architecture for detecting APTs -- RANK. The goal of the system is not to replace the analyst; rather, it is to automate the complete pipeline from data sources to a final set of incidents for analyst review. The architecture is composed of four consecutive steps: 1) alert templating and merging, 2) alert graph construction, 3) alert graph partitioning into incidents, and 4) incident scoring and ordering. We evaluate our architecture against the 2000 DARPA Intrusion Detection dataset, as well as a real-world private dataset from a medium-scale enterprise. Extensive results are provided showing a three-orders-of-magnitude reduction in the amount of data to be reviewed by the analyst, innovative extraction of incidents and security-wise scoring of extracted incidents.

Submitted 6 January, 2021; originally announced January 2021.

arXiv:2011.06455 [pdf]

doi 10.1098/rsos.210429

Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic

Authors: Mahendra Piraveenan, Shailendra Sawleshwarkar, Michael Walsh, Iryna Zablotska, Samit Bhattacharyya, Habib Hassan Farooqui, Tarun Bhatnagar, Anup Karan, Manoj Murhekar, Sanjay Zodpey, K. S. Mallikarjuna Rao, Philippa Pattison, Albert Zomaya, Matjaz Perc

Abstract: Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their availability and the human resources needed to run the vaccination programmes have been scarce in many countries. Vaccine hesitancy is also being encountered from some sections of the general public. We emphasize that decision-making under uncertainty and imperfect information, and with only conditionally optimal outcomes, is a unique forte of established game-theoretic modelling. Therefore, we can use this approach to obtain the best framework for modelling and simulating vaccination prioritization and uptake that will be readily available to inform important policy decisions for the optimal control of the COVID-19 pandemic.

Submitted 9 June, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

Comments: 15 pages, 1 figure; published in Royal Society Open Science

Journal ref: R. Soc. Open Sci. 8, 210429 (2021)

arXiv:2010.11692 [pdf, other]

Conversion and Implementation of State-of-the-Art Deep Learning Algorithms for the Classification of Diabetic Retinopathy

Authors: Mihir Rao, Michelle Zhu, Tianyang Wang

Abstract: Diabetic retinopathy (DR) is a retinal microvascular condition that emerges in diabetic patients. DR will continue to be a leading cause of blindness worldwide, with a predicted 191.0 million globally diagnosed patients in 2030. Microaneurysms, hemorrhages, exudates, and cotton wool spots are common signs of DR. However, they can be small and hard for human eyes to detect. Early detection of DR is crucial for effective clinical treatment. Existing methods to classify images require much time for feature extraction and selection, and are limited in their performance. Convolutional Neural Networks (CNNs), as an emerging deep learning (DL) method, have proven their potential in image classification tasks. In this paper, comprehensive experimental studies of implementing state-of-the-art CNNs for the detection and classification of DR are conducted in order to determine the top performing classifiers for the task. Five CNN classifiers, namely Inception-V3, VGG19, VGG16, ResNet50, and InceptionResNetV2, are evaluated through experiments. They categorize medical images into five different classes based on DR severity. Data augmentation and transfer learning techniques are applied since annotated medical images are limited and imbalanced. Experimental results indicate that the ResNet50 classifier has top performance for binary classification and that the InceptionResNetV2 classifier has top performance for multi-class DR classification.

Submitted 7 October, 2020; originally announced October 2020.

Comments: Pre-print version (in-review)

arXiv:2010.04777 [pdf, other]

A Graph Neural Network Approach for Scalable and Dynamic IP Similarity in Enterprise Networks

Authors: Hazem M. Soliman, Geoff Salmon, Dusan Sovilij, Mohan Rao

Abstract: Measuring similarity between IP addresses is an important task in the daily operations of any enterprise network. Applications that depend on an IP similarity measure include measuring correlation between security alerts, building baselines for behavioral modelling, debugging network failures and tracking persistent attacks. However, IPs do not have a natural similarity measure by definition. Deep Learning architectures are a promising solution here since they are able to learn numerical representations for IPs directly from data, allowing various distance measures to be applied on the calculated representations. Current works have utilized Natural Language Processing (NLP) techniques for learning IP embeddings. However, these approaches have no proper way to handle out-of-vocabulary (OOV) IPs not seen during training. In this paper, we propose a novel approach for IP embedding using an adapted graph neural network (GNN) architecture. This approach has the advantages of working on the raw data, scalability and, most importantly, induction, i.e. the ability to measure similarity between previously unseen IPs. Using data from an enterprise network, our approach is able to identify similarities between local DNS servers and root DNS servers even though some of these machines are never encountered during the training phase.

Submitted 9 October, 2020; originally announced October 2020.

arXiv:2008.06173 [pdf, other]

doi 10.21437/Interspeech.2020-2976

Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

Authors: Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow

Abstract: We consider the problem of spoken language understanding (SLU) of extracting natural language intents and associated slot arguments or named entities from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) as well as natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification opening up the opportunity to deploy on hardware constrained scenarios like devices enabling voice assistants to work offline, in a privacy preserving manner, whilst also reducing server costs. We first present models that extract utterance intent directly from speech without intermediate text output. We then present a compositional model, which generates the transcript using the Listen Attend Spell ASR system and then extracts interpretation using a neural NLU model. Finally, we contrast these methods to a jointly trained end-to-end joint SLU model, consisting of ASR and NLU subsystems which are connected by a neural network based interface instead of text, that produces transcripts as well as NLU interpretation. We show that the jointly trained model shows improvements to ASR incorporating semantic information from NLU and also improves NLU by exposing it to ASR confusion encoded in the hidden layer.

Submitted 13 August, 2020; originally announced August 2020.

Comments: Proceedings of INTERSPEECH

ACM Class: I.2.7

Journal ref: Proc. Interspeech 2020, 876-880 (2020)

arXiv:2008.02100 [pdf, other]

doi 10.1109/TWC.2021.3081458

Underlay Radar-Massive MIMO Spectrum Sharing: Modeling Fundamentals and Performance Analysis

Authors: Raghunandan M. Rao, Harpreet S. Dhillon, Vuk Marojevic, Jeffrey H. Reed

Abstract: In this work, we study underlay radar-massive MIMO cellular coexistence in LoS/near-LoS channels, where both systems have 3D beamforming capabilities. Using mathematical tools from stochastic geometry, we derive an upper bound on the average interference power at the radar due to the 3D massive MIMO cellular downlink under the worst-case `cell-edge beamforming' conditions. To overcome the technical challenges imposed by asymmetric and arbitrarily large cells, we devise a novel construction in which each Poisson Voronoi (PV) cell is bounded by its circumcircle to bound the effect of the random cell shapes on average interference. Since this model is intractable for further analysis due to the correlation between adjacent PV cells' shapes and sizes, we propose a tractable nominal interference model, where we model each PV cell as a circular disk with an area equal to the average area of the typical cell. We quantify the gap in the average interference power between these two models and show that the upper bound is tight for realistic deployment parameters. We also compare them with a more practical but intractable MU-MIMO scheduling model to show that our worst-case interference models show the same trends and do not deviate significantly from realistic scheduler models. Under the nominal interference model, we characterize the interference distribution using the dominant interferer approximation by deriving the equi-interference contour expression when the typical receiver uses 3D beamforming. Finally, we use tractable expressions for the interference distribution to characterize radar's spatial probability of false alarm/detection in a quasi-static target tracking scenario. Our results reveal useful trends in the average interference as a function of the deployment parameters (BS density, exclusion zone radius, antenna height, transmit power of each BS, etc.).

Submitted 16 May, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: This arXiv manuscript subsumes the contents of the conference paper presented at the 2019 IEEE Global Communications Conference (Globecom), Waikoloa, HI. The conference version is available at arXiv:1907.09536

arXiv:2006.01327 [pdf, other]

doi 10.1109/TVT.2020.3001911

Semi-Blind Post-Equalizer SINR Estimation and Dual CSI Feedback for Radar-Cellular Coexistence

Authors: Raghunandan M. Rao, Vuk Marojevic, Jeffrey H. Reed

Abstract: Current cellular systems use pilot-aided statistical-channel state information (S-CSI) estimation and limited feedback schemes to aid in link adaptation and scheduling decisions. However, in the presence of pulsed radar signals, pilot-aided S-CSI is inaccurate since interference statistics on pilot and non-pilot resources can be different. Moreover, the channel will be bimodal as a result of the periodic interference. In this paper, we propose a max-min heuristic to estimate the post-equalizer SINR in the case of non-pilot pulsed radar interference, and characterize its distribution as a function of noise variance and interference power. We observe that the proposed heuristic incurs low computational complexity, and is robust beyond a certain SINR threshold for different modulation schemes, especially for QPSK. This enables us to develop a comprehensive semi-blind framework to estimate the wideband SINR metric that is commonly used for S-CSI quantization in 3GPP Long-Term Evolution (LTE) and New Radio (NR) networks. Finally, we propose dual CSI feedback for practical radar-cellular spectrum sharing, to enable accurate CSI acquisition in the bimodal channel. We demonstrate significant improvements in throughput, block error rate and retransmission-induced latency for LTE-Advanced Pro when compared to conventional pilot-aided S-CSI estimation and limited feedback schemes.

Submitted 1 June, 2020; originally announced June 2020.

Comments: 33 pages, 26 figures

arXiv:2005.00122 [pdf, other]

Probability of Pilot Interference in Pulsed Radar-Cellular Coexistence: Fundamental Insights on Demodulation and Limited CSI Feedback

Authors: Raghunandan M. Rao, Vuk Marojevic, Jeffrey H. Reed

Abstract: This paper considers an underlay pulsed radar-cellular spectrum sharing scenario, where the cellular system uses pilot-aided demodulation, statistical channel state information (S-CSI) estimation and limited feedback schemes. Under a realistic system model, upper and lower bounds are derived on the probability that at least a specified number of pilot signals are interfered by a radar pulse train in a finite CSI estimation window. Exact probabilities are also derived for important special cases which reveal operational regimes where the lower bound is achieved. Using these results, this paper (a) provides insights on pilot interference-minimizing schemes for accurate coherent symbol demodulation, and (b) demonstrates that pilot-aided methods fail to accurately estimate S-CSI of the pulsed radar interference channel for a wide range of radar repetition intervals.

Submitted 30 April, 2020; originally announced May 2020.

Comments: 13 pages, 5 figures

arXiv:2002.04638 [pdf, other]

A polynomial time parallel algorithm for graph isomorphism using a quasipolynomial number of processors

Authors: Duc Hung Pham, Krishna V. Palem, M. V. Panduranga Rao

Abstract: The Graph Isomorphism (GI) problem is a theoretically interesting problem because it has been proven neither to be in P nor to be NP-complete. Babai made a breakthrough in 2015 when announcing a quasipolynomial time algorithm for the GI problem. Babai's work gives the most theoretically efficient algorithm for GI, as well as strong evidence favoring the idea that class GI $\ne$ NP and thus P $\ne$ NP. Based on Babai's algorithm, we prove that GI can further be solved by a parallel algorithm that runs in polynomial time using a quasipolynomial number of processors. We achieve that result by identifying the bottlenecks in Babai's algorithms and parallelizing them. In particular, we prove that color refinement can be computed in parallel logarithmic time using a polynomial number of processors, and the $k$-dimensional WL refinement can be computed in parallel polynomial time using a quasipolynomial number of processors. Our work suggests that Graph Isomorphism and GI-complete problems can be computed efficiently in a parallel computer, and provides insights on speeding up parallel GI programs in practice.

Submitted 11 February, 2020; originally announced February 2020.

Comments: ICALP conference submission preprint

arXiv:1911.10478 [pdf, other]

The Bouquet Algorithm for Model Checking Unbounded Until

Authors: Shiraj Arora, M. V. Panduranga Rao

Abstract: The problem of verifying the "Unbounded Until" fragment in temporal logic formulas has been studied extensively in the past, especially in the context of statistical model checking. Statistical model checking, a computationally inexpensive sampling based alternative to the more expensive numerical model checking technique, presents the following decision dilemma -- what length of the sample is enough in general? In this paper, we discuss an algorithm for this problem that combines ideas from graph theory, statistical model checking and numerical model checking. We analyze the algorithm and show through experiments that this approach outperforms the standard statistical model checking algorithm for verifying unbounded until for low density Discrete Time Markov Chains.

Submitted 24 November, 2019; originally announced November 2019.

arXiv:1908.04902 [pdf, ps, other]

doi 10.1016/j.disc.2022.112986

Planar graphs without normally adjacent short cycles

Authors: Fangyao Lu, Mengjiao Rao, Qianqian Wang, Tao Wang

Abstract: Let $\mathscr{G}$ be the class of plane graphs without triangles normally adjacent to $8^{-}$-cycles, without $4$-cycles normally adjacent to $6^{-}$-cycles, and without normally adjacent $5$-cycles. In this paper, it is shown that every graph in $\mathscr{G}$ is $3$-choosable. Instead of proving this result, we directly prove a stronger result in the form of ``weakly'' DP-$3$-coloring. The main theorem improves the results in [J. Combin. Theory Ser. B 129 (2018) 38--54; European J. Combin. 82 (2019) 102995]. Consequently, every planar graph without $4$-, $6$-, $8$-cycles is $3$-choosable, and every planar graph without $4$-, $5$-, $7$-, $8$-cycles is $3$-choosable. In the third section, using almost the same technique, we prove that the vertex set of every graph in $\mathscr{G}$ can be partitioned into an independent set and a set that induces a forest, which strengthens the result in [Discrete Appl. Math. 284 (2020) 626--630]. In the final section, tightness is discussed.

Submitted 10 June, 2022; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: 17 pages, 3 figures

MSC Class: 05C15

Journal ref: Discrete Mathematics, 345 (2022) 112986

arXiv:1907.09536 [pdf, other]

Analysis of Worst-Case Interference in Underlay Radar-Massive MIMO Spectrum Sharing Scenarios

Authors: Raghunandan M. Rao, Harpeet S. Dhillon, Vuk Marojevic, Jeffrey H. Reed

Abstract: In this paper, we consider an underlay radar-massive MIMO spectrum sharing scenario in which massive MIMO base stations (BSs) are allowed to operate outside a circular exclusion zone centered at the radar. Modeling the locations of the massive MIMO BSs as a homogeneous Poisson point process (PPP), we derive an analytical expression for a tight upper bound on the average interference at the radar due to cellular transmissions. The technical novelty is in bounding the worst-case elevation angle for each massive MIMO BS for which we devise a novel construction based on the circumradius distribution of a typical Poisson-Voronoi (PV) cell. While these worst-case elevation angles are correlated for neighboring BSs due to the structure of the PV tessellation, it does not explicitly appear in our analysis because of our focus on the average interference. We also provide an estimate of the nominal average interference by approximating each cell as a circle with area equal to the average area of the typical cell. Using these results, we demonstrate that the gap between the two results remains approximately constant with respect to the exclusion zone radius. Our analysis reveals useful trends in average interference power, as a function of key deployment parameters such as radar/BS antenna heights, number of antenna elements per radar/BS, BS density, and exclusion zone radius.

Submitted 22 July, 2019; originally announced July 2019.

Comments: 6 pages, 3 figures

arXiv:1904.09984 [pdf, other]

IOArbiter: Dynamic Provisioning of Backend Block Storage in the Cloud

Authors: Moo-Ryong Ra, Hee Won Lee

Abstract: With the advent of virtualization technology, cloud computing realizes on-demand computing. The capability of dynamic resource provisioning is a fundamental driving factor for users to adopt the cloud technology. This capability is also important for cloud service providers to optimize the expense of running the infrastructure. Despite many technological advances in related areas, however, it is still the case that the infrastructure providers must decide hardware configuration before deploying a cloud infrastructure, especially from the storage's perspective. This static nature of the storage provisioning practice can cause many problems in meeting tenant requirements, which often come later into the picture. In this paper, we propose a system called IOArbiter that enables the dynamic creation of underlying storage implementation in the cloud. IOArbiter defers storage provisioning to the time at which a tenant actually requests a storage space. As a result, an underlying storage implementation, e.g., RAID-5, 6 or Ceph storage pool with 6+3 erasure coding, will be materialized at the volume creation time. Using our prototype implementation with OpenStack Cinder, we show that IOArbiter can simultaneously satisfy a number of different tenant demands, which may not be possible with a static configuration strategy. Additionally, QoS mechanisms such as admission control and dynamic throttling help the system mitigate a noisy neighbor problem significantly.

Submitted 23 April, 2019; originally announced April 2019.

Comments: 7 pages, 3 figures

arXiv:1904.03710 [pdf, other]

Planar Geometry and Image Recovery from Motion-Blur

Authors: Kuldeep Purohit, Subeesh Vasu, M. Purnachandra Rao, A. N. Rajagopalan

Abstract: Existing works on motion deblurring either ignore the effects of depth-dependent blur or work with the assumption of a multi-layered scene wherein each layer is modeled in the form of fronto-parallel plane. In this work, we consider the case of 3D scenes with piecewise planar structure i.e., a scene that can be modeled as a combination of multiple planes with arbitrary orientations. We first propose an approach for estimation of normal of a planar scene from a single motion blurred observation. We then develop an algorithm for automatic recovery of number of planes, the parameters corresponding to each plane, and camera motion from a single motion blurred image of a multiplanar 3D scene. Finally, we propose a first-of-its-kind approach to recover the planar geometry and latent image of the scene by adopting an alternating minimization framework built on our findings. Experiments on synthetic and real data reveal that our proposed method achieves state-of-the-art results.

Submitted 6 February, 2022; v1 submitted 7 April, 2019; originally announced April 2019.

arXiv:1902.04067 [pdf, other]

Multi-tier Caching Analysis in CDN-based Over-the-top Video Streaming Systems

Authors: Abubakr O. Al-Abbasi, Vaneet Aggarwal, Moo-Ryong Ra

Abstract: Internet video traffic has been rapidly increasing and is further expected to increase with the emerging 5G applications such as higher definition videos, IoT and augmented/virtual reality applications. As end-users consume video in massive amounts and in an increasing number of ways, the content distribution network (CDN) should be efficiently managed to improve the system efficiency. The streaming service can include multiple caching tiers, at the distributed servers and the edge routers, and efficient content management at these locations affects the quality of experience (QoE) of the end users. In this paper, we propose a model for video streaming systems, typically composed of a centralized origin server, several CDN sites, and edge-caches located closer to the end user. We comprehensively consider different systems design factors including the limited caching space at the CDN sites, allocation of CDN for a video request, choice of different ports (or paths) from the CDN and the central storage, bandwidth allocation, the edge-cache capacity, and the caching policy. We focus on minimizing a performance metric, stall duration tail probability (SDTP), and present a novel and efficient algorithm accounting for the multiple design flexibilities. The theoretical bounds with respect to the SDTP metric are also analyzed and presented. The implementation on a virtualized cloud system managed by OpenStack demonstrates that the proposed algorithms can significantly improve the SDTP metric, compared to the baseline strategies.

Submitted 10 February, 2019; originally announced February 2019.

Comments: Accepted to IEEE/ACM TON, 2019. arXiv admin note: substantial text overlap with arXiv:1807.01147

arXiv:1901.02574 [pdf, other]

Analysis of Non-Pilot Interference on Link Adaptation and Latency in Cellular Networks

Authors: Raghunandan M. Rao, Vuk Marojevic, Jeffrey H. Reed

Abstract: Modern wireless systems such as the Long-Term Evolution (LTE) and 5G New Radio (5G NR) use pilot-aided SINR estimates to adapt the transmission mode and the modulation and coding scheme (MCS) of data transmissions, maximizing the utility of the wireless channel capacity. However, when interference is localized exclusively on non-pilot resources, pilot-aided SINR estimates become inaccurate. We show that this leads to congestion due to retransmissions, and in the worst case, outage due to very high block error rate (BLER). We demonstrate this behavior through numerical as well as experimental results with the 4G LTE downlink, which show high BLER and significant throughput detriment in the presence of non-pilot interference (NPI). To provide useful insights on the impact of NPI on low-latency communications, we derive an approximate relation between the retransmission-induced latency and BLER. Our results show that NPI can severely compromise low-latency applications such as vehicle-to-vehicle (V2V) communications and 5G NR. We identify robust link adaptation schemes as the key to reliable communications.

Submitted 8 January, 2019; originally announced January 2019.

Comments: 6 pages, 9 figures, accepted for publication at the 89th IEEE Vehicular Technology Conference (IEEE VTC Spring 2019)

arXiv:1810.12896 [pdf, other]

doi 10.23638/DMTCS-21-1-9

The 2-domination and Roman domination numbers of grid graphs

Authors: Michaël Rao, Alexandre Talon

Abstract: We investigate the 2-domination number for grid graphs, that is, the size of a smallest set $D$ of vertices of the grid such that each vertex of the grid belongs to $D$ or has at least two neighbours in $D$. We give a closed formula for the 2-domination number of any $n \!\times\! m$ grid, thereby confirming the results found by Lu and Xu, and Shaheen et al. for $n \leq 4$, and slightly correcting the value of Shaheen et al. for $n = 5$. The proof relies on some dynamic programming algorithms, using transfer matrices in (min,+)-algebra. We also apply the method to solve the Roman domination problem on grid graphs.

Submitted 17 May, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

Comments: 11 pages, 5 figures, presented at ICGT 2018. The program that led to the results is included in the Source directory (see Other formats). Accepted in DMTCS vol 21. Journal version with their template

Journal ref: Discrete Mathematics & Theoretical Computer Science, vol. 21 no. 1, ICGT 2018 (May 23, 2019) dmtcs:4952

arXiv:1810.12457 [pdf, ps, other]

Distributed Convex Optimization With Limited Communications

Authors: Milind Rao, Stefano Rini, Andrea Goldsmith

Abstract: In this paper, a distributed convex optimization algorithm, termed \emph{distributed coordinate dual averaging} (DCDA) algorithm, is proposed. The DCDA algorithm addresses the scenario of a large distributed optimization problem with limited communication among nodes in the network. Currently known distributed subgradient methods, such as the distributed dual averaging or the distributed alternating direction method of multipliers algorithms, assume that nodes can exchange messages of large cardinality. Such network communication capabilities are not valid in many scenarios of practical relevance. In the DCDA algorithm, on the other hand, communication of each coordinate of the optimization variable is restricted over time. For the proposed algorithm, we bound the rate of convergence under different communication protocols and network architectures. We also consider the extensions to the case of imperfect gradient knowledge and the case in which transmitted messages are corrupted by additive noise or are quantized. Relevant numerical simulations are also provided.

Submitted 29 October, 2018; originally announced October 2018.

Comments: Extended version of submission to IEEE ICASSP 2019

Showing 1–50 of 94 results for author: Rao, M