-
On the Performance of IRS-Assisted SSK and RPM over Rician Fading Channels
Authors:
Harsh Raj,
Ugrasen Singh,
B. R. Manoj
Abstract:
This paper presents index modulation schemes, namely space-shift keying (SSK) and reflection phase modulation (RPM), for an intelligent reflecting surface (IRS)-assisted wireless network. The IRS simultaneously reflects the incoming information signal from the base station and explicitly encodes the local information bits in the reflection phase shift of its elements. The phase shift of the IRS elements is chosen from the RPM constellation according to the local data. A joint detection using a maximum-likelihood (ML) decoder is performed for the SSK and RPM symbols over a realistic fading scenario modeled as the Rician fading channel. The pairwise error probability over Rician fading channels is derived and utilized to determine the average bit error rate. In addition, the ergodic capacity of the presented system is derived. The derived analytical results are verified and are in exact agreement with Monte-Carlo simulations.
Submitted 10 April, 2024;
originally announced April 2024.
-
Human-Robot Co-Transportation with Human Uncertainty-Aware MPC and Pose Optimization
Authors:
Al Jaber Mahmud,
Amir Hossain Raj,
Duc M. Nguyen,
Xuesu Xiao,
Xuan Wang
Abstract:
This paper proposes a new control algorithm for human-robot co-transportation based on a robot manipulator equipped with a mobile base and a robotic arm. The primary focus is to adapt to human uncertainties through the robot's whole-body dynamics and pose optimization. We introduce an augmented Model Predictive Control (MPC) formulation that explicitly models human uncertainties and contains more variables than regular MPC in order to optimize the pose of the robotic arm. The core of our methodology involves a two-step iterative design: at each planning horizon, we select the best pose of the robotic arm (joint angle combination) from a candidate set, aiming to achieve the lowest estimated control cost. This selection is based on solving an uncertainty-aware Discrete Algebraic Riccati Equation (DARE), which also informs the optimal control inputs for both the mobile base and the robotic arm. To validate the effectiveness of the proposed approach, we provide a theoretical derivation for the uncertainty-aware DARE and perform simulated and proof-of-concept hardware experiments using a Fetch robot under varying conditions, including different nominal trajectories and noise levels. The results reveal that our proposed approach outperforms baseline algorithms that do not consider human uncertainty or do not perform pose optimization, while maintaining a similar execution time.
Submitted 30 March, 2024;
originally announced April 2024.
-
EmpowerAbility: A portal for employment & scholarships for differently-abled
Authors:
Himanshu Raj,
Shubham Kumar,
J Kalaivani
Abstract:
The internet has become a vital resource for job seekers in today's technologically advanced world, particularly for those with impairments. They mainly rely on internet resources to find jobs that fit their particular requirements and skill set. However, the efficacy of this process frequently varies: some disabled candidates receive prompt responses and job offers, while others find it difficult to traverse the intricate world of job portals. This discrepancy results from a typical error: a failure to completely comprehend and utilize the accessibility features and functions that can significantly expedite and simplify the job search process for people with impairments. This project is a job and scholarship portal that empowers individuals with diverse abilities. Through inspiring success stories, user-centric features, and practical opportunities, it fosters resilience and inclusivity while reshaping narratives. This platform's dual-pronged strategy instills pride and offers real-world solutions, making a lasting impact on the lives it touches.
Submitted 18 March, 2024;
originally announced March 2024.
-
Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning
Authors:
Zifan Xu,
Amir Hossain Raj,
Xuesu Xiao,
Peter Stone
Abstract:
Recent advances in locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobility through confined 3D spaces, such as narrow tunnels or irregular voids, which impose all-around constraints. The cyclic gait patterns resulting from existing RL-based methods, which learn parameterized locomotion skills characterized by motion parameters such as velocity and body height, may not be adequate to navigate robots through challenging confined 3D spaces, which require both agile 3D obstacle avoidance and robust legged locomotion. Instead, we propose to learn locomotion skills end-to-end from goal-oriented navigation in confined 3D spaces. To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands. This approach allows the policy to explore its own locomotion skills within the entire solution space and facilitates smooth transitions between local goals, enabling long-term navigation towards distant goals. In simulation, our hierarchical approach succeeds at navigating through demanding confined 3D environments, outperforming both pure end-to-end learning approaches and parameterized locomotion skills. We further demonstrate the successful real-world deployment of our simulation-trained controller on a real robot.
Submitted 6 March, 2024;
originally announced March 2024.
-
Finding Adversarial Inputs for Heuristics using Multi-level Optimization
Authors:
Pooria Namyar,
Behnaz Arzani,
Ryan Beckett,
Santiago Segarra,
Himanshu Raj,
Umesh Krishnaswamy,
Ramesh Govindan,
Srikanth Kandula
Abstract:
Production systems use heuristics because they are faster or scale better than their optimal counterparts. Yet, practitioners are often unaware of the performance gap between a heuristic and the optimum or between two heuristics in realistic scenarios. We present MetaOpt, a system that helps analyze heuristics. Users specify the heuristic and the optimal (or another heuristic) as input, and MetaOpt automatically encodes these efficiently for a solver to find performance gaps and their corresponding adversarial inputs. Its suite of built-in optimizations helps it scale its analysis to practical problem sizes. To show it is versatile, we used MetaOpt to analyze heuristics from three domains (traffic engineering, vector bin packing, and packet scheduling). We found a production traffic engineering heuristic can require 30% more capacity than the optimal to satisfy realistic demands. Based on the patterns in the adversarial inputs MetaOpt produced, we modified the heuristic to reduce its performance gap by 12.5$\times$. We examined adversarial inputs to a vector bin packing heuristic and proved a new lower bound on its performance.
Submitted 21 November, 2023;
originally announced November 2023.
-
Solving Max-Min Fair Resource Allocations Quickly on Large Graphs
Authors:
Pooria Namyar,
Behnaz Arzani,
Srikanth Kandula,
Santiago Segarra,
Daniel Crankshaw,
Umesh Krishnaswamy,
Ramesh Govindan,
Himanshu Raj
Abstract:
We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure's WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly $3\times$ speedup.
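As an illustration of the waterfilling idea mentioned in this abstract, the following is a minimal sketch for the simplest setting only: one shared resource and fixed per-user demands. It is not the multi-path generalization or the single-optimization formulation the paper contributes, and the function name `max_min_fair` and its parameters are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Single-resource waterfilling (illustrative sketch, not the paper's allocator).

    All unsatisfied users are raised at the same rate until either the
    smallest remaining demand is met or the capacity is exhausted.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Users still below their demand, ordered by demand (smallest first).
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active and remaining > 1e-12:
        share = remaining / len(active)      # equal split of what is left
        smallest = active[0]
        need = demands[smallest] - alloc[smallest]
        if need <= share:
            # The smallest demand can be fully met: raise everyone by `need`.
            for j in active:
                alloc[j] += need
            remaining -= need * len(active)
            active.pop(0)                    # this user is now satisfied
        else:
            # Capacity runs out first: split the remainder equally.
            for j in active:
                alloc[j] += share
            remaining = 0.0
    return alloc


if __name__ == "__main__":
    print(max_min_fair(10, [2, 8, 10]))
```

With a capacity of 10 and demands of 2, 8, and 10, the user with the smallest demand is fully served and the other two split the remainder equally.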
Submitted 14 October, 2023;
originally announced October 2023.
-
Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds
Authors:
Amir Hossain Raj,
Zichao Hu,
Haresh Karnan,
Rohan Chandra,
Amirreza Payandeh,
Luisa Mao,
Peter Stone,
Joydeep Biswas,
Xuesu Xiao
Abstract:
Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and efficiency. However, the many complex factors of social compliance make geometric navigation systems hard to adapt to social situations, where no amount of tuning enables them to be both safe (people are too unpredictable) and efficient (the frozen robot problem). With recent advances in deep learning approaches, the common reaction has been to entirely discard these classical navigation systems and start from scratch, building a completely new learning-based social navigation planner. In this work, we find that this reaction is unnecessarily extreme: using a large-scale real-world social navigation dataset, SCAND, we find that geometric systems can produce trajectory plans that align with the human demonstrations in a large number of social situations. We, therefore, ask if we can rethink the social robot navigation problem by leveraging the advantages of both geometric and learning-based methods. We validate this hybrid paradigm through a proof-of-concept experiment, in which we develop a hybrid planner that switches between geometric and learning-based planning. Our experiments on both SCAND and two physical robots show that the hybrid planner can achieve better social compliance compared to using either the geometric or learning-based approach alone.
Submitted 9 March, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
-
A Study on Learning Social Robot Navigation with Multimodal Perception
Authors:
Bhabaranjan Panigrahi,
Amir Hossain Raj,
Mohammad Nazeri,
Xuesu Xiao
Abstract:
Autonomous mobile robots need to perceive their environments with their onboard sensors (e.g., LiDARs and RGB cameras) and then make appropriate navigation decisions. In human-inhabited public spaces, the navigation task involves more than obstacle avoidance: it also requires considering surrounding humans and their intentions and adjusting the navigation behavior in response to the underlying social norms, i.e., being socially compliant. Machine learning methods have been shown to be effective in capturing those complex and subtle social interactions in a data-driven manner, without explicitly hand-crafting simplified models or cost functions. Considering multiple available sensor modalities and the efficiency of learning methods, this paper presents a comprehensive study on learning social robot navigation with multimodal perception using a large-scale real-world dataset. The study investigates social robot navigation decision making on both the global and local planning levels and contrasts unimodal and multimodal learning against a set of classical navigation approaches in different social scenarios, while also analyzing the training and generalizability performance from the learning perspective. We also conduct a human study on how learning with multimodal perception affects the perceived social compliance. The results show that multimodal learning has a clear advantage over unimodal learning in both dataset and human studies. We open-source our code for the community's future use to study multimodal perception for learning social robot navigation.
Submitted 21 September, 2023;
originally announced September 2023.
-
Semantic Consistency for Assuring Reliability of Large Language Models
Authors:
Harsh Raj,
Vipul Gupta,
Domenic Rosati,
Subhabrata Majumdar
Abstract:
Large Language Models (LLMs) exhibit remarkable fluency and competence across various natural language tasks. However, recent research has highlighted their sensitivity to variations in input prompts. To deploy LLMs in a safe and reliable manner, it is crucial for their outputs to be consistent when prompted with expressions that carry the same meaning or intent. While some existing work has explored how state-of-the-art LLMs address this issue, their evaluations have been confined to assessing lexical equality of single- or multi-word answers, overlooking the consistency of generative text sequences. For a more comprehensive understanding of the consistency of LLMs in open-ended text generation scenarios, we introduce a general measure of semantic consistency, and formulate multiple versions of this metric to evaluate the performance of various LLMs. Our proposal demonstrates significantly higher consistency and stronger correlation with human evaluations of output consistency than traditional metrics based on lexical consistency. Finally, we propose a novel prompting strategy, called Ask-to-Choose (A2C), to enhance semantic consistency. When evaluated for closed-book question answering based on answer variations from the TruthfulQA benchmark, A2C increases accuracy metrics for pretrained and finetuned LLMs by up to 47%, and semantic consistency metrics for instruction-tuned models by up to 7-fold.
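One way to picture a consistency score over open-ended outputs is as the fraction of output pairs that agree with each other. The sketch below assumes that pairwise form, which this abstract does not spell out. The agreement function used here, `token_jaccard`, is a deliberately crude lexical placeholder so that the example runs without any model; a semantic agreement function (for example, one based on entailment or paraphrase detection) would be passed in its place. All names and the threshold are hypothetical.

```python
from itertools import combinations


def token_jaccard(a, b):
    """Placeholder agreement score: word-set overlap (lexical, NOT semantic)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pairwise_consistency(outputs, agree=token_jaccard, threshold=0.5):
    """Fraction of output pairs whose agreement score reaches `threshold`.

    `outputs` holds the model's answers to paraphrases of one question.
    The pairwise form and the threshold are assumptions of this sketch.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # zero or one output is trivially consistent
    agreeing = sum(1 for a, b in pairs if agree(a, b) >= threshold)
    return agreeing / len(pairs)


if __name__ == "__main__":
    answers = ["the sky is blue", "the sky is blue", "bananas are yellow"]
    print(pairwise_consistency(answers))
```

Swapping `agree` for a semantic scorer leaves the aggregation unchanged, which is the reason the agreement function is a parameter here.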
Submitted 17 August, 2023;
originally announced August 2023.
-
Blockchain inspired secure and reliable data exchange architecture for cyber-physical healthcare system 4.0
Authors:
Mohit Kumar,
Hritu Raj,
Nisha Chaurasia,
Sukhpal Singh Gill
Abstract:
A cyber-physical system is considered to be a collection of strongly coupled communication systems and devices that poses numerous security challenges in various industrial applications, including healthcare. The security and privacy of patient data is still a big concern because healthcare data is sensitive and valuable, and it is among the data most targeted over the internet. Moreover, from the industrial perspective, the cyber-physical system plays a crucial role in the remote exchange of data using sensor nodes in distributed environments. In the healthcare industry, blockchain technology offers a promising solution to most security-related issues due to its decentralization, immutability, and transparency. In this paper, a blockchain-inspired secure and reliable data exchange architecture is proposed for the cyber-physical healthcare industry 4.0. The proposed system uses BigchainDB, Tendermint, the InterPlanetary File System (IPFS), MongoDB, and the AES encryption algorithm to improve Healthcare 4.0. Furthermore, a blockchain-enabled secure healthcare architecture for accessing and managing records between doctors and patients is introduced. The blockchain-based Electronic Healthcare Record (EHR) exchange system is purely patient-centric: control of the data is entirely in the owner's hands and is backed by blockchain for security and privacy, so that even system administrators cannot access data without user permission. Our experimental results reveal that the proposed architecture is robust enough to handle more security attacks and can recover the data if 2/3 of the nodes have failed.
Submitted 28 June, 2023;
originally announced July 2023.
-
Measuring Reliability of Large Language Models through Semantic Consistency
Authors:
Harsh Raj,
Domenic Rosati,
Subhabrata Majumdar
Abstract:
While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to what prompts are fed into them. Even when prompts are semantically identical, language models may give very different answers. When considering safe and trustworthy deployments of PLMs, we would like their outputs to be consistent under prompts that mean the same thing or convey the same intent. While some work has looked into how state-of-the-art PLMs address this need, it has been limited to evaluating lexical equality of single- or multi-word answers and does not address consistency of generative text sequences. In order to understand consistency of PLMs under text generation settings, we develop a measure of semantic consistency that allows the comparison of open-ended text outputs. We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions in the TruthfulQA dataset. We find that our proposed metrics are considerably more consistent than traditional metrics embodying lexical consistency, and also correlate with human evaluation of output consistency to a higher degree.
Submitted 11 April, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
AskYourDB: An end-to-end system for querying and visualizing relational databases using natural language
Authors:
Manu Joseph,
Harsh Raj,
Anubhav Yadav,
Aaryamann Sharma
Abstract:
Querying databases for the right information is a time-consuming and error-prone task that often requires experienced professionals. Furthermore, the user needs to have some prior knowledge about the database. There have been various efforts to develop an intelligent system that can help business users query databases directly. Although there have been some successes, very little has been done in terms of testing and deploying these systems for real-world users. In this paper, we propose a semantic parsing approach to address the challenge of converting complex natural language into SQL and build a product around it. For this purpose, we modified state-of-the-art models with various pre- and post-processing steps, which make up a significant part of the work when a model is deployed in production. To make the product serviceable to businesses, we added an automatic visualization framework over the queried results.
Submitted 16 October, 2022;
originally announced October 2022.
-
On Transfer of Adversarial Robustness from Pretraining to Downstream Tasks
Authors:
Laura Fee Nern,
Harsh Raj,
Maurice Georgi,
Yash Sharma
Abstract:
As large-scale training regimes have gained popularity, the use of pretrained models for downstream tasks has become common practice in machine learning. While pretraining has been shown to enhance the performance of models in practice, the transfer of robustness properties from pretraining to downstream tasks remains poorly understood. In this study, we demonstrate that the robustness of a linear predictor on downstream tasks can be constrained by the robustness of its underlying representation, regardless of the protocol used for pretraining. We prove (i) a bound on the loss that holds independent of any downstream task, as well as (ii) a criterion for robust classification in particular. We validate our theoretical results in practical applications, show how our results can be used for calibrating expectations of downstream robustness, and when our results are useful for optimal transfer learning. Taken together, our results offer an initial step towards characterizing the requirements of the representation function for reliable post-adaptation performance.
Submitted 9 October, 2023; v1 submitted 7 August, 2022;
originally announced August 2022.
-
GANDALF: Gated Adaptive Network for Deep Automated Learning of Features
Authors:
Manu Joseph,
Harsh Raj
Abstract:
We propose a novel high-performance, interpretable, and parameter- and computation-efficient deep learning architecture for tabular data, Gated Adaptive Network for Deep Automated Learning of Features (GANDALF). GANDALF relies on a new tabular processing unit with a gating mechanism and in-built feature selection, called the Gated Feature Learning Unit (GFLU), as a feature representation learning unit. We demonstrate that GANDALF outperforms or is on par with SOTA approaches like XGBoost, SAINT, FT-Transformers, etc. in experiments on multiple established public benchmarks. We have made the code available at github.com/manujosephv/pytorch_tabular under the MIT License.
Submitted 9 January, 2024; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Multi-Image Visual Question Answering
Authors:
Harsh Raj,
Janhavi Dadhania,
Akhilesh Bhardwaj,
Prabuchandran KJ
Abstract:
While a lot of work has been done on developing models to tackle the problem of Visual Question Answering, the ability of these models to relate the question to the image features still remains less explored. We present an empirical study of different feature extraction methods with different loss functions. We propose a new dataset for the task of Visual Question Answering with multiple image inputs having only one ground truth, and benchmark our results on it. Our final model, which utilises ResNet + RCNN image features and BERT embeddings and is inspired by the stacked attention network, gives 39% word accuracy and 99% image accuracy on the CLEVER+TinyImagenet dataset.
Submitted 6 February, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
Exploration of Visual Features and their weighted-additive fusion for Video Captioning
Authors:
Praveen S V,
Akhilesh Bharadwaj,
Harsh Raj,
Janhavi Dadhania,
Ganesh Samarth C. A,
Nikhil Pareek,
S R M Prasanna
Abstract:
Video captioning is a popular task that challenges models to describe events in videos using natural language. In this work, we investigate the ability of various visual feature representations derived from state-of-the-art convolutional neural networks to capture high-level semantic context. We introduce the Weighted Additive Fusion Transformer with Memory Augmented Encoders (WAFTM), a captioning model that incorporates memory in a transformer encoder and uses a novel feature-fusion method that ensures due importance is given to more significant representations. We illustrate a gain in performance realized by applying Word-Piece Tokenization and the popular REINFORCE algorithm. Finally, we benchmark our model on two datasets and obtain a CIDEr of 92.4 on MSVD and a METEOR of 0.091 on the ActivityNet Captions Dataset.
Submitted 14 January, 2021;
originally announced January 2021.