Search | arXiv e-print repository

BiRoDiff: Diffusion policies for bipedal robot locomotion on unseen terrains

Authors: GVS Mothish, Manan Tayal, Shishir Kolathaya

Abstract: Locomotion on unknown terrains is essential for bipedal robots to handle novel real-world challenges, thus expanding their utility in disaster response and exploration. In this work, we introduce a lightweight framework that learns a single walking controller that yields locomotion on multiple terrains. We have designed a real-time robot controller based on diffusion models, which not only captures multiple behaviours with different velocities in a single policy but also generalizes well for unseen terrains. Our controller learns with offline data, which is better than online learning in aspects like scalability, simplicity in training scheme etc. We have designed and implemented a diffusion model-based policy controller in simulation on our custom-made Bipedal Robot model named Stoch BiRo. We have demonstrated its generalization capability and high frequency control step generation relative to typical generative models, which require huge onboarding compute.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 6 pages, 5 figures

arXiv:2403.19332 [pdf, other]

Learning a Formally Verified Control Barrier Function in Stochastic Environment

Authors: Manan Tayal, Hongchao Zhang, Pushpak Jagtap, Andrew Clark, Shishir Kolathaya

Abstract: Safety is a fundamental requirement of control systems. Control Barrier Functions (CBFs) are proposed to ensure the safety of the control system by constructing safety filters or synthesizing control inputs. However, the safety guarantee and performance of safe controllers rely on the construction of valid CBFs. Inspired by universal approximatability, CBFs are represented by neural networks, known as neural CBFs (NCBFs). This paper presents an algorithm for synthesizing formally verified continuous-time neural Control Barrier Functions in stochastic environments in a single step. The proposed training process ensures efficacy across the entire state space with only a finite number of data points by constructing a sample-based learning framework for Stochastic Neural CBFs (SNCBFs). Our methodology eliminates the need for post hoc verification by enforcing Lipschitz bounds on the neural network, its Jacobian, and Hessian terms. We demonstrate the effectiveness of our approach through case studies on the inverted pendulum system and obstacle avoidance in autonomous driving, showcasing larger safe regions compared to baseline methods.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 8 pages, 3 figures

arXiv:2403.07043 [pdf, other]

A Collision Cone Approach for Control Barrier Functions

Authors: Manan Tayal, Bhavya Giri Goswami, Karthik Rajgopal, Rajpal Singh, Tejas Rao, Jishnu Keshavan, Pushpak Jagtap, Shishir Kolathaya

Abstract: This work presents a unified approach for collision avoidance using Collision-Cone Control Barrier Functions (CBFs) in both ground (UGV) and aerial (UAV) unmanned vehicles. We propose a novel CBF formulation inspired by collision cones, to ensure safety by constraining the relative velocity between the vehicle and the obstacle to always point away from each other. The efficacy of this approach is demonstrated through simulations and hardware implementations on the TurtleBot, Stoch-Jeep, and Crazyflie 2.1 quadrotor robot, showcasing its effectiveness in avoiding collisions with dynamic obstacles in both ground and aerial settings. The real-time controller is developed using CBF Quadratic Programs (CBF-QPs). Comparative analysis with the state-of-the-art CBFs highlights the less conservative nature of the proposed approach. Overall, this research contributes to a novel control formation that can give a guarantee for collision avoidance in unmanned vehicles by modifying the control inputs from existing path-planning controllers.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 13 pages, 16 pages. arXiv admin note: substantial text overlap with arXiv:2209.11524, arXiv:2303.15871, arXiv:2310.10839

arXiv:2403.01410 [pdf, other]

Barrier Functions Inspired Reward Shaping for Reinforcement Learning

Authors: Nilaksh Nilaksh, Abhishek Ranjan, Shreenabh Agrawal, Aayush Jain, Pushpak Jagtap, Shishir Kolathaya

Abstract: Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. While RL excels in these tasks, training time remains a limitation. Reward shaping is a popular solution, but existing methods often rely on value functions, which face scalability issues. This paper presents a novel safety-oriented reward-shaping framework inspired by barrier functions, offering simplicity and ease of implementation across various environments and tasks. To evaluate the effectiveness of the proposed reward formulations, we conduct simulation experiments on CartPole, Ant, and Humanoid environments, along with real-world deployment on the Unitree Go1 quadruped robot. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and as low as 50-60% actuation effort compared to the vanilla reward. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the bot with our reward framework.

Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 7 pages, 10 figures, Accepted as contributed paper at ICRA 2024

ACM Class: I.2.9

arXiv:2312.06512 [pdf, other]

Stoch BiRo: Design and Control of a low cost bipedal robot

Authors: GVS Mothish, Karthik Rajgopal, Ravi Kola, Manan Tayal, Shishir Kolathaya

Abstract: This paper introduces the Stoch BiRo, a cost-effective bipedal robot designed with a modular mechanical structure having point feet to navigate uneven and unfamiliar terrains. The robot employs proprioceptive actuation in abduction, hips, and knees, leveraging a Raspberry Pi4 for control. Overcoming computational limitations, a Learning-based Linear Policy controller manages balance and locomotion with only 3 degrees of freedom (DoF) per leg, distinct from the typical 5DoF in bipedal systems. Integrated within a modular control architecture, these controllers enable autonomous handling of unforeseen terrain disturbances without external sensors or prior environment knowledge. The robot's policies are trained and simulated using MuJoCo, transferring learned behaviors to the Stoch BiRo hardware for initial walking validations. This work highlights the Stoch BiRo's adaptability and cost-effectiveness in mechanical design, control strategies, and autonomous navigation, promising diverse applications in real-world robotics scenarios.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 7 Pages, 6 figures

arXiv:2311.08787 [pdf, other]

Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environments

Authors: Manan Tayal, Shishir Kolathaya

Abstract: In fields such as mining, search and rescue, and archaeological exploration, ensuring real-time, collision-free navigation of robots in confined, cluttered environments is imperative. Despite the value of established path planning algorithms, they often face challenges in convergence rates and handling dynamic infeasibilities. Alternative techniques like collision cones struggle to accurately represent complex obstacle geometries. This paper introduces a novel category of control barrier functions, known as Polygonal Cone Control Barrier Function (PolyC2BF), which addresses overestimation and computational complexity issues. The proposed PolyC2BF, formulated as a Quadratic Programming (QP) problem, proves effective in facilitating collision-free movement of multiple robots in complex environments. The efficacy of this approach is further demonstrated through PyBullet simulations on quadruped (unicycle model), and crazyflie 2.1 (quadrotor model) in cluttered environments.

Submitted 27 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 6 Pages, 6 Figures. Accepted at European Control Conference (ECC) 2024. arXiv admin note: text overlap with arXiv:2303.15871

arXiv:2310.10839 [pdf, other]

Collision Cone Control Barrier Functions: Experimental Validation on UGVs for Kinematic Obstacle Avoidance

Authors: Bhavya Giri Goswami, Manan Tayal, Karthik Rajgopal, Pushpak Jagtap, Shishir Kolathaya

Abstract: Autonomy advances have enabled robots in diverse environments and close human interaction, necessitating controllers with formal safety guarantees. This paper introduces an experimental platform designed for the validation and demonstration of a novel class of Control Barrier Functions (CBFs) tailored for Unmanned Ground Vehicles (UGVs) to proactively prevent collisions with kinematic obstacles by integrating the concept of collision cones. While existing CBF formulations excel with static obstacles, extensions to torque/acceleration-controlled unicycle and bicycle models have seen limited success. Conventional CBF applications in nonholonomic UGV models have demonstrated control conservatism, particularly in scenarios where steering/thrust control was deemed infeasible. Drawing inspiration from collision cones in path planning, we present a pioneering CBF formulation ensuring theoretical safety guarantees for both unicycle and bicycle models. The core premise revolves around aligning the obstacle's velocity away from the vehicle, establishing a constraint to perpetually avoid vectors directed towards it. This control methodology is rigorously validated through simulations and experimental verification on the Copernicus mobile robot (Unicycle Model) and FOCAS-Car (Bicycle Model).

Submitted 16 October, 2023; originally announced October 2023.

Comments: 8 pages, 11 figures, Submitted at American Control Conference (ACC), 2024. arXiv admin note: substantial text overlap with arXiv:2209.11524

arXiv:2309.01898 [pdf, other]

Safe Legged Locomotion using Collision Cone Control Barrier Functions (C3BFs)

Authors: Manan Tayal, Shishir Kolathaya

Abstract: Legged robots exhibit significant potential across diverse applications, including but not limited to hazardous environment search and rescue missions and the exploration of unexplored regions both on Earth and in outer space. However, the successful navigation of these robots in dynamic environments heavily hinges on the implementation of efficient collision avoidance techniques. In this research paper, we employ Collision Cone Control Barrier Functions (C3BF) to ensure the secure movement of legged robots within environments featuring a wide array of static and dynamic obstacles. We introduce the Quadratic Program (QP) formulation of C3BF, referred to as C3BF-QP, which serves as a protective filter layer atop a reference controller to ensure the robots' safety during operation. The effectiveness of this approach is illustrated through simulations conducted on PyBullet.

Submitted 28 March, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: 5 Pages, 5 Figures. Updated citation. arXiv admin note: substantial text overlap with arXiv:2303.15871
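
The "protective filter layer atop a reference controller" described in the abstract above can be illustrated with a minimal Python sketch. It assumes the simplest possible case, a single affine safety constraint a·u + b >= 0 on the control input u, for which the quadratic program min ||u - u_ref||^2 has a closed-form projection solution. The names `safety_filter`, `u_ref`, `a`, and `b` are hypothetical and not taken from the paper; the actual C3BF-QP may involve several constraints and a numerical QP solver.

```python
def safety_filter(u_ref, a, b):
    """Minimally modify u_ref so that a . u + b >= 0 holds.

    Closed-form solution of: min ||u - u_ref||^2  s.t.  a . u + b >= 0.
    All arguments are illustrative assumptions, not the paper's notation.
    """
    # Value of the constraint at the reference input.
    slack = sum(ai * ui for ai, ui in zip(a, u_ref)) + b
    if slack >= 0.0:
        # Reference input is already safe: pass it through unchanged.
        return list(u_ref)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("infeasible constraint: a is zero and b is negative")
    # Project u_ref onto the boundary of the half-space a . u + b >= 0.
    scale = slack / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_ref, a)]


if __name__ == "__main__":
    # Reference input violates the constraint -u[0] + 0.5 >= 0, so it is clipped.
    print(safety_filter([1.0, 0.0], [-1.0, 0.0], 0.5))
```

The filter leaves the reference command untouched whenever it already satisfies the constraint, which is why such a layer can sit on top of an existing controller.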

arXiv:2305.12239 [pdf, other]

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

Authors: Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar

Abstract: The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite time analysis of the resulting stochastic approximation scheme with linear function approximator and obtain an $ε$-optimal stationary policy with a sample complexity of $Ω(ε^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.

Submitted 19 July, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: Accepted at ICML 2023

arXiv:2303.15871 [pdf, other]

Control Barrier Functions in Dynamic UAVs for Kinematic Obstacle Avoidance: A Collision Cone Approach

Authors: Manan Tayal, Rajpal Singh, Jishnu Keshavan, Shishir Kolathaya

Abstract: Unmanned aerial vehicles (UAVs), specifically quadrotors, have revolutionized various industries with their maneuverability and versatility, but their safe operation in dynamic environments heavily relies on effective collision avoidance techniques. This paper introduces a novel technique for safely navigating a quadrotor along a desired route while avoiding kinematic obstacles. We propose a new constraint formulation that employs control barrier functions (CBFs) and collision cones to ensure that the relative velocity between the quadrotor and the obstacle always avoids a cone of vectors that may lead to a collision. By showing that the proposed constraint is a valid CBF for quadrotors, we are able to leverage its real-time implementation via Quadratic Programs (QPs), called the CBF-QPs. Validation includes PyBullet simulations and hardware experiments on Crazyflie 2.1, demonstrating effectiveness in static and moving obstacle scenarios. Comparative analysis with literature, especially higher order CBF-QPs, highlights the proposed approach's less conservative nature. Simulation and Hardware videos are available here: https://tayalmanan28.github.io/C3BF-UAV/

Submitted 15 March, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: Accepted at American Control Conference(ACC) 2024. 6 pages, 8 figures

arXiv:2209.11524 [pdf, other]

Control Barrier Functions in UGVs for Kinematic Obstacle Avoidance: A Collision Cone Approach

Authors: Phani Thontepu, Bhavya Giri Goswami, Manan Tayal, Neelaksh Singh, Shyamsundar P I, Shyam Sundar M G, Suresh Sundaram, Vaibhav Katewa, Shishir Kolathaya

Abstract: In this paper, we propose a new class of Control Barrier Functions (CBFs) for Unmanned Ground Vehicles (UGVs) that help avoid collisions with kinematic (non-zero velocity) obstacles. While the current forms of CBFs have been successful in guaranteeing safety/collision avoidance with static obstacles, extensions for the dynamic case have seen limited success. Moreover, with the UGV models like the unicycle or the bicycle, applications of existing CBFs have been conservative in terms of control, i.e., steering/thrust control has not been possible under certain scenarios. Drawing inspiration from the classical use of collision cones for obstacle avoidance in trajectory planning, we introduce its novel CBF formulation with theoretical guarantees on safety for both the unicycle and bicycle models. The main idea is to ensure that the velocity of the obstacle w.r.t. the vehicle is always pointing away from the vehicle. Accordingly, we construct a constraint that ensures that the velocity vector always avoids a cone of vectors pointing at the vehicle. The efficacy of this new control methodology is later verified by Pybullet simulations on TurtleBot3 and F1Tenth.

Submitted 16 October, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: 6 pages, 4 figures, For supplement video follow https://youtu.be/Dme7Wm9y6es. *The first and second authors have contributed equally

ACM Class: I.2.9; G.1.6; J.2
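
The cone-avoidance idea in the abstract above can be sketched numerically. The following Python function is an illustrative assumption, not the paper's code: it models the obstacle as a disc of radius `radius` around its centre, takes `p_rel` and `v_rel` as the obstacle's position and velocity relative to the vehicle, and returns a margin h = <p_rel, v_rel> + ||v_rel|| * sqrt(||p_rel||^2 - radius^2). Under these assumptions, h is non-negative exactly when the relative velocity lies outside the cone of directions that would lead to a collision.

```python
import math


def collision_cone_margin(p_rel, v_rel, radius):
    """Signed margin of the relative velocity with respect to the collision cone.

    p_rel:  obstacle position minus vehicle position (hypothetical convention)
    v_rel:  obstacle velocity minus vehicle velocity (hypothetical convention)
    radius: assumed radius of a disc enclosing the obstacle and the vehicle
    Returns h >= 0 when v_rel points outside the cone, h < 0 when inside.
    """
    dist_sq = sum(p * p for p in p_rel)
    if dist_sq <= radius ** 2:
        raise ValueError("vehicle is already inside the obstacle disc")
    # Inner product <p_rel, v_rel>: negative when the obstacle is approaching.
    closing = sum(p * v for p, v in zip(p_rel, v_rel))
    speed = math.sqrt(sum(v * v for v in v_rel))
    # ||p_rel|| * cos(phi), where phi is the half-angle of the cone.
    tangent_len = math.sqrt(dist_sq - radius ** 2)
    return closing + speed * tangent_len


if __name__ == "__main__":
    # Obstacle 5 m ahead, closing head-on: inside the cone (negative margin).
    print(collision_cone_margin((5.0, 0.0), (-1.0, 0.0), 1.0))
    # Obstacle moving sideways: outside the cone (positive margin).
    print(collision_cone_margin((5.0, 0.0), (0.0, 1.0), 1.0))
```

A controller would then keep this margin non-negative over time, which is the role a barrier-function constraint plays.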

arXiv:2208.10583 [pdf, other]

Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking

Authors: Eshwar S R, Shishir Kolathaya, Gugan Thoppe

Abstract: Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to a lot of wasteful interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.

Submitted 21 February, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

arXiv:2112.02999 [pdf, other]

Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning

Authors: Utkarsh A. Mishra, Soumya R. Samineni, Prakhar Goel, Chandravaran Kunjeti, Himanshu Lodha, Aman Singh, Aditya Sagi, Shalabh Bhatnagar, Shishir Kolathaya

Abstract: Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best from both: asymptotic performance of Mf-RL and high sample-efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two loops are proposed, where the Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop Mb-RL to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. We finally introduce a new algorithm: Mirror-Descent Model Predictive RL (M-DeMoRL), which uses Cross-Entropy Method (CEM) with elite fractions for the inner loop. Our experiments show faster convergence of the proposed hierarchical approach on benchmark MuJoCo tasks. We also demonstrate hardware training for trajectory tracking in a 2R leg and hardware transfer for robust walking in a quadruped. We show that the inner-loop Mb-RL significantly decreases the number of training iterations required in the real system, thereby validating the proposed approach.

Submitted 4 November, 2021; originally announced December 2021.

Comments: 8 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2110.12239

arXiv:2109.12665 [pdf, other]

Linear Policies are Sufficient to Realize Robust Bipedal Walking on Challenging Terrains

Authors: Lokesh Krishna, Guillermo A. Castillo, Utkarsh A. Mishra, Ayonga Hereid, Shishir Kolathaya

Abstract: In this work, we demonstrate robust walking in the bipedal robot Digit on uneven terrains by just learning a single linear policy. In particular, we propose a new control pipeline, wherein the high-level trajectory modulator shapes the end-foot ellipsoidal trajectories, and the low-level gait controller regulates the torso and ankle orientation. The foot-trajectory modulator uses a linear policy and the regulator uses a linear PD control law. As opposed to neural network-based policies, the proposed linear policy has only 13 learnable parameters, thereby not only guaranteeing sample efficient learning but also enabling simplicity and interpretability of the policy. This is achieved with no loss of performance on challenging terrains like slopes, stairs and outdoor landscapes. We first demonstrate robust walking in the custom simulation environment, MuJoCo, and then directly transfer to hardware with no modification of the control pipeline. We subject the biped to a series of pushes and terrain height changes, both indoors and outdoors, thereby validating the presented work.

Submitted 5 October, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

Comments: 8 pages, 10 Figures

arXiv:2104.01662 [pdf, other]

Learning Linear Policies for Robust Bipedal Locomotion on Terrains with Varying Slopes

Authors: Lokesh Krishna, Utkarsh A. Mishra, Guillermo A. Castillo, Ayonga Hereid, Shishir Kolathaya

Abstract: In this paper, with a view toward deployment of light-weight control frameworks for bipedal walking robots, we realize end-foot trajectories that are shaped by a single linear feedback policy. We learn this policy via a model-free and a gradient-free learning algorithm, Augmented Random Search (ARS), in the two robot platforms Rabbit and Digit. Our contributions are two-fold: a) By using torso and support plane orientation as inputs, we achieve robust walking on slopes of up to 20 degrees in simulation. b) We demonstrate additional behaviors like walking backwards, stepping-in-place, and recovery from external pushes of up to 120 N. The end result is a robust and a fast feedback control law for bipedal walking on terrains with varying slopes. Towards the end, we also provide preliminary results of hardware transfer to Digit.

Submitted 9 August, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

Comments: 6 pages, 5 figures, Accepted in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021) in Prague, Czech Republic

arXiv:2101.01055 [pdf, other]

Stochastic Action Prediction for Imitation Learning

Authors: Sagar Gubbi Venkatesh, Nihesh Rathod, Shishir Kolathaya, Bharadwaj Amrutur

Abstract: Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions. When performing demonstrations, experts are not always consistent and might accomplish the same task in slightly different ways. In this paper, we demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car and manipulation tasks including reaching, pushing, and picking and placing an object. We model stochasticity in the data distribution using autoregressive action generation, generative adversarial nets, and variational prediction and compare the performance of these approaches. We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.

Submitted 26 December, 2020; originally announced January 2021.

arXiv:2101.01053 [pdf, other]

Multi-Instance Aware Localization for End-to-End Imitation Learning

Authors: Sagar Gubbi Venkatesh, Raviteja Upadrashta, Shishir Kolathaya, Bharadwaj Amrutur

Abstract: Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training are limited. We show that end-to-end policy networks can be trained in a sample efficient manner by (a) appending the feature map output of the vision layers with an embedding that can indicate instance preference or take advantage of an implicit preference present in the expert demonstrations, and (b) employing an autoregressive action generator network for the control layers. The proposed architecture for localization has improved accuracy and sample efficiency and can generalize to the presence of more instances of objects than seen during training. When used for end-to-end imitation learning to perform reach, push, and pick-and-place tasks on a real robot, training is achieved with as few as 15 expert demonstrations.

Submitted 26 December, 2020; originally announced January 2021.

Comments: Accepted at IROS 2020

arXiv:2101.01052 [pdf, other]

doi 10.1109/ICCAR49639.2020.9108072

Imitation Learning for High Precision Peg-in-Hole Tasks

Authors: Sagar Gubbi, Shishir Kolathaya, Bharadwaj Amrutur

Abstract: Industrial robot manipulators are not able to match the precision and speed with which humans are able to execute contact rich tasks even to this day. Therefore, as a means to overcome this gap, we demonstrate generative methods for imitating a peg-in-hole insertion task in a 6-DOF robot manipulator. In particular, generative adversarial imitation learning (GAIL) is used to successfully achieve this task with a 10 um, and a 6 um peg-hole clearance on the Yaskawa GP8 industrial robot. Experimental results show that the policy successfully learns within 20 episodes from a handful of human expert demonstrations on the robot (i.e., < 10 tele-operated robot demonstrations). The insertion time improves from > 20 seconds (which also includes failed insertions) to < 15 seconds, thereby validating the effectiveness of this approach.

Submitted 26 December, 2020; originally announced January 2021.

Comments: Accepted at ICCAR 2020

arXiv:2012.13620 [pdf, other]

doi 10.1109/RO-MAN47096.2020.9223596

Teaching Robots Novel Objects by Pointing at Them

Authors: Sagar Gubbi Venkatesh, Raviteja Upadrashta, Shishir Kolathaya, Bharadwaj Amrutur

Abstract: Robots that must operate in novel environments and collaborate with humans must be capable of acquiring new knowledge from human experts during operation. We propose teaching a robot novel objects it has not encountered before by pointing a hand at the new object of interest. An end-to-end neural network is used to attend to the novel object of interest indicated by the pointing hand and then to localize the object in new scenes. In order to attend to the novel object indicated by the pointing hand, we propose a spatial attention modulation mechanism that learns to focus on the highlighted object while ignoring the other objects in the scene. We show that a robot arm can manipulate novel objects that are highlighted by pointing a hand at them. We also evaluate the performance of the proposed architecture on a synthetic dataset constructed using emojis and on a real-world dataset of common objects.

Submitted 25 December, 2020; originally announced December 2020.

arXiv:2010.16342 [pdf, other]

Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach

Authors: Kartik Paigwar, Lokesh Krishna, Sashank Tirumala, Naman Khetan, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

Abstract: In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch $2$. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtained via an inverse kinematics solver and tracked via a PID control law. Augmented Random Search, a model-free and gradient-free learning algorithm, is used to train this linear policy. Simulation results show that the resulting walking is robust to terrain slope variations and external pushes. This methodology is not only computationally light-weight but also uses minimal sensing and actuation capabilities in the robot, thereby justifying the approach.

Submitted 10 November, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: Accepted in 4th Conference on Robot Learning 2020, MIT, USA

arXiv:2007.14290 [pdf, other]

Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations

Authors: Sashank Tirumala, Sagar Gubbi, Kartik Paigwar, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

Abstract: With the research into development of quadruped robots picking up pace, learning-based techniques are being explored for developing locomotion controllers for such robots. A key problem is to generate leg trajectories for continuously varying target linear and angular velocities, in a stable manner. In this paper, we propose a two-pronged approach to address this problem. First, multiple simpler policies are trained to generate trajectories for a discrete set of target velocities and turning radii. These policies are then augmented using a higher-level neural network for handling the transition between the learned trajectories. Specifically, we develop a neural network-based filter that takes in the target velocity and radius and transforms them into new commands that enable smooth transitions to the new trajectory. This transformation is achieved by learning from expert demonstrations. An application of this is the transformation of a novice user's input into an expert user's input, thereby ensuring stable manoeuvres regardless of the user's experience. Training our proposed architecture requires far fewer expert demonstrations compared to standard neural network architectures. Finally, we demonstrate these results experimentally on the in-house quadruped Stoch 2.

Submitted 28 July, 2020; originally announced July 2020.

Comments: 6 pages, Robot and Human Interaction Conference Italy 2020

arXiv:2001.00145 [pdf, other]

doi 10.1016/j.automatica.2020.108841

Local Stability of PD Controlled Bipedal Walking Robots

Authors: Shishir Kolathaya

Abstract: We establish stability results for PD tracking control laws in bipedal walking robots. Stability of PD control laws for continuous robotic systems is an established result, and we extend this for hybrid robotic systems, an alternating sequence of continuous and discrete events. Bipedal robots have the leg-swing as the continuous event, and the foot-strike as the discrete event. In addition, bipeds largely have underactuations due to the interactions between feet and ground. For each continuous event, we establish that the convergence rate of the tracking error can be regulated via appropriate tuning of the PD gains; and for each discrete event, we establish that this convergence rate sufficiently overcomes the nonlinear impacts by assumptions on the hybrid zero dynamics. The main contributions are 1) Extension of the stability results of PD control laws for underactuated robotic systems, and 2) Exponential ultimate boundedness of hybrid periodic orbits under the assumption of exponential stability of their projections to the hybrid zero dynamics. Towards the end, we will validate these results in a 2-link bipedal walker in simulation.

Submitted 1 January, 2020; originally announced January 2020.

Comments: 10 pages, 5 figures

Journal ref: Automatica 114 (2020) 108841

arXiv:1912.12907 [pdf, other]

Gait Library Synthesis for Quadruped Robots via Augmented Random Search

Authors: Sashank Tirumala, Aditya Sagi, Kartik Paigwar, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

Abstract: In this paper, with a view toward fast deployment of learned locomotion gaits in low-cost hardware, we generate a library of walking trajectories, namely, forward trot, backward trot, side-step, and turn in our custom-built quadruped robot, Stoch 2, using reinforcement learning. There are existing approaches that determine optimal policies for each time step, whereas we determine an optimal policy, in the form of end-foot trajectories, for each half walking step, i.e., swing phase and stance phase. The way-points for the foot trajectories are obtained from a linear policy, i.e., a linear function of the states of the robot, and cubic splines are used to interpolate between these points. Augmented Random Search, a model-free and gradient-free learning algorithm, is used to learn the policy in simulation. This learned policy is then deployed on hardware, yielding a trajectory in every half walking step. Different locomotion patterns are learned in simulation by enforcing a preconfigured phase shift between the trajectories of different legs. The transition from one gait to another is achieved by using a low-pass filter for the phase, and the sim-to-real transfer is improved by a linear transformation of the states obtained through regression.

Submitted 30 December, 2019; originally announced December 2019.

Comments: 7 pages, 11 figures, 1 table

arXiv:1905.06077 [pdf, other]

Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots

Authors: Shounak Bhattacharya, Abhik Singla, Abhimanyu, Dhaivat Dholakiya, Shalabh Bhatnagar, Bharadwaj Amrutur, Ashitava Ghosal, Shishir Kolathaya

Abstract: In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this problem have not seen much success. Deep-Reinforcement Learning seems to be a promising approach, after its recent success in a variety of robot platforms, and the goal of this paper is to use this approach to realize the aforementioned behaviors. With this learning framework, the robot reached a bounding speed of 2.1 m/s with a maximum Froude number of 2. Simulation results also show that use of active spine, indeed, increased the stride length, improved the cost of transport, and also reduced the natural frequency to more realistic values.

Submitted 15 May, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: Submitted to IEEE RO-MAN 2019. Supplementary video: https://youtu.be/INp4aa-8z2E

arXiv:1901.00697 [pdf, other]

Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch

Authors: Dhaivat Dholakiya, Shounak Bhattacharya, Ajay Gunalan, Abhik Singla, Shalabh Bhatnagar, Bharadwaj Amrutur, Ashitava Ghosal, Shishir Kolathaya

Abstract: In this paper, we present a complete description of the hardware design and control architecture of our custom-built quadruped robot, called 'Stoch'. Our goal is to realize a robust, modular, and reliable quadrupedal platform, using which various locomotion behaviors are explored. This platform enables us to explore different research problems in legged locomotion, which use both traditional and learning-based techniques. We discuss the merits and limitations of the platform in terms of exploitation of available behaviours, rapid prototyping, reproduction and repair. Towards the end, we will demonstrate trotting, bounding behaviors, and preliminary results in turning. In addition, we will also show various gait transitions, i.e., trot-to-turn and trot-to-bound behaviors.

Submitted 27 February, 2019; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: Accepted by International Conference on Control, Automation and Robotics (ICCAR) 2019. Supplementary Video: https://youtu.be/Wxx9pwwTIL4

arXiv:1810.03842 [pdf, other]

Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives

Authors: Abhik Singla, Shounak Bhattacharya, Dhaivat Dholakiya, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

Abstract: Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is twofold: 1) Obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called the kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL) and 2) Realize a set of behaviors, namely trot, walk, gallop and bound from these kinematic motion primitives in our custom four-legged robot, called 'Stoch'. D-RL is a data-driven approach, which has been shown to be very effective for realizing all kinds of robust locomotion behaviors, both in simulation and in experiment. On the other hand, kMPs are known to capture the underlying structure of walking and yield a set of derived behaviors. We first generate walking gaits from D-RL, which uses policy gradient based approaches. We then analyze the resulting walking by using principal component analysis (PCA). We observe that the kMPs extracted from PCA followed a similar pattern irrespective of the type of gaits generated. Leveraging this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs. This type of methodology improves the transferability of these gaits to real hardware, lowers the computational overhead on-board, and also avoids multiple training iterations by generating a set of derived behaviors from a single learned gait.

Submitted 26 February, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: Accepted by ICRA 2019. Supplementary Video: https://youtu.be/kiLKSqI4KhE

arXiv:1801.00618 [pdf, other]

Input to State Stability of Bipedal Walking Robots: Application to DURUS

Authors: Shishir Kolathaya, Jacob Reher, Aaron D. Ames

Abstract: Bipedal robots are a prime example of systems which exhibit highly nonlinear dynamics, underactuation, and undergo complex dissipative impacts. This paper discusses methods used to overcome a wide variety of uncertainties, with the end result being stable bipedal walking. The principal contribution of this paper is to establish sufficiency conditions for yielding input to state stable (ISS) hybrid periodic orbits, i.e., stable walking gaits under model-based and phase-based uncertainties. In particular, it will be shown formally that exponential input to state stabilization (e-ISS) of the continuous dynamics, and hybrid invariance conditions are enough to realize stable walking in the 23-DOF bipedal robot DURUS. This main result will be supported through successful and sustained walking of the bipedal robot DURUS in a laboratory environment.

Submitted 2 January, 2018; originally announced January 2018.

Comments: 16 pages, 10 figures

arXiv:1707.02258 [pdf, other]

Phase Uncertainty to State Stability of Continuous Periodic Orbits

Authors: Shishir Nadubettu Yadukumar Kolathaya

Abstract: The paper shows sufficiency conditions for stability of continuous periodic orbits under phase uncertainty. Phase-based uncertainty is a trait of bipedal walking robots, where the desired trajectories are parameterized by a monotonic function. This monotonic function, called the phase variable, is often affected by intermittent perturbations due to noisy sensors. We will mainly focus on continuous periodic orbits obtained via parameterized trajectories, and then analyze their stability properties under a noisy phase estimation. In other words, our focus is on examples where phase variables are difficult to compute, and therefore are imperfect. We will show that stable periodic orbits subject to phase-based uncertainty are input to state stable.

Submitted 7 July, 2017; originally announced July 2017.

Comments: 6 pages, 2 figures

arXiv:1608.02683 [pdf, other]

System Identification and Control of Valkyrie through SVA-Based Regressor Computation

Authors: Shishir Kolathaya, Benjamin J. Morris, Ryan W. Sinnet, Aaron D. Ames

Abstract: This paper demonstrates simultaneous identification and control of the humanoid robot, Valkyrie, utilizing Spatial Vector Algebra (SVA). In particular, the inertia, Coriolis-centrifugal and gravity terms for the dynamics of a robot are computed using spatial inertia tensors. With the assumption that the link lengths or the distance between the joint axes are accurately known, it will be shown that inertial properties of a robot can be directly evaluated from the inertia tensor. An algorithm is proposed to evaluate the regressor, yielding a run time of $O(n^2)$. The efficiency of this algorithm yields a means for online system identification via the SVA-based regressor and, as a byproduct, a method for accurate model-based control. Experimental validation of the proposed method is provided through its implementation in three case studies: offline identification of a double pendulum and a $4$-DOF robotic leg, and online identification and control of a $4$-DOF robotic arm.

Submitted 12 September, 2016; v1 submitted 8 August, 2016; originally announced August 2016.

Comments: 8 pages, 15 figures

Showing 1–29 of 29 results for author: Kolathaya, S