-
BiRoDiff: Diffusion policies for bipedal robot locomotion on unseen terrains
Authors:
GVS Mothish,
Manan Tayal,
Shishir Kolathaya
Abstract:
Locomotion on unknown terrains is essential for bipedal robots to handle novel real-world challenges, thus expanding their utility in disaster response and exploration. In this work, we introduce a lightweight framework that learns a single walking controller that yields locomotion on multiple terrains. We have designed a real-time robot controller based on diffusion models, which not only captures multiple behaviours with different velocities in a single policy but also generalizes well to unseen terrains. Our controller learns from offline data, which is better than online learning in aspects such as scalability and simplicity of the training scheme. We have designed and implemented a diffusion model-based policy controller in simulation on our custom-made bipedal robot model named Stoch BiRo. We have demonstrated its generalization capability and high-frequency control step generation relative to typical generative models, which require substantial onboard compute.
Submitted 7 July, 2024;
originally announced July 2024.
-
Learning a Formally Verified Control Barrier Function in Stochastic Environment
Authors:
Manan Tayal,
Hongchao Zhang,
Pushpak Jagtap,
Andrew Clark,
Shishir Kolathaya
Abstract:
Safety is a fundamental requirement of control systems. Control Barrier Functions (CBFs) are proposed to ensure the safety of the control system by constructing safety filters or synthesizing control inputs. However, the safety guarantee and performance of safe controllers rely on the construction of valid CBFs. Inspired by universal approximatability, CBFs are represented by neural networks, known as neural CBFs (NCBFs). This paper presents an algorithm for synthesizing formally verified continuous-time neural Control Barrier Functions in stochastic environments in a single step. The proposed training process ensures efficacy across the entire state space with only a finite number of data points by constructing a sample-based learning framework for Stochastic Neural CBFs (SNCBFs). Our methodology eliminates the need for post hoc verification by enforcing Lipschitz bounds on the neural network, its Jacobian, and Hessian terms. We demonstrate the effectiveness of our approach through case studies on the inverted pendulum system and obstacle avoidance in autonomous driving, showcasing larger safe regions compared to baseline methods.
Submitted 28 March, 2024;
originally announced March 2024.
-
A Collision Cone Approach for Control Barrier Functions
Authors:
Manan Tayal,
Bhavya Giri Goswami,
Karthik Rajgopal,
Rajpal Singh,
Tejas Rao,
Jishnu Keshavan,
Pushpak Jagtap,
Shishir Kolathaya
Abstract:
This work presents a unified approach for collision avoidance using Collision-Cone Control Barrier Functions (CBFs) in both ground (UGV) and aerial (UAV) unmanned vehicles. We propose a novel CBF formulation inspired by collision cones, to ensure safety by constraining the relative velocity between the vehicle and the obstacle to always point away from each other. The efficacy of this approach is demonstrated through simulations and hardware implementations on the TurtleBot, Stoch-Jeep, and Crazyflie 2.1 quadrotor robot, showcasing its effectiveness in avoiding collisions with dynamic obstacles in both ground and aerial settings. The real-time controller is developed using CBF Quadratic Programs (CBF-QPs). Comparative analysis with the state-of-the-art CBFs highlights the less conservative nature of the proposed approach. Overall, this research contributes to a novel control formation that can give a guarantee for collision avoidance in unmanned vehicles by modifying the control inputs from existing path-planning controllers.
Submitted 11 March, 2024;
originally announced March 2024.
-
Barrier Functions Inspired Reward Shaping for Reinforcement Learning
Authors:
Nilaksh Nilaksh,
Abhishek Ranjan,
Shreenabh Agrawal,
Aayush Jain,
Pushpak Jagtap,
Shishir Kolathaya
Abstract:
Reinforcement Learning (RL) has progressed from simple control tasks to complex real-world challenges with large state spaces. While RL excels in these tasks, training time remains a limitation. Reward shaping is a popular solution, but existing methods often rely on value functions, which face scalability issues. This paper presents a novel safety-oriented reward-shaping framework inspired by barrier functions, offering simplicity and ease of implementation across various environments and tasks. To evaluate the effectiveness of the proposed reward formulations, we conduct simulation experiments on CartPole, Ant, and Humanoid environments, along with real-world deployment on the Unitree Go1 quadruped robot. Our results demonstrate that our method leads to 1.4-2.8 times faster convergence and as low as 50-60% actuation effort compared to the vanilla reward. In a sim-to-real experiment with the Go1 robot, we demonstrated better control and dynamics of the bot with our reward framework.
Submitted 1 April, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Stoch BiRo: Design and Control of a low cost bipedal robot
Authors:
GVS Mothish,
Karthik Rajgopal,
Ravi Kola,
Manan Tayal,
Shishir Kolathaya
Abstract:
This paper introduces the Stoch BiRo, a cost-effective bipedal robot designed with a modular mechanical structure having point feet to navigate uneven and unfamiliar terrains. The robot employs proprioceptive actuation in abduction, hips, and knees, leveraging a Raspberry Pi 4 for control. Overcoming computational limitations, a Learning-based Linear Policy controller manages balance and locomotion with only 3 degrees of freedom (DoF) per leg, distinct from the typical 5 DoF in bipedal systems. Integrated within a modular control architecture, these controllers enable autonomous handling of unforeseen terrain disturbances without external sensors or prior environment knowledge. The robot's policies are trained and simulated using MuJoCo, transferring learned behaviors to the Stoch BiRo hardware for initial walking validations. This work highlights the Stoch BiRo's adaptability and cost-effectiveness in mechanical design, control strategies, and autonomous navigation, promising diverse applications in real-world robotics scenarios.
Submitted 11 December, 2023;
originally announced December 2023.
-
Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environments
Authors:
Manan Tayal,
Shishir Kolathaya
Abstract:
In fields such as mining, search and rescue, and archaeological exploration, ensuring real-time, collision-free navigation of robots in confined, cluttered environments is imperative. Despite the value of established path planning algorithms, they often face challenges in convergence rates and handling dynamic infeasibilities. Alternative techniques like collision cones struggle to accurately represent complex obstacle geometries. This paper introduces a novel category of control barrier functions, known as Polygonal Cone Control Barrier Function (PolyC2BF), which addresses overestimation and computational complexity issues. The proposed PolyC2BF, formulated as a Quadratic Programming (QP) problem, proves effective in facilitating collision-free movement of multiple robots in complex environments. The efficacy of this approach is further demonstrated through PyBullet simulations on a quadruped (unicycle model) and a Crazyflie 2.1 (quadrotor model) in cluttered environments.
Submitted 27 March, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Collision Cone Control Barrier Functions: Experimental Validation on UGVs for Kinematic Obstacle Avoidance
Authors:
Bhavya Giri Goswami,
Manan Tayal,
Karthik Rajgopal,
Pushpak Jagtap,
Shishir Kolathaya
Abstract:
Autonomy advances have enabled robots in diverse environments and close human interaction, necessitating controllers with formal safety guarantees. This paper introduces an experimental platform designed for the validation and demonstration of a novel class of Control Barrier Functions (CBFs) tailored for Unmanned Ground Vehicles (UGVs) to proactively prevent collisions with kinematic obstacles by integrating the concept of collision cones. While existing CBF formulations excel with static obstacles, extensions to torque/acceleration-controlled unicycle and bicycle models have seen limited success. Conventional CBF applications in nonholonomic UGV models have demonstrated control conservatism, particularly in scenarios where steering/thrust control was deemed infeasible. Drawing inspiration from collision cones in path planning, we present a pioneering CBF formulation ensuring theoretical safety guarantees for both unicycle and bicycle models. The core premise revolves around aligning the obstacle's velocity away from the vehicle, establishing a constraint to perpetually avoid vectors directed towards it. This control methodology is rigorously validated through simulations and experimental verification on the Copernicus mobile robot (Unicycle Model) and FOCAS-Car (Bicycle Model).
Submitted 16 October, 2023;
originally announced October 2023.
-
Safe Legged Locomotion using Collision Cone Control Barrier Functions (C3BFs)
Authors:
Manan Tayal,
Shishir Kolathaya
Abstract:
Legged robots exhibit significant potential across diverse applications, including but not limited to hazardous environment search and rescue missions and the exploration of unexplored regions both on Earth and in outer space. However, the successful navigation of these robots in dynamic environments heavily hinges on the implementation of efficient collision avoidance techniques. In this research paper, we employ Collision Cone Control Barrier Functions (C3BF) to ensure the secure movement of legged robots within environments featuring a wide array of static and dynamic obstacles. We introduce the Quadratic Program (QP) formulation of C3BF, referred to as C3BF-QP, which serves as a protective filter layer atop a reference controller to ensure the robots' safety during operation. The effectiveness of this approach is illustrated through simulations conducted on PyBullet.
Submitted 28 March, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Off-Policy Average Reward Actor-Critic with Deterministic Policy Search
Authors:
Naman Saxena,
Subhojyoti Khastigir,
Shishir Kolathaya,
Shalabh Bhatnagar
Abstract:
The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first show asymptotic convergence analysis using the ODE-based method. Subsequently, we provide a finite time analysis of the resulting stochastic approximation scheme with linear function approximator and obtain an $ε$-optimal stationary policy with a sample complexity of $Ω(ε^{-2.5})$. We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments.
Submitted 19 July, 2023; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Control Barrier Functions in Dynamic UAVs for Kinematic Obstacle Avoidance: A Collision Cone Approach
Authors:
Manan Tayal,
Rajpal Singh,
Jishnu Keshavan,
Shishir Kolathaya
Abstract:
Unmanned aerial vehicles (UAVs), specifically quadrotors, have revolutionized various industries with their maneuverability and versatility, but their safe operation in dynamic environments heavily relies on effective collision avoidance techniques. This paper introduces a novel technique for safely navigating a quadrotor along a desired route while avoiding kinematic obstacles. We propose a new constraint formulation that employs control barrier functions (CBFs) and collision cones to ensure that the relative velocity between the quadrotor and the obstacle always avoids a cone of vectors that may lead to a collision. By showing that the proposed constraint is a valid CBF for quadrotors, we are able to leverage its real-time implementation via Quadratic Programs (QPs), called the CBF-QPs. Validation includes PyBullet simulations and hardware experiments on Crazyflie 2.1, demonstrating effectiveness in static and moving obstacle scenarios. Comparative analysis with literature, especially higher order CBF-QPs, highlights the proposed approach's less conservative nature. Simulation and Hardware videos are available here: https://tayalmanan28.github.io/C3BF-UAV/
Submitted 15 March, 2024; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Control Barrier Functions in UGVs for Kinematic Obstacle Avoidance: A Collision Cone Approach
Authors:
Phani Thontepu,
Bhavya Giri Goswami,
Manan Tayal,
Neelaksh Singh,
Shyamsundar P I,
Shyam Sundar M G,
Suresh Sundaram,
Vaibhav Katewa,
Shishir Kolathaya
Abstract:
In this paper, we propose a new class of Control Barrier Functions (CBFs) for Unmanned Ground Vehicles (UGVs) that help avoid collisions with kinematic (non-zero velocity) obstacles. While the current forms of CBFs have been successful in guaranteeing safety/collision avoidance with static obstacles, extensions for the dynamic case have seen limited success. Moreover, with UGV models like the unicycle or the bicycle, applications of existing CBFs have been conservative in terms of control, i.e., steering/thrust control has not been possible under certain scenarios. Drawing inspiration from the classical use of collision cones for obstacle avoidance in trajectory planning, we introduce a novel CBF formulation of collision cones with theoretical guarantees on safety for both the unicycle and bicycle models. The main idea is to ensure that the velocity of the obstacle w.r.t. the vehicle is always pointing away from the vehicle. Accordingly, we construct a constraint that ensures that the velocity vector always avoids a cone of vectors pointing at the vehicle. The efficacy of this new control methodology is later verified by PyBullet simulations on TurtleBot3 and F1Tenth.
Submitted 16 October, 2023; v1 submitted 23 September, 2022;
originally announced September 2022.
-
Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking
Authors:
Eshwar S R,
Shishir Kolathaya,
Gugan Thoppe
Abstract:
Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to a lot of wasteful interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called the Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.
Submitted 21 February, 2023; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Dynamic Mirror Descent based Model Predictive Control for Accelerating Robot Learning
Authors:
Utkarsh A. Mishra,
Soumya R. Samineni,
Prakhar Goel,
Chandravaran Kunjeti,
Himanshu Lodha,
Aman Singh,
Aditya Sagi,
Shalabh Bhatnagar,
Shishir Kolathaya
Abstract:
Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best from both: asymptotic performance of Mf-RL and high sample-efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for the Mb-trajectory optimization with off-policy methods for the Mf-RL. In particular, two loops are proposed, where the Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop Mb-RL to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. We finally introduce a new algorithm: Mirror-Descent Model Predictive RL (M-DeMoRL), which uses Cross-Entropy Method (CEM) with elite fractions for the inner loop. Our experiments show faster convergence of the proposed hierarchical approach on benchmark MuJoCo tasks. We also demonstrate hardware training for trajectory tracking in a 2R leg and hardware transfer for robust walking in a quadruped. We show that the inner-loop Mb-RL significantly decreases the number of training iterations required in the real system, thereby validating the proposed approach.
Submitted 4 November, 2021;
originally announced December 2021.
-
Linear Policies are Sufficient to Realize Robust Bipedal Walking on Challenging Terrains
Authors:
Lokesh Krishna,
Guillermo A. Castillo,
Utkarsh A. Mishra,
Ayonga Hereid,
Shishir Kolathaya
Abstract:
In this work, we demonstrate robust walking in the bipedal robot Digit on uneven terrains by just learning a single linear policy. In particular, we propose a new control pipeline, wherein the high-level trajectory modulator shapes the end-foot ellipsoidal trajectories, and the low-level gait controller regulates the torso and ankle orientation. The foot-trajectory modulator uses a linear policy and the regulator uses a linear PD control law. As opposed to neural network-based policies, the proposed linear policy has only 13 learnable parameters, thereby not only guaranteeing sample efficient learning but also enabling simplicity and interpretability of the policy. This is achieved with no loss of performance on challenging terrains like slopes, stairs and outdoor landscapes. We first demonstrate robust walking in the custom simulation environment, MuJoCo, and then directly transfer to hardware with no modification of the control pipeline. We subject the biped to a series of pushes and terrain height changes, both indoors and outdoors, thereby validating the presented work.
Submitted 5 October, 2021; v1 submitted 26 September, 2021;
originally announced September 2021.
-
Learning Linear Policies for Robust Bipedal Locomotion on Terrains with Varying Slopes
Authors:
Lokesh Krishna,
Utkarsh A. Mishra,
Guillermo A. Castillo,
Ayonga Hereid,
Shishir Kolathaya
Abstract:
In this paper, with a view toward deployment of light-weight control frameworks for bipedal walking robots, we realize end-foot trajectories that are shaped by a single linear feedback policy. We learn this policy via a model-free and a gradient-free learning algorithm, Augmented Random Search (ARS), in the two robot platforms Rabbit and Digit. Our contributions are two-fold: a) By using torso and support plane orientation as inputs, we achieve robust walking on slopes of up to 20 degrees in simulation. b) We demonstrate additional behaviors like walking backwards, stepping-in-place, and recovery from external pushes of up to 120 N. The end result is a robust and a fast feedback control law for bipedal walking on terrains with varying slopes. Towards the end, we also provide preliminary results of hardware transfer to Digit.
Submitted 9 August, 2021; v1 submitted 4 April, 2021;
originally announced April 2021.
-
Stochastic Action Prediction for Imitation Learning
Authors:
Sagar Gubbi Venkatesh,
Nihesh Rathod,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions. When performing demonstrations, experts are not always consistent and might accomplish the same task in slightly different ways. In this paper, we demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car and manipulation tasks including reaching, pushing, and picking and placing an object. We model stochasticity in the data distribution using autoregressive action generation, generative adversarial nets, and variational prediction and compare the performance of these approaches. We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.
Submitted 26 December, 2020;
originally announced January 2021.
-
Multi-Instance Aware Localization for End-to-End Imitation Learning
Authors:
Sagar Gubbi Venkatesh,
Raviteja Upadrashta,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training is limited. We show that end-to-end policy networks can be trained in a sample-efficient manner by (a) appending the feature map output of the vision layers with an embedding that can indicate instance preference or take advantage of an implicit preference present in the expert demonstrations, and (b) employing an autoregressive action generator network for the control layers. The proposed architecture for localization has improved accuracy and sample efficiency and can generalize to the presence of more instances of objects than seen during training. When used for end-to-end imitation learning to perform reach, push, and pick-and-place tasks on a real robot, training is achieved with as few as 15 expert demonstrations.
Submitted 26 December, 2020;
originally announced January 2021.
-
Imitation Learning for High Precision Peg-in-Hole Tasks
Authors:
Sagar Gubbi,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Industrial robot manipulators are, even to this day, not able to match the precision and speed with which humans execute contact-rich tasks. Therefore, as a means to overcome this gap, we demonstrate generative methods for imitating a peg-in-hole insertion task in a 6-DOF robot manipulator. In particular, generative adversarial imitation learning (GAIL) is used to successfully achieve this task with a 10 µm and a 6 µm peg-hole clearance on the Yaskawa GP8 industrial robot. Experimental results show that the policy successfully learns within 20 episodes from a handful of human expert demonstrations on the robot (i.e., < 10 tele-operated robot demonstrations). The insertion time improves from > 20 seconds (which also includes failed insertions) to < 15 seconds, thereby validating the effectiveness of this approach.
Submitted 26 December, 2020;
originally announced January 2021.
-
Teaching Robots Novel Objects by Pointing at Them
Authors:
Sagar Gubbi Venkatesh,
Raviteja Upadrashta,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Robots that must operate in novel environments and collaborate with humans must be capable of acquiring new knowledge from human experts during operation. We propose teaching a robot novel objects it has not encountered before by pointing a hand at the new object of interest. An end-to-end neural network is used to attend to the novel object of interest indicated by the pointing hand and then to localize the object in new scenes. In order to attend to the novel object indicated by the pointing hand, we propose a spatial attention modulation mechanism that learns to focus on the highlighted object while ignoring the other objects in the scene. We show that a robot arm can manipulate novel objects that are highlighted by pointing a hand at them. We also evaluate the performance of the proposed architecture on a synthetic dataset constructed using emojis and on a real-world dataset of common objects.
Submitted 25 December, 2020;
originally announced December 2020.
-
Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach
Authors:
Kartik Paigwar,
Lokesh Krishna,
Sashank Tirumala,
Naman Khetan,
Aditya Sagi,
Ashish Joglekar,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch 2. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtained via an inverse kinematics solver and tracked via a PID control law. Augmented Random Search, a model-free and gradient-free learning algorithm, is used to train this linear policy. Simulation results show that the resulting walking is robust to terrain slope variations and external pushes. This methodology is not only computationally lightweight but also uses minimal sensing and actuation capabilities in the robot, thereby justifying the approach.
Submitted 10 November, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations
Authors:
Sashank Tirumala,
Sagar Gubbi,
Kartik Paigwar,
Aditya Sagi,
Ashish Joglekar,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
With research into the development of quadruped robots picking up pace, learning-based techniques are being explored for developing locomotion controllers for such robots. A key problem is to generate leg trajectories for continuously varying target linear and angular velocities in a stable manner. In this paper, we propose a two-pronged approach to address this problem. First, multiple simpler policies are trained to generate trajectories for a discrete set of target velocities and turning radii. These policies are then augmented using a higher-level neural network for handling the transition between the learned trajectories. Specifically, we develop a neural network-based filter that takes in the target velocity and radius and transforms them into new commands that enable smooth transitions to the new trajectory. This transformation is achieved by learning from expert demonstrations. An application of this is the transformation of a novice user's input into an expert user's input, thereby ensuring stable manoeuvres regardless of the user's experience. Training our proposed architecture requires far fewer expert demonstrations than standard neural network architectures. Finally, we demonstrate these results experimentally on the in-house quadruped Stoch 2.
Submitted 28 July, 2020;
originally announced July 2020.
-
Local Stability of PD Controlled Bipedal Walking Robots
Authors:
Shishir Kolathaya
Abstract:
We establish stability results for PD tracking control laws in bipedal walking robots. Stability of PD control laws for continuous robotic systems is an established result, and we extend this for hybrid robotic systems, an alternating sequence of continuous and discrete events. Bipedal robots have the leg-swing as the continuous event, and the foot-strike as the discrete event. In addition, bipeds largely have underactuations due to the interactions between feet and ground. For each continuous event, we establish that the convergence rate of the tracking error can be regulated via appropriate tuning of the PD gains; and for each discrete event, we establish that this convergence rate sufficiently overcomes the nonlinear impacts by assumptions on the hybrid zero dynamics. The main contributions are 1) Extension of the stability results of PD control laws for underactuated robotic systems, and 2) Exponential ultimate boundedness of hybrid periodic orbits under the assumption of exponential stability of their projections to the hybrid zero dynamics. Towards the end, we will validate these results in a 2-link bipedal walker in simulation.
Submitted 1 January, 2020;
originally announced January 2020.
-
Gait Library Synthesis for Quadruped Robots via Augmented Random Search
Authors:
Sashank Tirumala,
Aditya Sagi,
Kartik Paigwar,
Ashish Joglekar,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
In this paper, with a view toward fast deployment of learned locomotion gaits in low-cost hardware, we generate a library of walking trajectories, namely, forward trot, backward trot, side-step, and turn in our custom-built quadruped robot, Stoch 2, using reinforcement learning. There are existing approaches that determine optimal policies for each time step, whereas we determine an optimal policy, in the form of end-foot trajectories, for each half walking step, i.e., swing phase and stance phase. The way-points for the foot trajectories are obtained from a linear policy, i.e., a linear function of the states of the robot, and cubic splines are used to interpolate between these points. Augmented Random Search, a model-free and gradient-free learning algorithm, is used to learn the policy in simulation. This learned policy is then deployed on hardware, yielding a trajectory in every half walking step. Different locomotion patterns are learned in simulation by enforcing a preconfigured phase shift between the trajectories of different legs. The transition from one gait to another is achieved by using a low-pass filter for the phase, and the sim-to-real transfer is improved by a linear transformation of the states obtained through regression.
Submitted 30 December, 2019;
originally announced December 2019.
-
Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots
Authors:
Shounak Bhattacharya,
Abhik Singla,
Abhimanyu,
Dhaivat Dholakiya,
Shalabh Bhatnagar,
Bharadwaj Amrutur,
Ashitava Ghosal,
Shishir Kolathaya
Abstract:
In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this problem have not seen much success. Deep-Reinforcement Learning seems to be a promising approach, after its recent success in a variety of robot platforms, and the goal of this paper is to use this approach to realize the aforementioned behaviors. With this learning framework, the robot reached a bounding speed of 2.1 m/s with a maximum Froude number of 2. Simulation results also show that use of active spine, indeed, increased the stride length, improved the cost of transport, and also reduced the natural frequency to more realistic values.
Submitted 15 May, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch
Authors:
Dhaivat Dholakiya,
Shounak Bhattacharya,
Ajay Gunalan,
Abhik Singla,
Shalabh Bhatnagar,
Bharadwaj Amrutur,
Ashitava Ghosal,
Shishir Kolathaya
Abstract:
In this paper, we present a complete description of the hardware design and control architecture of our custom-built quadruped robot, called 'Stoch'. Our goal is to realize a robust, modular, and reliable quadrupedal platform with which various locomotion behaviors can be explored. This platform enables us to explore different research problems in legged locomotion, which use both traditional and learning-based techniques. We discuss the merits and limitations of the platform in terms of exploitation of available behaviours, rapid prototyping, reproduction and repair. Towards the end, we demonstrate trotting and bounding behaviors, and preliminary results in turning. In addition, we show various gait transitions, i.e., trot-to-turn and trot-to-bound behaviors.
Submitted 27 February, 2019; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives
Authors:
Abhik Singla,
Shounak Bhattacharya,
Dhaivat Dholakiya,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is twofold: 1) obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called the kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL), and 2) realize a set of behaviors, namely trot, walk, gallop and bound, from these kinematic motion primitives in our custom four-legged robot, called 'Stoch'. D-RL is a data-driven approach, which has been shown to be very effective for realizing all kinds of robust locomotion behaviors, both in simulation and in experiment. On the other hand, kMPs are known to capture the underlying structure of walking and yield a set of derived behaviors. We first generate walking gaits from D-RL, which uses policy gradient based approaches. We then analyze the resulting walking by using principal component analysis (PCA). We observe that the kMPs extracted from PCA follow a similar pattern irrespective of the type of gait generated. Leveraging this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs. This type of methodology improves the transferability of these gaits to real hardware, lowers the computational overhead on-board, and also avoids multiple training iterations by generating a set of derived behaviors from a single learned gait.
Submitted 26 February, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Input to State Stability of Bipedal Walking Robots: Application to DURUS
Authors:
Shishir Kolathaya,
Jacob Reher,
Aaron D. Ames
Abstract:
Bipedal robots are a prime example of systems which exhibit highly nonlinear dynamics, underactuation, and undergo complex dissipative impacts. This paper discusses methods used to overcome a wide variety of uncertainties, with the end result being stable bipedal walking. The principal contribution of this paper is to establish sufficiency conditions for yielding input to state stable (ISS) hybrid periodic orbits, i.e., stable walking gaits under model-based and phase-based uncertainties. In particular, it will be shown formally that exponential input to state stabilization (e-ISS) of the continuous dynamics, and hybrid invariance conditions are enough to realize stable walking in the 23-DOF bipedal robot DURUS. This main result will be supported through successful and sustained walking of the bipedal robot DURUS in a laboratory environment.
Submitted 2 January, 2018;
originally announced January 2018.
-
Phase Uncertainty to State Stability of Continuous Periodic Orbits
Authors:
Shishir Nadubettu Yadukumar Kolathaya
Abstract:
The paper shows sufficiency conditions for stability of continuous periodic orbits under phase uncertainty. Phase-based uncertainty is a trait of bipedal walking robots, where the desired trajectories are parameterized by a monotonic function. This monotonic function, called the phase variable, is often affected by intermittent perturbations due to noisy sensors. We will mainly focus on continuous periodic orbits obtained via parameterized trajectories, and then analyze their stability properties under a noisy phase estimation. In other words, our focus is on examples where phase variables are difficult to compute, and therefore are imperfect. We will show that stable periodic orbits subject to phase-based uncertainty are input to state stable.
Submitted 7 July, 2017;
originally announced July 2017.
-
System Identification and Control of Valkyrie through SVA--Based Regressor Computation
Authors:
Shishir Kolathaya,
Benjamin J. Morris,
Ryan W. Sinnet,
Aaron D. Ames
Abstract:
This paper demonstrates simultaneous identification and control of the humanoid robot, Valkyrie, utilizing Spatial Vector Algebra (SVA). In particular, the inertia, Coriolis-centrifugal and gravity terms for the dynamics of a robot are computed using spatial inertia tensors. With the assumption that the link lengths or the distance between the joint axes are accurately known, it will be shown that inertial properties of a robot can be directly evaluated from the inertia tensor. An algorithm is proposed to evaluate the regressor, yielding a run time of $O(n^2)$. The efficiency of this algorithm yields a means for online system identification via the SVA--based regressor and, as a byproduct, a method for accurate model-based control. Experimental validation of the proposed method is provided through its implementation in three case studies: offline identification of a double pendulum and a $4$-DOF robotic leg, and online identification and control of a $4$-DOF robotic arm.
Submitted 12 September, 2016; v1 submitted 8 August, 2016;
originally announced August 2016.