-
Block-level Text Spotting with LLMs
Authors:
Ganesh Bannur,
Bharadwaj Amrutur
Abstract:
Text spotting has seen tremendous progress in recent years yielding performant techniques which can extract text at the character, word or line level. However, extracting blocks of text from images (block-level text spotting) is relatively unexplored. Blocks contain more context than individual lines, words or characters and so block-level text spotting would enhance downstream applications, such…
▽ More
Text spotting has seen tremendous progress in recent years yielding performant techniques which can extract text at the character, word or line level. However, extracting blocks of text from images (block-level text spotting) is relatively unexplored. Blocks contain more context than individual lines, words or characters and so block-level text spotting would enhance downstream applications, such as translation, which benefit from added context. We propose a novel method, BTS-LLM (Block-level Text Spotting with LLMs), to identify text at the block level. BTS-LLM has three parts: 1) detecting and recognizing text at the line level, 2) grouping lines into blocks and 3) finding the best order of lines within a block using a large language model (LLM). We aim to exploit the strong semantic knowledge in LLMs for accurate block-level text spotting. Consequently if the text spotted is semantically meaningful but has been corrupted during text recognition, the LLM is also able to rectify mistakes in the text and produce a reconstruction of it.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
CORNET 2.0: A Co-Simulation Middleware for Robot Networks
Authors:
Srikrishna Acharya,
Bharadwaj Amrutur,
Mukunda Bharatheesha,
Yogesh Simmhan
Abstract:
We present a networked co-simulation framework for multi-robot systems applications. We require a simulation framework that captures both physical interactions and communications aspects to effectively design such complex systems. This is necessary to co-design the multi-robots' autonomy logic and the communication protocols. The proposed framework extends existing tools to simulate the robot's au…
▽ More
We present a networked co-simulation framework for multi-robot systems applications. We require a simulation framework that captures both physical interactions and communications aspects to effectively design such complex systems. This is necessary to co-design the multi-robots' autonomy logic and the communication protocols. The proposed framework extends existing tools to simulate the robot's autonomy and network-related aspects. We have used Gazebo with ROS/ROS2 to develop the autonomy logic for robots and mininet-WiFi as the network simulator to capture the cyber-physical systems properties of the multi-robot system. This framework addresses the need to seamlessly integrate the two simulation environments by synchronizing mobility and time, allowing for easy migration of the algorithms to real platforms. The framework supports container-based virtualization and extends a generic robotic framework by decoupling the data plane and control plane.
△ Less
Submitted 8 June, 2022; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Stochastic Action Prediction for Imitation Learning
Authors:
Sagar Gubbi Venkatesh,
Nihesh Rathod,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions. When performing demonstrations, experts are not always consistent and might accomplish the same task in slightly different ways. In this paper, we demonstrate inherent stochasticity in demonstrations collected for tasks including line following…
▽ More
Imitation learning is a data-driven approach to acquiring skills that relies on expert demonstrations to learn a policy that maps observations to actions. When performing demonstrations, experts are not always consistent and might accomplish the same task in slightly different ways. In this paper, we demonstrate inherent stochasticity in demonstrations collected for tasks including line following with a remote-controlled car and manipulation tasks including reaching, pushing, and picking and placing an object. We model stochasticity in the data distribution using autoregressive action generation, generative adversarial nets, and variational prediction and compare the performance of these approaches. We find that accounting for stochasticity in the expert data leads to substantial improvement in the success rate of task completion.
△ Less
Submitted 26 December, 2020;
originally announced January 2021.
-
Scene Text Detection for Augmented Reality -- Character Bigram Approach to reduce False Positive Rate
Authors:
Sagar Gubbi,
Bharadwaj Amrutur
Abstract:
Natural scene text detection is an important aspect of scene understanding and could be a useful tool in building engaging augmented reality applications. In this work, we address the problem of false positives in text spotting. We propose improving the performace of sliding window text spotters by looking for character pairs (bigrams) rather than single characters. An efficient convolutional neur…
▽ More
Natural scene text detection is an important aspect of scene understanding and could be a useful tool in building engaging augmented reality applications. In this work, we address the problem of false positives in text spotting. We propose improving the performace of sliding window text spotters by looking for character pairs (bigrams) rather than single characters. An efficient convolutional neural network is designed and trained to detect bigrams. The proposed detector reduces false positive rate by 28.16% on the ICDAR 2015 dataset. We demonstrate that detecting bigrams is a computationally inexpensive way to improve sliding window text spotters.
△ Less
Submitted 26 December, 2020;
originally announced January 2021.
-
Multi-Instance Aware Localization for End-to-End Imitation Learning
Authors:
Sagar Gubbi Venkatesh,
Raviteja Upadrashta,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training are limited. We show that end-to-end policy networks can be trained in a sample efficient manner by (a) appending the feature map outp…
▽ More
Existing architectures for imitation learning using image-to-action policy networks perform poorly when presented with an input image containing multiple instances of the object of interest, especially when the number of expert demonstrations available for training are limited. We show that end-to-end policy networks can be trained in a sample efficient manner by (a) appending the feature map output of the vision layers with an embedding that can indicate instance preference or take advantage of an implicit preference present in the expert demonstrations, and (b) employing an autoregressive action generator network for the control layers. The proposed architecture for localization has improved accuracy and sample efficiency and can generalize to the presence of more instances of objects than seen during training. When used for end-to-end imitation learning to perform reach, push, and pick-and-place tasks on a real robot, training is achieved with as few as 15 expert demonstrations.
△ Less
Submitted 26 December, 2020;
originally announced January 2021.
-
Imitation Learning for High Precision Peg-in-Hole Tasks
Authors:
Sagar Gubbi,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Industrial robot manipulators are not able to match the precision and speed with which humans are able to execute contact rich tasks even to this day. Therefore, as a means overcome this gap, we demonstrate generative methods for imitating a peg-in-hole insertion task in a 6-DOF robot manipulator. In particular, generative adversarial imitation learning (GAIL) is used to successfully achieve this…
▽ More
Industrial robot manipulators are not able to match the precision and speed with which humans are able to execute contact rich tasks even to this day. Therefore, as a means overcome this gap, we demonstrate generative methods for imitating a peg-in-hole insertion task in a 6-DOF robot manipulator. In particular, generative adversarial imitation learning (GAIL) is used to successfully achieve this task with a 10 um, and a 6 um peg-hole clearance on the Yaskawa GP8 industrial robot. Experimental results show that the policy successfully learns within 20 episodes from a handful of human expert demonstrations on the robot (i.e., < 10 tele-operated robot demonstrations). The insertion time improves from > 20 seconds (which also includes failed insertions) to < 15 seconds, thereby validating the effectiveness of this approach.
△ Less
Submitted 26 December, 2020;
originally announced January 2021.
-
Translating Natural Language Instructions to Computer Programs for Robot Manipulation
Authors:
Sagar Gubbi Venkatesh,
Raviteja Upadrashta,
Bharadwaj Amrutur
Abstract:
It is highly desirable for robots that work alongside humans to be able to understand instructions in natural language. Existing language conditioned imitation learning models directly predict the actuator commands from the image observation and the instruction text. Rather than directly predicting actuator commands, we propose translating the natural language instruction to a Python function whic…
▽ More
It is highly desirable for robots that work alongside humans to be able to understand instructions in natural language. Existing language conditioned imitation learning models directly predict the actuator commands from the image observation and the instruction text. Rather than directly predicting actuator commands, we propose translating the natural language instruction to a Python function which queries the scene by accessing the output of the object detector and controls the robot to perform the specified task. This enables the use of non-differentiable modules such as a constraint solver when computing commands to the robot. Moreover, the labels in this setup are significantly more informative computer programs that capture the intent of the expert rather than teleoperated demonstrations. We show that the proposed method performs better than training a neural network to directly predict the robot actions.
△ Less
Submitted 20 March, 2021; v1 submitted 26 December, 2020;
originally announced December 2020.
-
Spatial Reasoning from Natural Language Instructions for Robot Manipulation
Authors:
Sagar Gubbi Venkatesh,
Anirban Biswas,
Raviteja Upadrashta,
Vikram Srinivasan,
Partha Talukdar,
Bharadwaj Amrutur
Abstract:
Robots that can manipulate objects in unstructured environments and collaborate with humans can benefit immensely by understanding natural language. We propose a pipelined architecture of two stages to perform spatial reasoning on the text input. All the objects in the scene are first localized, and then the instruction for the robot in natural language and the localized co-ordinates are mapped to…
▽ More
Robots that can manipulate objects in unstructured environments and collaborate with humans can benefit immensely by understanding natural language. We propose a pipelined architecture of two stages to perform spatial reasoning on the text input. All the objects in the scene are first localized, and then the instruction for the robot in natural language and the localized co-ordinates are mapped to the start and end co-ordinates corresponding to the locations where the robot must pick up and place the object respectively. We show that representing the localized objects by quantizing their positions to a binary grid is preferable to representing them as a list of 2D co-ordinates. We also show that attention improves generalization and can overcome biases in the dataset. The proposed method is used to pick-and-place playing cards using a robot arm.
△ Less
Submitted 26 March, 2021; v1 submitted 26 December, 2020;
originally announced December 2020.
-
One-Shot Object Localization Using Learnt Visual Cues via Siamese Networks
Authors:
Sagar Gubbi Venkatesh,
Bharadwaj Amrutur
Abstract:
A robot that can operate in novel and unstructured environments must be capable of recognizing new, previously unseen, objects. In this work, a visual cue is used to specify a novel object of interest which must be localized in new environments. An end-to-end neural network equipped with a Siamese network is used to learn the cue, infer the object of interest, and then to localize it in new enviro…
▽ More
A robot that can operate in novel and unstructured environments must be capable of recognizing new, previously unseen, objects. In this work, a visual cue is used to specify a novel object of interest which must be localized in new environments. An end-to-end neural network equipped with a Siamese network is used to learn the cue, infer the object of interest, and then to localize it in new environments. We show that a simulated robot can pick-and-place novel objects pointed to by a laser pointer. We also evaluate the performance of the proposed approach on a dataset derived from the Omniglot handwritten character dataset and on a small dataset of toys.
△ Less
Submitted 26 December, 2020;
originally announced December 2020.
-
Teaching Robots Novel Objects by Pointing at Them
Authors:
Sagar Gubbi Venkatesh,
Raviteja Upadrashta,
Shishir Kolathaya,
Bharadwaj Amrutur
Abstract:
Robots that must operate in novel environments and collaborate with humans must be capable of acquiring new knowledge from human experts during operation. We propose teaching a robot novel objects it has not encountered before by pointing a hand at the new object of interest. An end-to-end neural network is used to attend to the novel object of interest indicated by the pointing hand and then to l…
▽ More
Robots that must operate in novel environments and collaborate with humans must be capable of acquiring new knowledge from human experts during operation. We propose teaching a robot novel objects it has not encountered before by pointing a hand at the new object of interest. An end-to-end neural network is used to attend to the novel object of interest indicated by the pointing hand and then to localize the object in new scenes. In order to attend to the novel object indicated by the pointing hand, we propose a spatial attention modulation mechanism that learns to focus on the highlighted object while ignoring the other objects in the scene. We show that a robot arm can manipulate novel objects that are highlighted by pointing a hand at them. We also evaluate the performance of the proposed architecture on a synthetic dataset constructed using emojis and on a real-world dataset of common objects.
△ Less
Submitted 25 December, 2020;
originally announced December 2020.
-
Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach
Authors:
Kartik Paigwar,
Lokesh Krishna,
Sashank Tirumala,
Naman Khetan,
Aditya Sagi,
Ashish Joglekar,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch $2$. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtain…
▽ More
In this paper, with a view toward fast deployment of locomotion gaits in low-cost hardware, we use a linear policy for realizing end-foot trajectories in the quadruped robot, Stoch $2$. In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs. The corresponding desired joint angles are obtained via an inverse kinematics solver and tracked via a PID control law. Augmented Random Search, a model-free and a gradient-free learning algorithm is used to train this linear policy. Simulation results show that the resulting walking is robust to terrain slope variations and external pushes. This methodology is not only computationally light-weight but also uses minimal sensing and actuation capabilities in the robot, thereby justifying the approach.
△ Less
Submitted 10 November, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Robust and Scalable Techniques for TWR and TDoA based localization using Ultra Wide Band Radios
Authors:
Rakshit Ramesh,
Aaron John-Sabu,
Harshitha S,
Siddarth Ramesh,
Vishwas Navada B,
Mukunth Arunachalam,
Bharadwaj Amrutur
Abstract:
Current trends in autonomous vehicles and their applications indicates an increasing need in positioning at low battery and compute cost. Lidars provide accurate localization at the cost of high compute and power consumption which could be detrimental for drones. Modern requirements for autonomous drones such as No-Permit-No-Takeoff (NPNT) and applications restricting drones to a corridor require…
▽ More
Current trends in autonomous vehicles and their applications indicates an increasing need in positioning at low battery and compute cost. Lidars provide accurate localization at the cost of high compute and power consumption which could be detrimental for drones. Modern requirements for autonomous drones such as No-Permit-No-Takeoff (NPNT) and applications restricting drones to a corridor require the infrastructure to constantly determine the location of the drone. Ultra Wide Band Radios (UWB) fulfill such requirements and offer high precision localization and fast position update rates at a fraction of the cost and battery consumption as compared to lidars and also have greater network availability than GPS in a dense forested campus or an indoor setting. We present in this paper a novel protocol and technique to localize a drone for such applications using a Time Difference of Arrival (TDoA) approach. This further increases the position update rates without sacrificing on accuracy and compare it to traditional methods
△ Less
Submitted 10 August, 2020;
originally announced August 2020.
-
Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations
Authors:
Sashank Tirumala,
Sagar Gubbi,
Kartik Paigwar,
Aditya Sagi,
Ashish Joglekar,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
With the research into development of quadruped robots picking up pace, learning based techniques are being explored for developing locomotion controllers for such robots. A key problem is to generate leg trajectories for continuously varying target linear and angular velocities, in a stable manner. In this paper, we propose a two pronged approach to address this problem. First, multiple simpler p…
▽ More
With the research into development of quadruped robots picking up pace, learning based techniques are being explored for developing locomotion controllers for such robots. A key problem is to generate leg trajectories for continuously varying target linear and angular velocities, in a stable manner. In this paper, we propose a two pronged approach to address this problem. First, multiple simpler policies are trained to generate trajectories for a discrete set of target velocities and turning radius. These policies are then augmented using a higher level neural network for handling the transition between the learned trajectories. Specifically, we develop a neural network-based filter that takes in target velocity, radius and transforms them into new commands that enable smooth transitions to the new trajectory. This transformation is achieved by learning from expert demonstrations. An application of this is the transformation of a novice user's input into an expert user's input, thereby ensuring stable manoeuvres regardless of the user's experience. Training our proposed architecture requires much less expert demonstrations compared to standard neural network architectures. Finally, we demonstrate experimentally these results in the in-house quadruped Stoch 2.
△ Less
Submitted 28 July, 2020;
originally announced July 2020.
-
Vermillion: A High-Performance Scalable IoT Middleware for Smart Cities
Authors:
Poorna Chandra Tejasvi,
Vasanth Rajaraman,
Arun Babu Puthuparambil,
Akhil Pankaj,
Bharadwaj Amrutur
Abstract:
With the massive increase in the number of IoT devices being deployed in smart cities, it becomes paramount for middlewares to be able to handle very high loads and support demanding use-cases. In order to do so, middlewares must scale horizontally while providing a commensurate increase in availability and throughput. Currently, most open-source IoT middlewares do not provide out-of-the-box suppo…
▽ More
With the massive increase in the number of IoT devices being deployed in smart cities, it becomes paramount for middlewares to be able to handle very high loads and support demanding use-cases. In order to do so, middlewares must scale horizontally while providing a commensurate increase in availability and throughput. Currently, most open-source IoT middlewares do not provide out-of-the-box support for scaling horizontally. In this paper, we present "Vermillion'', a scalable, secure and open-source IoT middleware for smart cities which provides in-built support for scaling-out. We make three contributions in this paper. Firstly, the middleware platform itself along with a formal process for data exchange between data producers and consumers. Secondly, we propose the use of hash-based federation to distribute and manage load across various message broker nodes while eliminating inter-node synchronisation overheads. Thirdly, we discuss a case study where Vermillion was deployed in a city and briefly discuss about deployment considerations using the obtained results.
△ Less
Submitted 14 March, 2020;
originally announced March 2020.
-
Gait Library Synthesis for Quadruped Robots via Augmented Random Search
Authors:
Sashank Tirumala,
Aditya Sagi,
Kartik Paigwar,
Ashish Joglekar,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
In this paper, with a view toward fast deployment of learned locomotion gaits in low-cost hardware, we generate a library of walking trajectories, namely, forward trot, backward trot, side-step, and turn in our custom-built quadruped robot, Stoch 2, using reinforcement learning. There are existing approaches that determine optimal policies for each time step, whereas we determine an optimal policy…
▽ More
In this paper, with a view toward fast deployment of learned locomotion gaits in low-cost hardware, we generate a library of walking trajectories, namely, forward trot, backward trot, side-step, and turn in our custom-built quadruped robot, Stoch 2, using reinforcement learning. There are existing approaches that determine optimal policies for each time step, whereas we determine an optimal policy, in the form of end-foot trajectories, for each half walking step i.e., swing phase and stance phase. The way-points for the foot trajectories are obtained from a linear policy, i.e., a linear function of the states of the robot, and cubic splines are used to interpolate between these points. Augmented Random Search, a model-free and gradient-free learning algorithm is used to learn the policy in simulation. This learned policy is then deployed on hardware, yielding a trajectory in every half walking step. Different locomotion patterns are learned in simulation by enforcing a preconfigured phase shift between the trajectories of different legs. The transition from one gait to another is achieved by using a low-pass filter for the phase, and the sim-to-real transfer is improved by a linear transformation of the states obtained through regression.
△ Less
Submitted 30 December, 2019;
originally announced December 2019.
-
Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots
Authors:
Shounak Bhattacharya,
Abhik Singla,
Abhimanyu,
Dhaivat Dholakiya,
Shalabh Bhatnagar,
Bharadwaj Amrutur,
Ashitava Ghosal,
Shishir Kolathaya
Abstract:
In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this pr…
▽ More
In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2. Fast quadrupedal locomotion with active spine is an extremely hard problem, and involves a complex coordination between the various degrees of freedom. Therefore, past attempts at addressing this problem have not seen much success. Deep-Reinforcement Learning seems to be a promising approach, after its recent success in a variety of robot platforms, and the goal of this paper is to use this approach to realize the aforementioned behaviors. With this learning framework, the robot reached a bounding speed of 2.1 m/s with a maximum Froude number of 2. Simulation results also show that use of active spine, indeed, increased the stride length, improved the cost of transport, and also reduced the natural frequency to more realistic values.
△ Less
Submitted 15 May, 2019; v1 submitted 15 May, 2019;
originally announced May 2019.
-
Design, Development and Experimental Realization of a Quadrupedal Research Platform: Stoch
Authors:
Dhaivat Dholakiya,
Shounak Bhattacharya,
Ajay Gunalan,
Abhik Singla,
Shalabh Bhatnagar,
Bharadwaj Amrutur,
Ashitava Ghosal,
Shishir Kolathaya
Abstract:
In this paper, we present a complete description of the hardware design and control architecture of our custom built quadruped robot, called the `Stoch'. Our goal is to realize a robust, modular, and a reliable quadrupedal platform, using which various locomotion behaviors are explored. This platform enables us to explore different research problems in legged locomotion, which use both traditional…
▽ More
In this paper, we present a complete description of the hardware design and control architecture of our custom built quadruped robot, called the `Stoch'. Our goal is to realize a robust, modular, and a reliable quadrupedal platform, using which various locomotion behaviors are explored. This platform enables us to explore different research problems in legged locomotion, which use both traditional and learning based techniques. We discuss the merits and limitations of the platform in terms of exploitation of available behaviours, fast rapid prototyping, reproduction and repair. Towards the end, we will demonstrate trotting, bounding behaviors, and preliminary results in turning. In addition, we will also show various gait transitions i.e., trot-to-turn and trot-to-bound behaviors.
△ Less
Submitted 27 February, 2019; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives
Authors:
Abhik Singla,
Shounak Bhattacharya,
Dhaivat Dholakiya,
Shalabh Bhatnagar,
Ashitava Ghosal,
Bharadwaj Amrutur,
Shishir Kolathaya
Abstract:
Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is two fold 1) Obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called the kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL) and 2) Realize a set of…
▽ More
Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is two fold 1) Obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called the kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL) and 2) Realize a set of behaviors, namely trot, walk, gallop and bound from these kinematic motion primitives in our custom four legged robot, called the `Stoch'. D-RL is a data driven approach, which has been shown to be very effective for realizing all kinds of robust locomotion behaviors, both in simulation and in experiment. On the other hand, kMPs are known to capture the underlying structure of walking and yield a set of derived behaviors. We first generate walking gaits from D-RL, which uses policy gradient based approaches. We then analyze the resulting walking by using principal component analysis. We observe that the kMPs extracted from PCA followed a similar pattern irrespective of the type of gaits generated. Leveraging on this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs. This type of methodology improves the transferability of these gaits to real hardware, lowers the computational overhead on-board, and also avoids multiple training iterations by generating a set of derived behaviors from a single learned gait.
△ Less
Submitted 26 February, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.