-
A Backbone for Long-Horizon Robot Task Understanding
Authors:
Xiaoshuai Chen,
Wei Chen,
Dongmyoung Lee,
Yukun Ge,
Nicolas Rojas,
Petar Kormushev
Abstract:
End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robot configurations, which are then integrated with current foundation models to improve task understanding. The approach consists of two stages: offline training and online testing. During the offline training stage, we developed the Meta-RGate SynerFusion (MGSF) network for accurate therblig segmentation across various tasks. In the online testing stage, after a one-shot demonstration of a new task is collected, our MGSF network extracts high-level knowledge, which is then encoded into the image using Action Registration (ActionREG). Additionally, the Large Language Model (LLM)-Alignment Policy for Visual Correction (LAP-VC) is employed to ensure precise action execution, facilitating trajectory transfer in novel robot scenarios. Experimental results validate these methods, achieving 94.37% recall in therblig segmentation and success rates of 94.4% and 80% in real-world online robot testing for simple and complex scenarios, respectively. Supplementary material is available at: https://sites.google.com/view/therbligsbasedbackbone/home
Submitted 7 August, 2024; v1 submitted 2 August, 2024;
originally announced August 2024.
-
The Hydra Hand: A Mode-Switching Underactuated Gripper with Precision and Power Grasping Modes
Authors:
Digby Chappell,
Fernando Bello,
Petar Kormushev,
Nicolas Rojas
Abstract:
Human hands are able to grasp a wide range of object sizes, shapes, and weights, achieved via reshaping and altering their apparent grasping stiffness between compliant power and rigid precision. Achieving similar versatility in robotic hands remains a challenge, which has often been addressed by adding extra controllable degrees of freedom, tactile sensors, or specialised extra grasping hardware, at the cost of control complexity and robustness. We introduce a novel reconfigurable four-fingered two-actuator underactuated gripper, the Hydra Hand, that switches between compliant power and rigid precision grasps using a single motor, while generating grasps via a single hydraulic actuator. The hand exhibits adaptive grasping between finger pairs, enabling the power grasping of two objects simultaneously. The mode switching mechanism and the hand's kinematics are presented and analysed, and performance is tested on two grasping benchmarks: one focused on rigid objects, and the other on items of clothing. The Hydra Hand is shown to excel at grasping large and irregular objects with its compliant power configuration, and small objects with its rigid precision configuration. The hand's versatility is then showcased by executing the challenging manipulation task of safely grasping and placing a bunch of grapes, and then plucking a single grape from the bunch.
Submitted 26 September, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
When and Where to Step: Terrain-Aware Real-Time Footstep Location and Timing Optimization for Bipedal Robots
Authors:
Ke Wang,
Zhaoyang Jacopo Hu,
Peter Tisnikar,
Oskar Helander,
Digby Chappell,
Petar Kormushev
Abstract:
Online footstep planning is essential for bipedal walking robots, allowing them to walk in the presence of disturbances and sensory noise. Most of the literature on the topic has focused on optimizing the footstep placement while keeping the step timing constant. In this work, we introduce a footstep planner capable of optimizing footstep placement and step time online. The proposed planner, consisting of an Interior Point Optimizer (IPOPT) and an optimizer based on Augmented Lagrangian (AL) method with analytical gradient descent, solves the full dynamics of the Linear Inverted Pendulum (LIP) model in real time to optimize for footstep location as well as step timing at the rate of 200 Hz. We show that such asynchronous real-time optimization with the AL method (ARTO-AL) provides the required robustness and speed for successful online footstep planning. Furthermore, ARTO-AL can be extended to plan footsteps in 3D, allowing terrain-aware footstep planning on uneven terrains. Compared to an algorithm with no footstep time adaptation, our proposed ARTO-AL demonstrates increased stability in simulated walking experiments as it can resist pushes on flat ground and on a 10° ramp up to 120 N and 100 N respectively. For the video, see https://youtu.be/ABdnvPqCUu4. For code, see https://github.com/WangKeAlchemist/ARTO-AL/tree/master.
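To make the role of analytical gradients concrete, here is a minimal sketch (not the ARTO-AL code linked above) of the closed-form LIP solution and plain gradient descent on a single step duration. It assumes a sagittal-plane LIP with the stance foot at the origin, a hypothetical CoM height `z0 = 0.6` m, a hypothetical cost that drives the end-of-step CoM velocity to a reference `v_ref`, and hypothetical step-time bounds. The paper's planner also optimizes footstep location and handles constraints with an augmented Lagrangian, none of which is shown here.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def lip_state(x0, v0, t, z0=0.6):
    """Closed-form LIP CoM position and velocity after time t.

    Dynamics: x'' = omega^2 * x (stance foot at the origin), omega = sqrt(G / z0).
    z0 is a hypothetical CoM height, not a value from the paper.
    """
    w = math.sqrt(G / z0)
    x = x0 * math.cosh(w * t) + (v0 / w) * math.sinh(w * t)
    v = x0 * w * math.sinh(w * t) + v0 * math.cosh(w * t)
    return x, v


def optimize_step_time(x0, v0, v_ref, t_init=0.2, lr=0.05, iters=500,
                       t_min=0.05, t_max=1.0, z0=0.6):
    """Gradient descent on J(T) = (v(T) - v_ref)^2 over the step duration T.

    Uses the analytical derivative dv/dT = omega^2 * x(T); the bounds
    t_min/t_max are hypothetical and enforced by simple clipping.
    """
    w2 = G / z0
    t = t_init
    for _ in range(iters):
        x, v = lip_state(x0, v0, t, z0)
        grad = 2.0 * (v - v_ref) * w2 * x  # chain rule: dJ/dT = 2 (v - v_ref) dv/dT
        t = min(max(t - lr * grad, t_min), t_max)
    return t


T = optimize_step_time(x0=0.05, v0=0.3, v_ref=0.5)
print(f"step time {T:.3f} s, end velocity {lip_state(0.05, 0.3, T)[1]:.3f} m/s")
```

Because the derivative is available in closed form, each iteration costs only a few hyperbolic-function evaluations, which is the kind of cheap update that makes high-rate replanning plausible.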
Submitted 14 February, 2023;
originally announced February 2023.
-
OstrichRL: A Musculoskeletal Ostrich Simulation to Study Bio-mechanical Locomotion
Authors:
Vittorio La Barbera,
Fabio Pardo,
Yuval Tassa,
Monica Daley,
Christopher Richards,
Petar Kormushev,
John Hutchinson
Abstract:
Muscle-actuated control is a research topic that spans multiple domains, including biomechanics, neuroscience, reinforcement learning, robotics, and graphics. This type of control is particularly challenging as bodies are often overactuated and dynamics are delayed and non-linear. It is, however, a well-tested and finely tuned actuation mechanism that has undergone millions of years of evolution, with interesting properties that exploit passive forces and the efficient energy storage of muscle-tendon units. To facilitate research on muscle-actuated simulation, we release a 3D musculoskeletal simulation of an ostrich based on the MuJoCo physics engine. The ostrich is one of the fastest bipeds on earth and therefore makes an excellent model for studying muscle-actuated bipedal locomotion. The model is based on CT scans and dissections used to collect actual muscle data, such as insertion sites, lengths, and pennation angles. Along with this model, we also provide a set of reinforcement learning tasks, including reference motion tracking, running, and neck control, used to infer muscle actuation patterns. The reference motion data is based on motion capture clips of various behaviors that we preprocessed and adapted to our model. This paper describes how the model was built and iteratively improved using the tasks. We also evaluate the accuracy of the muscle actuation patterns by comparing them to experimentally collected electromyographic data from locomoting birds. The results demonstrate the need for rich reward signals or regularization techniques to constrain muscle excitations and produce realistic movements. Overall, we believe that this work can provide a useful bridge between fields of research interested in muscle actuation.
Submitted 24 May, 2022; v1 submitted 11 December, 2021;
originally announced December 2021.
-
Fast Online Optimization for Terrain-Blind Bipedal Robot Walking with a Decoupled Actuated SLIP Model
Authors:
Ke Wang,
Hengyi Fei,
Petar Kormushev
Abstract:
We present a highly reactive controller which enables bipedal robots to blindly walk over various kinds of uneven terrains while resisting pushes. The high-level motion planner performs fast online optimization of footstep locations and Center of Mass (CoM) height using the decoupled actuated Spring Loaded Inverted Pendulum (aSLIP) model. The decoupled aSLIP model simplifies the original aSLIP with Linear Inverted Pendulum (LIP) dynamics in the horizontal states and spring dynamics in the vertical state. The motion planning can be formulated as a discrete-time Model Predictive Control (MPC) problem and solved at a frequency of 1 kHz. The output of the motion planner, which uses a reduced-order model, is fed into an inverse-dynamics-based whole-body controller for execution on the robot. A key result of this controller is that the foot of the robot is compliant, which further extends the robot's ability to be robust to unobserved terrain changes. We evaluate our method in simulation with the bipedal robot SLIDER. Results show the robot can blindly walk over various uneven terrains including slopes, wave fields and stairs. It can also resist pushes while walking on uneven terrain.
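The horizontal part of the decoupled aSLIP model is the LIP, and a discrete-time MPC needs it as a linear state update. A minimal sketch of an exact zero-order-hold discretization at the 1 kHz rate quoted above, assuming the foot position `p` is held constant over each sample and a hypothetical constant CoM height `z0`; the vertical spring dynamics and the MPC solver itself are not shown.

```python
import math


def lip_discrete(dt, z0=0.6, g=9.81):
    """Exact discretization of x'' = w^2 (x - p) with foot position p held over dt.

    Returns (A, B) such that [x, v]_{k+1} = A [x, v]_k + B p_k.
    z0 is a hypothetical constant CoM height.
    """
    w = math.sqrt(g / z0)
    c, s = math.cosh(w * dt), math.sinh(w * dt)
    A = [[c, s / w], [w * s, c]]
    B = [1.0 - c, -w * s]
    return A, B


def lip_step(state, p, A, B):
    """Apply one discrete-time LIP update to the horizontal CoM state (x, v)."""
    x, v = state
    return (A[0][0] * x + A[0][1] * v + B[0] * p,
            A[1][0] * x + A[1][1] * v + B[1] * p)


A, B = lip_discrete(dt=1e-3)  # 1 kHz planning rate
state = (0.02, 0.1)
for _ in range(1000):  # one second with the foot fixed at p = 0
    state = lip_step(state, 0.0, A, B)
print(state)
```

Since the discretization is exact rather than an Euler approximation, chaining many 1 ms steps reproduces the continuous-time solution, which keeps the MPC prediction model consistent with the LIP.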
Submitted 20 September, 2021;
originally announced September 2021.
-
A Unified Model with Inertia Shaping for Highly Dynamic Jumps of Legged Robots
Authors:
Ke Wang,
Guiyang Xin,
Songyan Xin,
Michael Mistry,
Sethu Vijayakumar,
Petar Kormushev
Abstract:
To achieve highly dynamic jumps of legged robots, it is essential to control the rotational dynamics of the robot. In this paper, we aim to improve the jumping performance by proposing a unified model for planning highly dynamic jumps that can approximately model the centroidal inertia. This model abstracts the robot as a single rigid body for the base and point masses for the legs. The model is called the Lump Leg Single Rigid Body Model (LL-SRBM) and can be used to plan motions for both bipedal and quadrupedal robots. By taking the effects of leg dynamics into account, LL-SRBM provides a computationally efficient way for the motion planner to change the centroidal inertia of the robot with various leg configurations. Concurrently, we propose a novel contact detection method by using the norm of the average spatial velocity. After the contact is detected, the controller is switched to force control to achieve a soft landing. Twisting jump and forward jump experiments on the bipedal robot SLIDER and quadrupedal robot ANYmal demonstrate the improved jump performance by actively changing the centroidal inertia. These experiments also show the generalization and the robustness of the integrated planning and control framework.
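The LL-SRBM idea of a single rigid base plus point-mass legs can be illustrated by how leg placement changes the centroidal inertia. The sketch below is an illustration under stated assumptions, not the paper's formulation: the base CoM sits at the origin of the base frame, each leg is a point mass at a hypothetical position, everything is shifted to the combined CoM with the parallel-axis theorem, and base orientation relative to the world is ignored.

```python
def point_mass_inertia(m, r):
    """Inertia of a point mass m at offset r about the origin: m * (|r|^2 I - r r^T)."""
    rr = sum(c * c for c in r)
    return [[m * ((rr if i == j else 0.0) - r[i] * r[j]) for j in range(3)]
            for i in range(3)]


def lumped_leg_inertia(m_base, I_base, leg_masses, leg_positions):
    """Centroidal inertia of a rigid base (CoM at the origin) plus point-mass legs.

    I_base is the base's 3x3 inertia about its own CoM; leg_positions are
    hypothetical leg-mass locations in the base frame. Returns (I, com).
    """
    total = m_base + sum(leg_masses)
    com = [sum(m * r[k] for m, r in zip(leg_masses, leg_positions)) / total
           for k in range(3)]
    inertia = [list(row) for row in I_base]
    # Parallel-axis shifts: the base CoM and each leg mass relative to the combined CoM.
    bodies = [(m_base, [-c for c in com])]
    bodies += [(m, [r[k] - com[k] for k in range(3)])
               for m, r in zip(leg_masses, leg_positions)]
    for m, offset in bodies:
        P = point_mass_inertia(m, offset)
        for i in range(3):
            for j in range(3):
                inertia[i][j] += P[i][j]
    return inertia, com


I_base = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]]
tucked, _ = lumped_leg_inertia(10.0, I_base, [1.0, 1.0],
                               [[0.0, 0.15, -0.3], [0.0, -0.15, -0.3]])
spread, _ = lumped_leg_inertia(10.0, I_base, [1.0, 1.0],
                               [[0.0, 0.4, -0.3], [0.0, -0.4, -0.3]])
print(tucked[2][2], spread[2][2])  # spreading the legs sideways raises the yaw inertia
```

Moving the leg masses changes the inertia seen by the planner without changing the total mass, which is the lever a twisting jump can exploit.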
Submitted 9 September, 2021;
originally announced September 2021.
-
Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution
Authors:
Nemanja Rakicevic,
Antoine Cully,
Petar Kormushev
Abstract:
Neuroevolution is an alternative to gradient-based optimisation that has the potential to avoid local minima and allows parallelisation. The main limiting factor is that usually it does not scale well with parameter space dimensionality. Inspired by recent work examining neural network intrinsic dimension and loss landscapes, we hypothesise that there exists a low-dimensional manifold, embedded in the policy network parameter space, around which a high density of diverse and useful policies are located. This paper proposes a novel method for diversity-based policy search via Neuroevolution that leverages learned representations of the policy network parameters, by performing policy search in this learned representation space. Our method relies on the Quality-Diversity (QD) framework, which provides a principled approach to policy search and maintains a collection of diverse policies, used as a dataset for learning policy representations. Further, we use the Jacobian of the inverse-mapping function to guide the search in the representation space. This ensures that the generated samples remain in the high-density regions, after mapping back to the original space. Finally, we evaluate our contributions on four continuous-control tasks in simulated environments, and compare to diversity-based baselines.
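Since the method builds on the Quality-Diversity framework, a generic MAP-Elites-style archive helps show what maintaining a collection of diverse policies means in practice. This is a textbook-style sketch, not the paper's algorithm (which searches in a learned representation space): descriptors are assumed to lie in [0, 1]^k, mutation is plain Gaussian noise, and `toy_evaluate` is a hypothetical stand-in for a policy rollout.

```python
import random


class GridArchive:
    """MAP-Elites-style archive: one elite (best fitness) per descriptor cell."""

    def __init__(self, bins_per_dim, low=0.0, high=1.0):
        self.bins, self.low, self.high = bins_per_dim, low, high
        self.cells = {}  # cell index tuple -> (params, fitness)

    def cell_of(self, descriptor):
        scale = self.bins / (self.high - self.low)
        return tuple(min(self.bins - 1, max(0, int((d - self.low) * scale)))
                     for d in descriptor)

    def try_insert(self, params, fitness, descriptor):
        key = self.cell_of(descriptor)
        if key not in self.cells or fitness > self.cells[key][1]:
            self.cells[key] = (params, fitness)
            return True
        return False


def toy_evaluate(params):
    """Hypothetical stand-in for a rollout: returns (fitness, behaviour descriptor)."""
    fitness = -sum((p - 0.5) ** 2 for p in params[2:])
    return fitness, params[:2]


def map_elites(evaluate, dim, iters=2000, sigma=0.1, bins=10, seed=0):
    rng = random.Random(seed)
    archive = GridArchive(bins)
    for _ in range(iters):
        if archive.cells and rng.random() < 0.9:
            parent, _ = rng.choice(list(archive.cells.values()))
            child = [p + rng.gauss(0.0, sigma) for p in parent]  # Gaussian mutation
        else:
            child = [rng.uniform(0.0, 1.0) for _ in range(dim)]  # random sample
        fitness, descriptor = evaluate(child)
        archive.try_insert(child, fitness, descriptor)
    return archive


archive = map_elites(toy_evaluate, dim=4)
print(len(archive.cells), "of", 10 * 10, "cells filled")
```

The archive's elites are exactly the kind of diverse policy collection the abstract describes reusing as training data for a learned representation.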
Submitted 27 April, 2021;
originally announced April 2021.
-
Policy Manifold Search for Improving Diversity-based Neuroevolution
Authors:
Nemanja Rakicevic,
Antoine Cully,
Petar Kormushev
Abstract:
Diversity-based approaches have recently gained popularity as an alternative paradigm to performance-based policy search. A popular approach from this family, Quality-Diversity (QD), maintains a collection of high-performing policies separated in the diversity-metric space, defined based on policies' rollout behaviours. When policies are parameterised as neural networks, i.e. Neuroevolution, QD tends to not scale well with parameter space dimensionality. Our hypothesis is that there exists a low-dimensional manifold embedded in the policy parameter space, containing a high density of diverse and feasible policies. We propose a novel approach to diversity-based policy search via Neuroevolution, that leverages learned latent representations of the policy parameters which capture the local structure of the data. Our approach iteratively collects policies according to the QD framework, in order to (i) build a collection of diverse policies, (ii) use it to learn a latent representation of the policy parameters, (iii) perform policy search in the learned latent space. We use the Jacobian of the inverse transformation (i.e. reconstruction function) to guide the search in the latent space. This ensures that the generated samples remain in the high-density regions of the original space, after reconstruction. We evaluate our contributions on three continuous control tasks in simulated environments, and compare to diversity-based baselines. The findings suggest that our approach yields a more efficient and robust policy search process.
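How exactly the Jacobian guides the latent search is not spelled out in the abstract. One plausible reading, sketched here purely as an assumption, is to estimate the Jacobian of the reconstruction function by finite differences and rescale each random latent step so that the reconstructed parameter change has a fixed norm. The decoder and its weights `W` are hypothetical; any function from a latent list to a parameter list would do.

```python
import math
import random


def numerical_jacobian(decoder, z, eps=1e-6):
    """Forward-difference Jacobian d decoder(z) / dz, shape (len(output), len(z))."""
    base = decoder(z)
    J = [[0.0] * len(z) for _ in base]
    for k in range(len(z)):
        z_plus = list(z)
        z_plus[k] += eps
        out = decoder(z_plus)
        for i in range(len(base)):
            J[i][k] = (out[i] - base[i]) / eps
    return J


def jacobian_scaled_step(decoder, z, step=0.1, rng=random):
    """Random latent step rescaled so that J @ dz has norm `step` in parameter space.

    This normalization is an assumed use of the Jacobian, not the paper's rule.
    """
    J = numerical_jacobian(decoder, z)
    dz = [rng.gauss(0.0, 1.0) for _ in z]
    dtheta = [sum(J[i][k] * dz[k] for k in range(len(z))) for i in range(len(J))]
    norm = math.sqrt(sum(d * d for d in dtheta)) or 1.0  # guard against a zero Jacobian
    return [zi + step * dzi / norm for zi, dzi in zip(z, dz)]


# Hypothetical linear decoder from a 2-D latent space to 3 policy parameters.
W = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]


def linear_decoder(z):
    return [sum(W[i][k] * z[k] for k in range(2)) for i in range(3)]


z_new = jacobian_scaled_step(linear_decoder, [0.3, -0.2], rng=random.Random(0))
print(z_new)
```

For a linear decoder the first-order estimate is exact, so the reconstructed parameters move by exactly `step`; for a learned non-linear decoder it only holds locally.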
Submitted 15 December, 2020;
originally announced December 2020.
-
Learning to Represent Action Values as a Hypergraph on the Action Vertices
Authors:
Arash Tavakoli,
Mehdi Fatemi,
Petar Kormushev
Abstract:
Action-value estimation is a critical component of many reinforcement learning (RL) methods whereby sample complexity relies heavily on how fast a good estimator for action value can be learned. By viewing this problem through the lens of representation learning, good representations of both state and action can facilitate action-value estimation. While advances in deep learning have seamlessly driven progress in learning state representations, given the specificity of the notion of agency to RL, little attention has been paid to learning action representations. We conjecture that leveraging the combinatorial structure of multi-dimensional action spaces is a key ingredient for learning good representations of action. To test this, we set forth the action hypergraph networks framework: a class of functions for learning action representations in multi-dimensional discrete action spaces with a structural inductive bias. Using this framework we realise an agent class based on a combination with deep Q-networks, which we dub hypergraph Q-networks. We show the effectiveness of our approach on a myriad of domains: illustrative prediction problems under minimal confounding effects, Atari 2600 games, and discretised physical control benchmarks.
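A minimal sketch of the hypergraph idea on a tiny discrete action space: each hyperedge is a subset of action dimensions, each has its own value table, and the action value is their sum. Summation is assumed here as the mixing function, and the hand-written tables stand in for what state-conditioned network heads would output. Greedy selection enumerates the joint action space, which is only practical for small examples like this one.

```python
import itertools


def hyperedges(num_dims, max_order):
    """All subsets of action dimensions up to size max_order."""
    return [e for k in range(1, max_order + 1)
            for e in itertools.combinations(range(num_dims), k)]


def q_value(tables, action):
    """Sum of the values each hyperedge assigns to its slice of the action."""
    return sum(table[tuple(action[d] for d in edge)] for edge, table in tables.items())


def greedy_action(tables, sizes):
    """Exhaustive argmax over the joint action space (small examples only)."""
    return max(itertools.product(*(range(n) for n in sizes)),
               key=lambda a: q_value(tables, a))


# Hypothetical values for one state: 2 action dimensions with 2 and 3 sub-actions.
sizes = (2, 3)
tables = {
    (0,): {(0,): 0.1, (1,): 0.0},
    (1,): {(0,): 0.0, (1,): 0.2, (2,): 0.1},
    (0, 1): {(a, b): 0.0 for a in range(2) for b in range(3)},
}
tables[(0, 1)][(1, 2)] = 0.5  # a pairwise interaction the unary terms cannot express
print(greedy_action(tables, sizes))
```

The pairwise hyperedge lets the joint action (1, 2) win even though neither of its sub-actions is best on its own, which is the combinatorial structure the framework is meant to capture.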
Submitted 20 June, 2021; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Asynchronous Real-Time Optimization of Footstep Placement and Timing in Bipedal Walking Robots
Authors:
Digby Chappell,
Ke Wang,
Petar Kormushev
Abstract:
Online footstep planning is essential for bipedal walking robots to be able to walk in the presence of disturbances. Until recently this has been achieved by only optimizing the placement of the footstep, keeping the duration of the step constant. In this paper we introduce a footstep planner capable of optimizing footstep placement and timing in real-time by asynchronously combining two optimizers, which we refer to as asynchronous real-time optimization (ARTO). The first optimizer, which runs at approximately 25 Hz, utilizes a fourth-order Runge-Kutta (RK4) method to accurately approximate the dynamics of the linear inverted pendulum (LIP) model for bipedal walking, then uses non-linear optimization to find optimal footsteps and duration at a lower frequency. The second optimizer, which runs at approximately 250 Hz, uses analytical gradients derived from the full dynamics of the LIP model and constraint penalty terms to perform gradient descent, which finds approximately optimal footstep placement and timing at a higher frequency. By combining the two optimizers asynchronously, ARTO gains fast reactions to disturbances from the gradient descent optimizer and accurate solutions that avoid local optima from the RK4 optimizer, and it increases the probability that a feasible solution will be found by one of the two optimizers. Experimentally, we show that ARTO is able to recover from considerably larger pushes and produces feasible solutions to larger reference velocity changes than a standard footstep location optimizer, and outperforms the RK4 optimizer alone.
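For intuition on the RK4 optimizer's dynamics approximation, this sketch integrates the LIP equation x'' = w^2 (x - p) with a classic fourth-order Runge-Kutta step and compares the result with the closed-form solution. The CoM height, integration step, foot position, and initial state are hypothetical values chosen for illustration.

```python
import math


def lip_rk4(x, v, p, w, h, steps):
    """Integrate x'' = w^2 (x - p) with the classic fourth-order Runge-Kutta method."""
    def f(x_, v_):
        return v_, w * w * (x_ - p)  # (dx/dt, dv/dt)

    for _ in range(steps):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = f(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = f(x + h * k3x, v + h * k3v)
        x += h / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x, v


def lip_exact(x, v, p, w, t):
    """Closed-form LIP solution for comparison."""
    dx = x - p
    return (p + dx * math.cosh(w * t) + v / w * math.sinh(w * t),
            dx * w * math.sinh(w * t) + v * math.cosh(w * t))


w = math.sqrt(9.81 / 0.6)  # hypothetical CoM height of 0.6 m
x_rk, v_rk = lip_rk4(0.0, 0.4, p=0.1, w=w, h=0.01, steps=40)  # a 0.4 s step
x_ex, v_ex = lip_exact(0.0, 0.4, p=0.1, w=w, t=0.4)
print(abs(x_rk - x_ex), abs(v_rk - v_ex))
```

For the plain LIP a closed form exists, so RK4 is not strictly required here; a numerical integrator becomes useful once the model or constraints are evaluated in ways the closed form does not cover.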
Submitted 2 July, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Sim-to-Real Learning for Casualty Detection from Ground Projected Point Cloud Data
Authors:
Roni Permana Saputra,
Nemanja Rakicevic,
Petar Kormushev
Abstract:
This paper addresses the problem of human body detection, particularly of a human body lying on the ground (a.k.a. a casualty), using point cloud data. The ability to detect a casualty is one of the most important features a mobile rescue robot needs in order to operate autonomously. We propose a deep-learning-based casualty detection method using a deep convolutional neural network (CNN). This network is trained to detect a casualty from point-cloud data input. In the method we propose, the point cloud input is pre-processed to generate a depth-image-like, ground-projected heightmap. This heightmap is generated based on the projected distance of each point onto the detected ground plane within the point cloud data. The generated heightmap, in image form, is then used as an input for the CNN to detect a human body lying on the ground. To train the neural network, we propose a novel sim-to-real approach, in which the network model is trained using synthetic data obtained in simulation and then tested on real sensor data. To make the model transferable to real data, we adopt specific data augmentation strategies for the synthetic training data. The experimental results show that data augmentation introduced during the training process is essential for improving the performance of the trained model on real data. More specifically, the results demonstrate that the data augmentations on raw point-cloud data have contributed to a considerable improvement of the trained model performance.
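The pre-processing step can be sketched as follows, assuming the ground plane is already known as a unit normal `normal` and offset `d` (so that n·p + d is a point's height above the ground) and using a hypothetical grid resolution. Each point is assigned to a cell of an in-plane grid and the cell keeps the maximum height; the paper's exact projection, cropping, and normalization may differ.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ground_heightmap(points, normal, d, resolution=0.05):
    """Max height above the plane n.p + d = 0 per in-plane grid cell (normal is unit length)."""
    # Build an orthonormal in-plane basis (u, v) from a helper axis not parallel to the normal.
    helper = (1.0, 0.0, 0.0) if abs(normal[0]) < 0.9 else (0.0, 1.0, 0.0)
    k = _dot(helper, normal)
    u = [h - k * n for h, n in zip(helper, normal)]
    u_len = math.sqrt(_dot(u, u))
    u = [c / u_len for c in u]
    v = _cross(normal, u)
    cells = {}
    for p in points:
        height = _dot(normal, p) + d  # signed distance to the ground plane
        i = math.floor(_dot(u, p) / resolution)
        j = math.floor(_dot(v, p) / resolution)
        if height > cells.get((i, j), -math.inf):
            cells[(i, j)] = height
    return cells


def to_image(cells, origin, shape, fill=0.0):
    """Rasterize the cell dictionary into a rows x cols list-of-lists image."""
    rows, cols = shape
    img = [[fill] * cols for _ in range(rows)]
    for (i, j), h in cells.items():
        r, c = j - origin[1], i - origin[0]
        if 0 <= r < rows and 0 <= c < cols:
            img[r][c] = h
    return img


pts = [(0.05, 0.05, 0.1), (0.06, 0.02, 0.3), (0.25, 0.05, 0.2)]
cells = ground_heightmap(pts, (0.0, 0.0, 1.0), 0.0, resolution=0.1)
print(to_image(cells, origin=(0, 0), shape=(1, 3)))
```

Measuring height relative to the detected ground plane, rather than in the sensor frame, is what makes the resulting image largely independent of the camera's mounting angle.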
Submitted 9 August, 2019; v1 submitted 8 August, 2019;
originally announced August 2019.
-
ResQbot: A Mobile Rescue Robot with Immersive Teleperception for Casualty Extraction
Authors:
Roni Permana Saputra,
Petar Kormushev
Abstract:
In this work, we propose a novel mobile rescue robot equipped with immersive stereoscopic teleperception and teleoperation control. This robot is designed to safely perform a casualty-extraction procedure. We have built a proof-of-concept mobile rescue robot called ResQbot for the experimental platform. An approach called "loco-manipulation" is used to perform the casualty-extraction procedure using the platform. The performance of this robot is evaluated in terms of task accomplishment and safety by conducting a mock rescue experiment. We use a custom-made human-sized dummy that has been sensorised to be used as the casualty. In terms of safety, we observe several parameters during the experiment including impact force, acceleration, speed and displacement of the dummy's head. We also compare the performance of the proposed immersive stereoscopic teleperception to conventional monocular teleperception. The results of the experiments show that the observed safety parameters are below key safety thresholds which could possibly lead to head or neck injuries. Moreover, the teleperception comparison results demonstrate an improvement in task-accomplishment performance when the operator is using the immersive teleperception.
Submitted 21 December, 2018;
originally announced December 2018.
-
Casualty Detection from 3D Point Cloud Data for Autonomous Ground Mobile Rescue Robots
Authors:
Roni Permana Saputra,
Petar Kormushev
Abstract:
One of the most important features of mobile rescue robots is the ability to autonomously detect casualties, i.e. human bodies, which are usually lying on the ground. This paper proposes a novel method for autonomously detecting casualties lying on the ground using 3D point-cloud data obtained from an on-board sensor, such as an RGB-D camera or a 3D LIDAR, on a mobile rescue robot. In this method, the obtained 3D point-cloud data is projected onto the detected ground plane, i.e. floor, within the point cloud. Then, this projected point cloud is converted into a grid-map that is used afterwards as an input for the algorithm to detect human body shapes. The proposed method is evaluated by performing detection of a human dummy, placed in different random positions and orientations, using an on-board RGB-D camera on a mobile rescue robot called ResQbot. To evaluate the robustness of the casualty detection method to different camera angles, the orientation of the camera is set to different angles. The experimental results show that using the point-cloud data from the on-board RGB-D camera, the proposed method successfully detects the casualty in all tested body positions and orientations relative to the on-board camera, as well as in all tested camera angles.
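Detecting the ground plane is the first step this method depends on. RANSAC is one common way to do it; the sketch below is a generic RANSAC plane fit, not necessarily the detector used in the paper, and it assumes the sensor frame's z axis points roughly up so that the fitted normal can be oriented consistently. The synthetic scene (a flat floor plus a raised block of points) and the inlier tolerance are hypothetical.

```python
import math
import random


def fit_plane(p1, p2, p3):
    """Plane (unit normal n, offset d) with n.p + d = 0 through three points, or None."""
    a = [p2[k] - p1[k] for k in range(3)]
    b = [p3[k] - p1[k] for k in range(3)]
    n = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None  # (nearly) collinear sample
    n = tuple(c / norm for c in n)
    if n[2] < 0:
        n = tuple(-c for c in n)  # orient "up", assuming z is roughly vertical
    d = -sum(n[k] * p1[k] for k in range(3))
    return n, d


def ransac_ground_plane(points, iters=200, tol=0.02, seed=0):
    """Return the plane with the most points within `tol`, plus those inliers."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        plane = fit_plane(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [p for p in points if abs(sum(n[k] * p[k] for k in range(3)) + d) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = plane, inliers
    return best, best_inliers


rng = random.Random(1)
ground = [(rng.uniform(0, 2), rng.uniform(0, 2), 0.0) for _ in range(200)]
body = [(rng.uniform(0.5, 1.5), rng.uniform(0.5, 0.9), rng.uniform(0.2, 0.5)) for _ in range(30)]
(normal, offset), inliers = ransac_ground_plane(ground + body)
print(normal, offset, len(inliers))
```

The fitted normal and offset are exactly what a ground-projection step needs to turn each remaining point into a height above the floor.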
Submitted 21 December, 2018;
originally announced December 2018.
-
Exploring Restart Distributions
Authors:
Arash Tavakoli,
Vitaly Levdik,
Riashat Islam,
Christopher M. Smith,
Petar Kormushev
Abstract:
We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state with those corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.
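A minimal sketch of a restart distribution backed by an experience memory: with some probability the episode starts from a stored past state instead of the default initial state, sampled either uniformly or in proportion to a priority. It assumes the environment can be reset to an arbitrary stored state (the capacity to reset mentioned above); `default_start`, the priorities, and the replacement rule are hypothetical choices that loosely mirror uniform and prioritized experience replay, not the paper's three variants.

```python
import random


class RestartMemory:
    """Bounded store of past states that can serve as alternative start states."""

    def __init__(self, capacity=10000, seed=0):
        self.capacity = capacity
        self.states, self.priorities = [], []
        self.rng = random.Random(seed)

    def add(self, state, priority=1.0):
        if len(self.states) < self.capacity:
            self.states.append(state)
            self.priorities.append(priority)
        else:  # once full, overwrite a random slot (one simple hypothetical policy)
            i = self.rng.randrange(self.capacity)
            self.states[i], self.priorities[i] = state, priority

    def sample_start(self, default_start, restart_prob=0.5, prioritized=False):
        """Return a stored state with probability restart_prob, else a default start."""
        if not self.states or self.rng.random() >= restart_prob:
            return default_start()
        if prioritized:  # e.g. priorities reflecting how significant a past experience was
            return self.rng.choices(self.states, weights=self.priorities, k=1)[0]
        return self.rng.choice(self.states)


memory = RestartMemory(capacity=100)
for s in range(10):
    memory.add(s, priority=float(s))
print(memory.sample_start(lambda: 0, restart_prob=0.5, prioritized=True))
```

Keeping some probability of the default start is one way to limit the learning bias the abstract warns about, since the agent still sees the original initial-state distribution.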
Submitted 17 August, 2020; v1 submitted 27 November, 2018;
originally announced November 2018.
-
Human-centered manipulation and navigation with Robot DE NIRO
Authors:
Fabian Falck,
Sagar Doshi,
Nico Smuts,
John Lingi,
Kim Rants,
Petar Kormushev
Abstract:
Social assistance robots in health and elderly care have the potential to support and ease human lives. Given the macrosocial trends of aging and long-lived populations, robotics-based care research has mainly focused on helping the elderly live independently. In this paper, we introduce Robot DE NIRO, a research platform that aims to support the supporter (the caregiver) and also offers direct human-robot interaction for the care recipient. Augmented by several sensors, DE NIRO is capable of complex manipulation tasks. It reliably interacts with humans and can autonomously and swiftly navigate through dynamically changing environments. We describe preliminary experiments in a demonstrative scenario and discuss DE NIRO's design and capabilities. We put particular emphasis on safe, human-centered interaction procedures implemented in both hardware and software, including collision avoidance in manipulation and navigation as well as an intuitive perception stack through speech and face recognition.
Submitted 23 October, 2018;
originally announced October 2018.
-
Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks
Authors:
Fabio Pardo,
Vitaly Levdik,
Petar Kormushev
Abstract:
Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the large number of expensive parallel updates has so far limited the approach to small tabular cases. To tackle this problem, we propose to use convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example of application, we show that replacing the random actions in epsilon-greedy exploration by several actions towards feasible goals generates better exploratory trajectories on Montezuma's Revenge and Super Mario All-Stars games.
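The all-goals update can be illustrated in its original tabular form, before any convolutional network is involved. The sketch below assumes a hypothetical reward of 1 on reaching the goal (which ends the episode for that goal) and 0 otherwise, and applies every transition to all goals at once on a toy chain of states. The paper's contribution is to produce these per-goal values with a convolutional network instead of a table.

```python
def all_goals_update(Q, s, a, s_next, goals, gamma=0.9, alpha=1.0):
    """Update Q[s][a][g] toward every goal g from one transition (s, a, s_next)."""
    for g in goals:
        if s_next == g:
            target = 1.0  # reaching g ends the episode for that goal
        else:
            target = gamma * max(Q[s_next][b][g] for b in range(len(Q[s_next])))
        Q[s][a][g] += alpha * (target - Q[s][a][g])


def chain_step(s, a, n):
    """Toy deterministic chain: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(n - 1, s + 1)


def train_chain(n=5, sweeps=20, gamma=0.9):
    # Q[state][action][goal], learned off-policy from a sweep over all transitions.
    Q = [[[0.0] * n for _ in range(2)] for _ in range(n)]
    for _ in range(sweeps):
        for s in range(n):
            for a in range(2):
                all_goals_update(Q, s, a, chain_step(s, a, n), range(n), gamma)
    return Q


Q = train_chain()
print(Q[0][1][4])  # discounted "distance" from state 0 to goal 4 when moving right
```

Each stored transition updates every goal's value, so one pass over the data yields a full table of goal-conditioned values, which is exactly the part that becomes expensive when the number of goals is large.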
Submitted 4 February, 2020; v1 submitted 5 October, 2018;
originally announced October 2018.
-
Goal-oriented Trajectories for Efficient Exploration
Authors:
Fabio Pardo,
Vitaly Levdik,
Petar Kormushev
Abstract:
Exploration is a difficult challenge in reinforcement learning, and even recent state-of-the-art curiosity-based methods rely on the simple epsilon-greedy strategy to generate novelty. We argue that pure random walks fail to properly expand the exploration area in most environments and propose to replace single random action choices by random goal selection followed by several steps in the direction of the selected goal. This approach is compatible with any curiosity-based exploration and off-policy reinforcement learning agents and generates longer and safer trajectories than individual random actions. To illustrate this, we present a task-independent agent that learns to reach coordinates in screen frames and demonstrate its ability to explore in the game Super Mario Bros., significantly improving the score of a baseline DQN agent.
Submitted 5 July, 2018;
originally announced July 2018.
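The core idea above, committing to a random goal for several steps instead of drawing one random action, can be sketched on a grid. This is an exploration-only illustration: the paper uses a learned goal-reaching agent, whereas `step_towards` here is a hypothetical hand-written controller, and `steps_per_goal` and the grid world are assumptions.

```python
import random


def step_towards(pos, goal):
    """Hypothetical goal-reaching controller: move one cell along the axis with the largest gap."""
    (x, y), (gx, gy) = pos, goal
    dx, dy = gx - x, gy - y
    if dx == 0 and dy == 0:
        return (0, 0)
    if abs(dx) >= abs(dy):
        return (1 if dx > 0 else -1, 0)
    return (0, 1 if dy > 0 else -1)


def explore(n_steps, size, steps_per_goal=5, seed=0):
    """Goal-oriented exploration on a size x size grid; returns the set of visited cells."""
    rng = random.Random(seed)
    pos = (size // 2, size // 2)
    visited = {pos}
    goal, remaining = None, 0
    for _ in range(n_steps):
        if remaining == 0:
            # Exploratory decision: commit to a random goal instead of a single random action.
            goal = (rng.randrange(size), rng.randrange(size))
            remaining = steps_per_goal
        dx, dy = step_towards(pos, goal)
        # Clamp the move to the grid borders.
        pos = (min(max(pos[0] + dx, 0), size - 1), min(max(pos[1] + dy, 0), size - 1))
        visited.add(pos)
        remaining -= 1
        if pos == goal:
            remaining = 0  # goal reached early: pick a new one on the next step
    return visited
```

In an epsilon-greedy agent, the goal-directed segment would replace only the exploratory branch; greedy actions would be taken as usual.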
-
Time Limits in Reinforcement Learning
Authors:
Fabio Pardo,
Arash Tavakoli,
Vitaly Levdik,
Petar Kormushev
Abstract:
In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account of how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent's input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.
Submitted 27 January, 2022; v1 submitted 1 December, 2017;
originally announced December 2017.
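The two cases above can be sketched as two small helpers: a one-step TD target that stops bootstrapping only at a genuine termination (so a time-limit cut-off in case (ii) still bootstraps), and an observation wrapper that appends the normalized remaining time for case (i). The function names and the normalization are assumptions; the separate flag is similar in spirit to the `terminated`/`truncated` split returned by Gymnasium's `Env.step`.

```python
def td_target(reward, next_value, gamma, terminated):
    """One-step TD target.

    Only a genuine environment termination stops bootstrapping. A time-limit
    truncation (case (ii)) is not a real terminal state, so the caller passes
    terminated=False and the target still bootstraps from the last state's value.
    """
    return reward if terminated else reward + gamma * next_value


def time_aware_obs(obs, t, time_limit):
    """Case (i): the time limit is part of the task, so append the normalized
    remaining time to the observation to keep the agent's input Markovian."""
    return list(obs) + [(time_limit - t) / time_limit]
```

For a transition cut off by the time limit, `td_target(1.0, 10.0, 0.9, terminated=False)` returns `1.0 + 0.9 * 10.0`, whereas treating the timeout as terminal would wrongly return `1.0`.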
-
Action Branching Architectures for Deep Reinforcement Learning
Authors:
Arash Tavakoli,
Fabio Pardo,
Petar Kormushev
Abstract:
Discrete-action algorithms have been central to numerous recent successes of deep reinforcement learning. However, applying these algorithms to high-dimensional action tasks requires tackling the combinatorial increase of the number of possible actions with the number of action dimensions. This problem is further exacerbated for continuous-action tasks that require fine control of actions via discretization. In this paper, we propose a novel neural architecture featuring a shared decision module followed by several network branches, one for each action dimension. This approach achieves a linear increase of the number of network outputs with the number of degrees of freedom by allowing a level of independence for each individual action dimension. To illustrate the approach, we present a novel agent, called Branching Dueling Q-Network (BDQ), as a branching variant of the Dueling Double Deep Q-Network (Dueling DDQN). We evaluate the performance of our agent on a set of challenging continuous control tasks. The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches. Furthermore, we show that the proposed agent performs competitively against a state-of-the-art continuous control algorithm, Deep Deterministic Policy Gradient (DDPG).
Submitted 24 January, 2019; v1 submitted 24 November, 2017;
originally announced November 2017.
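The scaling argument above can be made concrete with a little arithmetic: a flat discrete-action network needs one output per joint action (bins raised to the number of dimensions), while a branching network needs one head per dimension (bins times dimensions). The sketch below also shows per-branch greedy selection and an assumed mean-subtracted dueling aggregation per branch; the helper names and the 6-dimension, 33-bin example are illustrative, not taken from the paper.

```python
def output_counts(num_dims, bins_per_dim):
    """Network outputs for a flat discrete-action Q-network vs. a branching one."""
    flat = bins_per_dim ** num_dims       # one output per joint action: combinatorial
    branching = bins_per_dim * num_dims   # one head of size bins_per_dim per dimension: linear
    return flat, branching


def branch_q_values(state_value, branch_advantages):
    """Assumed dueling aggregation per branch: Q_d = V + (A_d - mean(A_d))."""
    return [[state_value + a - sum(adv) / len(adv) for a in adv]
            for adv in branch_advantages]


def branching_greedy_action(branch_q):
    """Select each action dimension independently by argmax over its own branch."""
    return [max(range(len(q)), key=q.__getitem__) for q in branch_q]
```

For example, `output_counts(6, 33)` gives `(1291467969, 198)`: roughly 1.3 billion joint actions against 198 branch outputs.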
-
Visuospatial Skill Learning for Robots
Authors:
S. Reza Ahmadzadeh,
Fulvio Mastrogiovanni,
Petar Kormushev
Abstract:
A novel skill learning approach is proposed that allows a robot to acquire human-like visuospatial skills for object manipulation tasks. Visuospatial skills are attained by observing spatial relationships among objects through demonstrations. The proposed Visuospatial Skill Learning (VSL) is a goal-based approach that focuses on achieving a desired goal configuration of objects relative to one another while maintaining the sequence of operations. VSL is capable of learning and generalizing multi-operation skills from a single demonstration, while requiring minimum prior knowledge about the objects and the environment. In contrast to many existing approaches, VSL offers simplicity, efficiency and user-friendly human-robot interaction. We also show that VSL can be easily extended towards 3D object manipulation tasks, simply by employing point cloud processing techniques. In addition, a robot learning framework, VSL-SP, is proposed by integrating VSL, Imitation Learning, and a conventional planning method. In VSL-SP, the sequence of performed actions is learned using VSL, while the sensorimotor skills are learned using a conventional trajectory-based learning approach. Such integration easily extends robot capabilities to novel situations, even by users without programming ability. In VSL-SP, the internal planner of VSL is integrated with an existing action-level symbolic planner. Using the underlying constraints of the task and the extracted symbolic predicates identified by VSL, the symbolic representation of the task is updated. Therefore, the planner maintains a generalized representation of each skill as a reusable action, which can be used in planning and performed independently during the learning phase. The proposed approach is validated through several real-world experiments.
Submitted 3 June, 2017;
originally announced June 2017.
-
Intent expression using eye robot for mascot robot system
Authors:
Yoichi Yamazaki,
Fangyan Dong,
Yuta Masuda,
Yukiko Uehara,
Petar Kormushev,
Hai An Vu,
Phuc Quang Le,
Kaoru Hirota
Abstract:
An intent expression system using eye robots is proposed for a mascot robot system from a viewpoint of humatronics. The eye robot aims at providing a basic interface method for an information terminal robot system. To achieve better understanding of the displayed information, the importance and the degree of certainty of the information should be communicated along with the main content. The proposed intent expression system aims at conveying this additional information using the eye robot system. Eye motions are represented as states in a pleasure-arousal space model. Changes in the model state are calculated by fuzzy inference according to the importance and degree of certainty of the displayed information. These changes influence the arousal-sleep coordinates in the space, which correspond to levels of liveliness during communication. The eye robot provides a basic interface for the mascot robot system that is easy to understand as an information terminal for home environments in a humatronics society.
Submitted 9 April, 2009;
originally announced April 2009.
-
Fuzzy inference based mentality estimation for eye robot agent
Authors:
Yoichi Yamazaki,
Fangyan Dong,
Yuta Masuda,
Yukiko Uehara,
Petar Kormushev,
Hai An Vu,
Phuc Quang Le,
Kaoru Hirota
Abstract:
Household robots need to communicate with human beings in a friendly fashion. To achieve better understanding of displayed information, the importance and certainty of the information should be communicated together with the main information. The proposed intent expression system aims to convey this additional information using an eye robot. The eye motions are represented as states in a pleasure-arousal space model. Change of the model state is calculated by fuzzy inference according to the importance and certainty of the displayed information. This change influences the arousal-sleep coordinate in the space, which corresponds to activeness in communication. The eye robot provides a basic interface for the mascot robot system, which is an easy-to-understand information terminal for home environments in a humatronics society.
Submitted 9 April, 2009;
originally announced April 2009.
-
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
Authors:
Petar Kormushev,
Kohei Nomoto,
Fangyan Dong,
Kaoru Hirota
Abstract:
A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation gives Time Hopping abilities similar to those that eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state transition graph. Experiments on a simulated biped crawling robot confirm that Eligibility Propagation accelerates the learning process by more than a factor of 3.
Submitted 3 April, 2009;
originally announced April 2009.
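The propagation step described above, pushing a state's value back to all of its temporal predecessors through a recorded state transition graph, might look like the following sketch. It is written in the spirit of the description only: the `predecessors` layout, the Q-learning-style backup, the step size, and the stopping tolerance are all assumptions rather than the paper's algorithm.

```python
from collections import deque


def propagate(Q, predecessors, start, actions, gamma=0.9, alpha=1.0, tol=1e-9):
    """Propagate a value change at `start` backwards to all temporal predecessors.

    predecessors[s] lists (prev_state, action, reward) edges observed in the
    (hypothetical) state transition graph. Each predecessor's Q-value is moved
    toward reward + gamma * max_a Q(s, a); predecessors whose value changed by
    more than `tol` are queued, which also stops the loop on cycles.
    """
    def value(s):
        return max(Q.get((s, a), 0.0) for a in actions)

    queue = deque([start])
    while queue:
        s = queue.popleft()
        v = value(s)
        for prev, a, r in predecessors.get(s, ()):
            old = Q.get((prev, a), 0.0)
            new = old + alpha * (r + gamma * v - old)
            if abs(new - old) > tol:
                Q[(prev, a)] = new
                queue.append(prev)
    return Q
```

On a chain 0 → 1 → 2 → 3 whose last transition gives reward 1, one call starting at state 3 fills in the whole chain, giving `Q[(0, 'a')] = 0.9 ** 2`. One-step Q-learning would need repeated passes over the chain to reach the same value.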
-
Time Hopping technique for faster reinforcement learning in simulations
Authors:
Petar Kormushev,
Kohei Nomoto,
Fangyan Dong,
Kaoru Hirota
Abstract:
This preprint has been withdrawn by the author for revision
Submitted 6 September, 2011; v1 submitted 3 April, 2009;
originally announced April 2009.
-
Visual approach for data mining on medical information databases using Fastmap algorithm
Authors:
Petar Kormushev
Abstract:
The rapid development of tools for acquisition and storage of information has led to the formation of enormous medical databases. The large quantity of data far surpasses the ability of humans to use it efficiently without specialized tools for analysis. The situation is described as rich in data, but poor in information. In order to fill this growing gap, different approaches from the field of Data Mining are applied. These methods perform analysis of large sets of observed data in order to find new dependencies or concise representations of the data that are more meaningful to humans. One of the possible approaches for discovering dependencies is the visual approach, in which data is processed and visualized in a way suitable for analysis by a domain expert. This work proposes such a visual approach. We design and implement a software solution for visualization of multi-dimensional, classified medical data using the FastMap algorithm for gradual reduction of dimensions. The implementation of the graphical user interface is described in detail, since it is the most important factor in making these tools easy to use for non-professionals in data mining.
Submitted 2 April, 2009;
originally announced April 2009.
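For reference, the FastMap algorithm named above (Faloutsos and Lin, 1995) reduces dimensions one coordinate at a time. It picks two far-apart pivot objects a and b, projects every object i onto the line through them with x_i = (d(a,i)² + d(a,b)² − d(b,i)²) / (2·d(a,b)), and then repeats on the residual distances d'(i,j)² = d(i,j)² − (x_i − x_j)². The sketch below is a generic implementation and makes no claim about the tool's actual code. It uses the common farthest-point pivot heuristic, and `dist` is any caller-supplied distance function.

```python
import math


def fastmap(objects, dist, k):
    """Map objects to k-dimensional coordinates with FastMap."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance in the residual space left after the first `col` coordinates.
        base = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            base -= (coords[i][c] - coords[j][c]) ** 2
        return max(base, 0.0)  # clamp small negative values caused by rounding

    for col in range(k):
        # Pivot heuristic: from object 0 jump to the farthest object, then to the farthest from that.
        b = max(range(n), key=lambda j: d2(0, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # all residual distances are zero; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the pivot line (a, b).
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords
```

With `math.dist` as the distance and points that already lie in a k-dimensional Euclidean space, the mapping preserves pairwise distances up to rounding. For points on a line, `fastmap([(0, 0), (1, 0), (3, 0)], math.dist, 1)` recovers the coordinates 0, 1 and 3.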
-
Design, development and implementation of a tool for construction of declarative functional descriptions of semantic web services based on WSMO methodology
Authors:
Petar Kormushev
Abstract:
Semantic web services (SWS) are self-contained, self-describing, semantically marked-up software resources that can be published, discovered, composed and executed across the Web in a semi-automatic way. They are a key component of the future Semantic Web, in which networked computer programs become providers and users of information at the same time. This work focuses on developing a full-life-cycle software toolset for creating and maintaining Semantic Web Services (SWSs) based on the Web Service Modelling Ontology (WSMO) framework. A main part of a WSMO-based SWS is its service capability, a declarative description of Web service functionality. A formal syntax and semantics for such a description is provided by the Web Service Modeling Language (WSML), which is based on different logical formalisms, namely, Description Logics, First-Order Logic and Logic Programming. A WSML description of a Web service capability is represented as a set of complex logical expressions (axioms). We develop a specialized user-friendly tool for constructing and editing WSMO-based SWS capabilities. Since the users of this tool are not specialists in first-order logic, a graphical way of constructing and editing axioms is proposed. The designed process for constructing logical expressions is ontology-driven and abstracts away as much as possible from any concrete syntax of the logical language. We propose several mechanisms to guarantee the semantic consistency of the produced logical expressions. The tool is implemented in Java, using Eclipse as the IDE and GEF (Graphical Editing Framework) for visualization.
Submitted 2 April, 2009;
originally announced April 2009.
-
INFRAWEBS axiom editor - a graphical ontology-driven tool for creating complex logical expressions
Authors:
Gennady Agre,
Petar Kormushev,
Ivan Dilov
Abstract:
The current INFRAWEBS European research project aims at developing an ICT framework that enables software and service providers to generate and establish open and extensible development platforms for Web Service applications. One of the concrete project objectives is developing a full-life-cycle software toolset for creating and maintaining Semantic Web Services (SWSs) supporting specific applications based on the Web Service Modelling Ontology (WSMO) framework. According to WSMO, functional and behavioural descriptions of a SWS may be represented by means of complex logical expressions (axioms). The paper describes a specialized user-friendly tool for constructing and editing such axioms, the INFRAWEBS Axiom Editor. After the main design principles of the Editor are discussed, its functional architecture is briefly presented. The tool is implemented in the Eclipse Graphical Editing Framework and the Eclipse Rich Client Platform.
Submitted 7 January, 2012; v1 submitted 1 April, 2009;
originally announced April 2009.
-
Time manipulation technique for speeding up reinforcement learning in simulations
Authors:
Petar Kormushev,
Kohei Nomoto,
Fangyan Dong,
Kaoru Hirota
Abstract:
A technique for speeding up reinforcement learning algorithms by using time manipulation is proposed. It is applicable to failure-avoidance control problems running in a computer simulation. Turning the time of the simulation backwards on failure events is shown to speed up the learning by 260% and improve the state space exploration by 12% on the cart-pole balancing task, compared to the conventional Q-learning and Actor-Critic algorithms.
Submitted 27 March, 2009;
originally announced March 2009.
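Turning simulation time backwards on failure, as described above, requires the simulator to keep snapshots it can restore instead of resetting the whole episode. The wrapper below is a minimal sketch under stated assumptions. The class name, the bounded snapshot history, and the `rewind_steps` parameter (how far back to jump after a failure) are hypothetical, and the abstract does not specify how the paper chooses the rewind point.

```python
import copy
from collections import deque


class RewindableSim:
    """Hypothetical wrapper that lets a simulation be turned back in time on failure."""

    def __init__(self, initial_state, max_history=100):
        self.state = initial_state
        self.history = deque(maxlen=max_history)  # oldest snapshots are dropped first

    def step(self, transition, action):
        """Snapshot the current state, then advance it with transition(state, action)."""
        self.history.append(copy.deepcopy(self.state))
        self.state = transition(self.state, action)
        return self.state

    def rewind(self, rewind_steps):
        """Restore the state from `rewind_steps` steps ago (or the oldest kept snapshot)."""
        for _ in range(min(rewind_steps, len(self.history))):
            self.state = self.history.pop()
        return self.state
```

On a failure event, a learner would call `rewind(k)` and continue exploring from the restored state instead of starting over from the initial one, so states near the failure are revisited more often.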