Soft Contact Simulation and Manipulation Learning of Deformable Objects with Vision-based Tactile Sensor
Abstract
Deformable object manipulation is a challenging problem because of the complex deformation properties of the objects. With the development of artificial intelligence, learning-based methods have shown outstanding performance in robotic manipulation. Previous works have investigated the manipulation of deformable objects via Reinforcement Learning (RL) in simulation. However, they approximate object deformation with particles and employ particle states as observations, which are not available in reality. To address these issues, we propose a novel approach that uses Vision-Based Tactile Sensors (VBTSs) as the end-effector in simulation to produce observations such as relative position, squeezed area, and object contour, which are transferable to real robots. However, this makes contact simulation more complex because the gel layer of the VBTS is also a deformable object. Existing simulation methods for vision-based tactile sensors can only simulate elastic deformation, and their simulation of plastic and elastoplastic deformation is poor. This must be overcome for a more realistic contact simulation between deformable objects. In this work, we build a new contact simulation environment for deformable objects that covers elastic, plastic, and elastoplastic deformation. We use RL strategies to train agents in the simulation, and expert demonstrations are applied for challenging tasks. To achieve simulation-to-real-world (sim-to-real) transfer, transferable observations such as relative position, squeezed area, and object contour are used in RL training. We also build a real experimental platform, including a VBTS, to complete the sim-to-real work and robustness testing. Leveraging the developed simulation and real experiment setup, we create a benchmark for contact simulation and manipulation learning of deformable objects with VBTSs. Our work proposes a strategy that employs high-resolution VBTSs in contact simulation and manipulation of deformable objects.
We achieve a success rate on difficult tasks such as cylinder and sphere. The experimental results show superior performances of deformable object manipulation with the proposed method.
Index Terms:
Vision-based tactile sensors, Deformable objects, Contact simulation, Manipulation learning.
I Introduction
Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex because of deformation properties that include elastic, plastic, and elastoplastic deformation. The large number of degrees of freedom (DOFs) requires a complex modeling method, and varied reactions to applied forces lead to unpredictable deformation and motion. Meanwhile, deformable objects are common in hospital, industrial, and domestic environments, in tasks such as dressing assistance, cable harnessing, fruit harvesting, and suturing [1]. Deformable object manipulation therefore plays an essential role in robotics development.
Learning-based methods have achieved success in robotic manipulation [2]. Reinforcement learning (RL) is an effective method for sophisticated tasks. It enables an agent to learn how to use inputs, called observations, to improve specified values, called rewards, during interaction with the environment. The agent is expected to take proper actions according to the agent and environment states to fulfill a task. Hence, this method is suitable for complex tasks and unknown environments. Various algorithms have been proposed, such as SARSA, Q-learning, DQN [3], and TD3 [4]. RL has an innate appeal to robotics research due to its ability to learn from interaction, and it has been applied to rigid object manipulation tasks like surface following [5], typing [6], and swing-up manipulation [7]. The complex properties of deformable objects increase the difficulty of simulation, which limits the application of RL. To simulate deformable objects, Huang et al. [8] employed many particles to represent the deformable object, i.e., plasticine, and characterize its deformation. However, the particle positions and velocities of the deformable objects were used as observations, which are impossible to collect in the real world. Consequently, using RL to manipulate deformable objects in simulation and transferring the learned policies to the real world remains a challenge.
To address these issues, we use vision-based tactile sensors (VBTSs) as the end-effector instead of a rigid end-effector. The sensor contains a soft gel layer that interacts with other objects. Thanks to the camera module and gel layer, a VBTS can capture the deformation states of the gel layer at high resolution [9]. At the same time, based on the gel deformation captured by the camera, the sensor can provide observations such as relative position, squeezed area, and object contour. VBTSs such as GelSight [10], TacTip [11], and the sensors in [12, 13] have been used in perception tasks like texture recognition [14], fruit hardness evaluation [15], and fossil texture detection [13]. These sensors can also be applied in manipulation tasks like pushing [16] and cable manipulation [17]. However, the deformation of the gel layer in contact with deformable objects raises the difficulty of the simulation because the gel layer and the deformable objects are both elastoplastic. This requires a reliable simulation method.
Simulation methods for deformable objects, such as the Finite Element Method (FEM) [18], key points [19], and the Moving Least Squares Material Point Method (MLS-MPM) [20], have been proposed. To predict the state of the deformable object, Chen et al. [21, 22] used the MLS-MPM as a deformation prediction approach and designed a simulation environment, but it only considers the elastic deformation of the gel layer. Consequently, it is suitable for the interaction between a VBTS and rigid objects. However, the interaction between a VBTS and deformable objects is more complex due to the unknown properties of deformable objects, which may be elastic, plastic, or elastoplastic.
To address the task of manipulating deformable objects, we establish a system shown in Figure 1. We build a new simulation environment for contact simulation. This simulation environment can simulate elastic, plastic, and elastoplastic deformation. The VBTS is applied as the end-effector. MLS-MPM is applied for deformable objects and VBTS simulation, and the VBTS can provide transferable observations for deformable object manipulation.
Furthermore, an RL benchmark is built for deformable object manipulation in the simulation environment. Classical deformable object manipulation tasks with different difficulty levels named position control, squeeze, cylinder, and sphere are included. The TD3 strategy is used for simple tasks such as position control and squeeze. Considering the complexity of the motion, expert demonstration strategies are used for challenging tasks such as cylinder and sphere.
Finally, we build an experimental platform in reality for sim-to-real. VBTS is used to generate transferable observations and manipulate objects. The models trained in the simulation environment are applied to test corresponding tasks. Meanwhile, we use deformable objects with different hardness and sizes for robustness experiments.
In this paper, we make the following contributions:
1) We develop a simulation environment for the contact of deformable objects including elastic, plastic, and elastoplastic deformation. As far as we know, this is the first contact simulation for vision-based tactile sensors and deformable objects that utilizes physical deformation simulation.
2) We innovatively propose to introduce high-resolution VBTS into contact simulation and manipulation of deformable objects. Transferable observations such as relative position, squeezed area, and object contour are leveraged by introducing VBTSs. An RL benchmark has been built for deformable object manipulation, including transferable observations, tasks, and learning from demonstration strategies.
3) We build the corresponding experimental platform and complete the sim-to-real work. The sim-to-real results demonstrate that training in the contact simulation and transferring the policies to reality to manipulate deformable objects with a VBTS is reliable. This is a new benchmark for deformable object manipulation.
This paper is structured as follows: Section II includes works related to deformable objects simulation and robot reinforcement learning. Section III introduces the soft contact simulation environment. Section IV describes the manipulation learning of deformable objects. Section V describes the experimental results, including the deformation parameter effect, RL performance, the results of sim-to-real, and the robustness experiments. Section VI summarizes the conclusions and future work for this work.
II RELATED WORK
II-A Deformable Objects Simulation
Deformable objects such as plasticine and dough are challenging to manipulate due to their deformation properties. Huang et al. [8] employed the MLS-MPM and the von Mises yield criterion [23] for deformable object simulation, and RL was utilized for manipulation policy design. Li et al. [24] aimed to find the best contact point and applied a manipulation policy based on contact point discovery. Such a method could overcome local minima and perform well on complex multi-stage tasks. Chen et al. [21] used the MLS-MPM to build a simulator of the interaction between optical tactile sensors and rigid objects. Although the existing works provide realistic environments for contact simulation of deformable objects, particle positions and velocities are employed as the observations for control policies, which are not available in reality. To acquire transferable observations from sensors like cameras or tactile sensors, we use VBTSs as the end-effector, with MLS-MPM for the interaction between the gel layer and the deformable objects. The sensors can provide observations like relative position, squeezed area, and object contour displacements, which can be collected in reality.
II-B Robot Reinforcement Learning
RL is widely applied in robot research to enable robots to obtain specific skills during environmental interaction. Matl et al.[25] applied model-based reinforcement learning to manipulate dough with a soft end-effector in reality. Church et al.[6] utilized a VBTS, TacTip, to type on a braille keyboard. The marker motion during a press was the observation, and the robot learned to press a specified key. Church et al.[26] implemented sim-to-real policy transfer for surface following and manipulation tasks with TacTip. The RL policy was trained in simulation, and a generative adversarial network (GAN) was used to generate simulation tactile images based on the corresponding real tactile images, which were observations in this work. In this case, the RL policy in the simulation could be transferred to the real world. Zhao et al.[27] created a new tendon-connected multi-functional optical tactile sensor, MechTac, for object perception in the field of view (TacTip) and location of touching points in the blind area of vision (TacSide). The use of a new binarized convolutional layer greatly improved the prediction efficiency of pictures. Bi et al.[7] trained an RL network for aggressive swing-up manipulation. To utilize complex observations, an RL network with simple observations was trained as an expert first, while the other network applying complex observations learned to imitate that network. Expert demonstrations were applied in [28], and the trade-off between exploring the environment online and using expert guidance was balanced. Si et al.[29] used tactile sensors to achieve a stable grip and achieved good results. Inspired by previous RL works, we create a simulated deformable objects manipulation benchmark. This benchmark includes classical human manipulations of deformable objects. Based on the transferable observations, these tasks are achieved with a proper reward design. Demonstration learning strategies are also employed for sophisticated tasks.
III SOFT CONTACT SIMULATION
In this article, we propose an improved soft contact simulation environment based on our previous work [21]. The simulation environment can simulate the contact between VBTS and deformable objects including elastic, plastic, and elastoplastic deformation. The deficiencies of previous work and the design goals of the new contact simulation environment are introduced in Subsection III-A. Subsection III-B introduces the methods of our new soft contact simulation environment.
III-A Deficiencies and design goals
In our previous work, we only considered the elastic deformation of the gel layer, which can be treated as elastic in most cases. Hence, that simulation environment is suited to contact between the VBTS and rigid objects. However, the deformation properties of deformable objects are more complicated because of elastoplasticity. Deformable objects may exhibit elastic, plastic, or elastoplastic deformation, depending on their material properties. Elastic deformation occurs when an object deforms and springs back to its initial shape, as with silicone gel and rubber. Plastic deformation occurs when an object deforms and does not spring back, as with sand and snow. Elastoplastic deformation occurs when an object deforms and shows some spring back, as with plasticine. To obtain a more realistic simulation, the soft contact simulation environment must be able to simulate all three types of deformation.
In this simulation environment, we aim to simulate the deformation of contact between the VBTSs and deformable objects including elastic, plastic, and elastoplastic. It should be noted that although VBTSs are used in this work, our simulation framework can be adapted to other soft tactile sensors like [30] and [31]. In order to generate transferable observations, gel layer simulation is crucial as it embeds the essential information of the interaction between the sensor and the deformable objects, and our work only includes the gel layer and ignores the other parts of the sensors. Of course, the structure of the sensor can be added to the simulation if it will have an impact on the simulation like finger VBTS [32], but this will increase the amount of computation.
III-B The methods of the soft contact simulation
As shown in Figure 2-(A), the MLS-MPM [20] and Taichi [33], a parallel programming language for high-performance numerical computation, are applied for soft contact simulation.
The MLS-MPM method utilizes particles to represent objects, and the particle motion can simulate gel layer and object deformation. Every particle carries object information, such as mass, velocity, and deformation. In addition, there is a fixed grid in the simulation environment, and the particles exchange object information with the nearby grid nodes. After the information exchange in each timestep, particles move according to their velocities. Thanks to the use of particles and grids, MLS-MPM can take advantage of both particle and grid simulation methods. It includes five steps: initialization, particle-to-grid, grid operation, grid-to-particle, and particle operation. These steps cover the whole process of information exchange, deformation simulation, and object motion. Deformable object simulation is introduced in detail in [34]. The von Mises yield criterion applied to deformable objects is introduced in [23]. The details of the MLS-MPM are given in our previous work [21].
Deformable object simulation is achieved in the grid-to-particle step. This step applies the von Mises yield criterion [23] to calculate the deformation gradient. Suppose that the states of the $n$-th time step are known. For a deformable object, the deformation gradient of particle $p$ in time step $n+1$ is:
$$F_p^{n+1} = \left(I + \Delta t\, C_p^{n+1}\right) F_p^{n} \tag{1}$$
where $C_p^{n+1}$ is the affine velocity of this particle in time step $n+1$, $F_p^{n}$ is the deformation gradient of this particle in time step $n$, $I$ is the identity matrix, and $\Delta t$ is the time step size.
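To solve the elastoplastic problem of deformable objects, we introduce the von Mises yield criterion [23], which decides whether a particle deforms elastically or plastically. Suppose $F$ is the particle deformation gradient calculated by Equation (1). The trial Hencky strain $\epsilon = \log \Sigma$ is derived from the singular value decomposition $F = U \Sigma V^{T}$. The von Mises yield criterion is then:
$$\delta\gamma = \|\hat{\epsilon}\|_F - \frac{\sigma_y}{2\mu} \tag{2}$$
where $\hat{\epsilon} = \epsilon - \frac{\mathrm{tr}(\epsilon)}{d} I$ is the deviatoric part of the Hencky strain in $d$ dimensions; $\sigma_y$ denotes the yield stress parameter defined by the material property; and $\mu$ is the Lamé parameter (shear modulus). This criterion decides the final deformation gradient:
$$F^{\mathrm{new}} = \begin{cases} F, & \delta\gamma \le 0 \\ U \exp\!\left(\epsilon - \delta\gamma\, \dfrac{\hat{\epsilon}}{\|\hat{\epsilon}\|_F}\right) V^{T}, & \delta\gamma > 0 \end{cases} \tag{3}$$
In summary, when the second invariant of the particle's deviatoric stress exceeds a certain value, the particle deforms plastically. The projection is applied to the Hencky strain and, through it, to the deformation gradient. This process is called return mapping.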
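The deformation gradient update and the return mapping can be sketched in a few lines of NumPy. This is a minimal sketch rather than the simulator's implementation: it assumes the commonly used form of the von Mises projection on the Hencky strain (a particle yields when the norm of the deviatoric Hencky strain exceeds the yield stress divided by twice the shear modulus), and the function and parameter names are illustrative.

```python
import numpy as np


def update_deformation_gradient(F, C, dt):
    """MLS-MPM style update: F_new = (I + dt * C) @ F."""
    return (np.eye(F.shape[0]) + dt * C) @ F


def von_mises_return_mapping(F, mu, yield_stress):
    """Project a trial deformation gradient onto the von Mises yield surface.

    Returns (F_projected, yielded). All names here are illustrative.
    """
    U, sigma, Vt = np.linalg.svd(F)          # F = U @ diag(sigma) @ Vt
    sigma = np.maximum(sigma, 1e-6)          # guard against log(0)
    eps = np.log(sigma)                      # trial Hencky strain (principal values)
    eps_dev = eps - eps.mean()               # deviatoric part
    dev_norm = np.linalg.norm(eps_dev)
    delta_gamma = dev_norm - yield_stress / (2.0 * mu)
    if delta_gamma <= 0.0:
        return F, False                      # elastic: keep the trial state
    # Plastic: shrink the deviatoric strain back to the yield surface.
    eps_proj = eps - delta_gamma * eps_dev / dev_norm
    return U @ np.diag(np.exp(eps_proj)) @ Vt, True
```

Because only the deviatoric part of the strain is projected, the determinant of the deformation gradient (the volume ratio) is unchanged by the mapping. A small yield stress makes almost every deformation plastic, while a very large one leaves the material effectively elastic.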
IV MANIPULATION LEARNING
To build a transferable simulation environment, the MLS-MPM is applied for simulation, and VBTSs are utilized as the end-effectors. We employ RL for manipulation policy design. Subsection IV-A introduces the design of the VBTS. The simulation environment and three kinds of observations are shown in Subsection IV-B. Subsection IV-C introduces the RL training details, including manipulation tasks and several training strategies.
IV-A The design of vision-based tactile sensors
A VBTS is an optical sensor that has been widely used in robotic perception due to its high resolution and robustness [9]. In this work, we follow our previous work [35] to design the VBTS. It should be noted that planar optical tactile sensors like [10, 12, 13] are all applicable to the method introduced in this paper. We designed this VBTS only to serve as an end-effector for sim-to-real; other optical tactile sensors, with different observations, can be selected to follow our work. We discuss the different observations adapted to different optical tactile sensors in Section VI.
The principle of the VBTS is shown in Figure 3-(A). Light is refracted into the medium. When the incidence angle of the refracted light exceeds the critical angle (defined by the refractive index of the medium), the propagation of light satisfies the condition of total internal reflection (TIR) [35]. After contact with the medium, light is scattered [36] and captured by the camera. When total reflection occurs, the critical angle is defined as follows:
$$\theta_c = \arcsin\!\left(\frac{n_2}{n_1}\right) \tag{4}$$
where $n_1$ is the refractive index of the medium and $n_2$ is the refractive index of the surrounding material, with $n_1 > n_2$.
If the TIR condition is not satisfied, some internal light will overflow the elastomer [35]. The light intensity is defined as follows:
(5) |
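As a quick numerical check of the TIR condition, the critical angle can be computed from Snell's law. This is a minimal sketch: the refractive index 1.41 is an illustrative value for a silicone-like elastomer, not a measured property of the sensor used in this work, and the function names are hypothetical.

```python
import math


def critical_angle_deg(n_medium, n_outside=1.0):
    """Critical angle for total internal reflection, in degrees."""
    if n_outside >= n_medium:
        raise ValueError("TIR requires n_medium > n_outside")
    return math.degrees(math.asin(n_outside / n_medium))


def is_totally_reflected(incidence_deg, n_medium, n_outside=1.0):
    """True when the incidence angle exceeds the critical angle."""
    return incidence_deg > critical_angle_deg(n_medium, n_outside)


# Illustrative values only: elastomer index 1.41 against air (1.0).
theta_c = critical_angle_deg(1.41)
print(f"critical angle: {theta_c:.2f} deg")
print(is_totally_reflected(60.0, 1.41))  # steeper than the critical angle
print(is_totally_reflected(30.0, 1.41))  # shallower: light leaks out
```

With these assumed indices the critical angle is slightly above 45 degrees, so light travelling at 60 degrees stays inside the medium while light at 30 degrees escapes.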
IV-B Simulation Environment and Transferable Observations
Environment: We apply the simulation method mentioned in Subsection III for the gel layer and deformable object in the environment. Two gel layers are utilized to imitate human dual-hand manipulation. The deformable object is initially in a cube shape because this shape widely exists in reality. Before manipulation, the end-effectors are controlled to lightly touch the deformable object to obtain observations different from those without interaction.
Observation: In the proposed simulation environment, we have three types of observations that are transferable to reality: relative position, squeezed area, and object contour. The methods of obtaining observations in simulation and reality are shown in Figure 2-(B) and Figure 4, respectively. If the VBTS design is the same as ours, the procedure in Algorithm 1 can be followed.
Relative position: The relative position contains the deformable object’s middle point and the gel layer’s middle point. In the simulation environment, we can collect the observation by the positions of particles.
In reality, we can get the observation from the VBTS. In Algorithm 1, the input is the RGB image and the output is an array of midpoint positions of the sensor and the object. Each pixel is indexed by its row and column, and a threshold that we set is used to produce a binary image that represents the object's shape: one value marks pixels that do not belong to the object, and the other marks pixels that do. The center of the object is calculated from this binary image. The depth coordinate of the object's midpoint is not available from the VBTS, so we ignore it and roughly replace it with that of the VBTS. Obtaining its exact value may require depth calibration, which will be addressed in our future work. The midpoint position of the sensor is provided by the UR5, because the VBTS is used as the end-effector. To unify the data, we normalize positions during the motion relative to the initial VBTS position so that they align with the simulation.
Squeezed area: The squeezed area refers to the deformation area of the gel layer during the pressing process. In the simulation environment, the squeezed area is composed of particle positions. The depth image is obtained by linear interpolation of the positions of the surface particles. Then, a threshold is selected to segment the depth image and finally get the squeezed area represented by a binary image.
In reality, the squeezed area can be segmented from the tactile image obtained from the VBTS. The input is the RGB image and the output is a binary image. The steps shared with relative position are as described above. We first segment the object and then segment the deformed area from the object. The result is a binary image that represents the squeezed area: one value marks pixels outside the squeezed area, and the other marks pixels inside it.
Object contour: The object contour refers to the deformation area of the gel layer and object shape during the pressing process. In the simulation environment, the deformation area is composed of particle positions and the shape of the object can be obtained by projecting the particles representing the deformable object in the direction of the particles representing the gel layer.
In reality, the squeezed area can be segmented from the tactile image obtained from the VBTS. The input is the RGB image. The object shape and the deformation area are obtained as binary images, as described for relative position and squeezed area. The object contour observation is obtained by comparing the two binary images and is represented by a three-valued image: one value marks pixels outside the deformation area, one marks pixels in the deformation area, and one marks the object area that is not in contact with the VBTS.
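The real-image side of these three observations can be sketched with simple thresholding. This is a minimal sketch, not a reproduction of Algorithm 1: it assumes that the object appears brighter than the background in the tactile image, that the squeezed area is brighter still, and that the labels 0, 1, and 2 encode the contour image. The function names and both thresholds are hypothetical.

```python
import numpy as np


def threshold_mask(rgb, threshold):
    """Binary mask: 1 where the mean channel intensity exceeds the threshold."""
    gray = rgb.astype(np.float64).mean(axis=2)
    return (gray > threshold).astype(np.uint8)


def mask_center(mask):
    """Centroid (row, col) of the non-zero pixels, or None for an empty mask."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())


def tactile_observations(rgb, object_threshold, squeeze_threshold):
    """Return (object center, squeezed-area mask, three-valued contour image)."""
    obj = threshold_mask(rgb, object_threshold)
    # Segment the squeezed area inside the object only.
    squeezed = threshold_mask(rgb, squeeze_threshold) * obj
    # Assumed labels: 0 = background, 1 = squeezed area, 2 = object not in contact.
    contour = np.zeros_like(obj)
    contour[obj == 1] = 2
    contour[squeezed == 1] = 1
    return mask_center(obj), squeezed, contour


# Synthetic 6x6 image: a 4x4 object with a 2x2 pressed region in its middle.
image = np.zeros((6, 6, 3), dtype=np.uint8)
image[1:5, 1:5] = 100
image[2:4, 2:4] = 200
center, squeezed, contour = tactile_observations(image, 50, 150)
print(center)
print(contour)
```

The centroid is returned in pixel coordinates; converting it to a metric position relative to the sensor would need the camera calibration, which is outside this sketch.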
IV-C Reinforcement Learning Benchmark
Task: Considering human manipulations implemented on deformable objects, we propose four tasks: position control, squeeze, cylinder, and sphere, as shown in Figure 5. These tasks are classical deformable object manipulations.
Position control: The deformable objects are moved while being held by the end-effector. The reward of position control is related to the distance between the current deformable object position and the desired position.
Squeeze: This task changes the relative positions between sensors and deformable objects, and simple deformation of the objects is included. In the squeeze task, deformable objects are squeezed to the desired thickness. The reward is related to the thickness.
Cylinder and sphere: These two tasks include sophisticated deformation. Deformable objects are rubbed into a cylinder or kneaded into a sphere by the sensor.
In the sphere task, the deformable object is kneaded into a sphere. It is difficult to propose a parameter representing an object’s sphere degree. In this case, suppose that there are deformable object particles, we calculate the distance between the deformable object’s middle point and the farthest particle:
(6) |
where denotes the reward in the -th timestep; denotes the -th particle position in the -th timestep. Then the applied reward is derived as:
(7) |
Although this reward is not directly related to the object’s shape, a sphere will obtain a higher reward than an object with other shapes sharing the same volume.
Similar to the sphere, the cylinder also calculates the distance between the deformable object’s middle point and the farthest particle. However, the cylinder applies for particle position in the x-z plane instead of 3-D space.
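The farthest-particle measure used by the sphere and cylinder tasks can be sketched as follows. This is a minimal sketch under stated assumptions: the object's middle point is taken as the mean particle position, and the reward is taken as the negated distance so that more compact shapes score higher. The exact reward shaping of the benchmark may differ, and the function names are hypothetical.

```python
import math


def farthest_particle_distance(positions, axes=(0, 1, 2)):
    """Largest distance from the mean particle position, using the given axes."""
    n = len(positions)
    center = [sum(p[a] for p in positions) / n for a in axes]
    return max(
        math.sqrt(sum((p[a] - c) ** 2 for a, c in zip(axes, center)))
        for p in positions
    )


def compactness_reward(positions, axes=(0, 1, 2)):
    """Assumed reward: negated farthest-particle distance."""
    return -farthest_particle_distance(positions, axes)


# Corners of a cube with side 2, centered at the origin.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
print(farthest_particle_distance(cube))               # 3-D, as in the sphere task
print(farthest_particle_distance(cube, axes=(0, 2)))  # x-z plane, as in the cylinder task
```

For a cube of side a, the farthest particle lies at the half-diagonal, about 0.87a from the center, whereas a sphere of the same volume has radius about 0.62a. This is why a sphere receives a higher reward than other shapes of equal volume under this measure.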
These tasks are selected because they include classical manipulations with different difficulties. Deformation does not affect the reward in the first task, position control, so this task is relatively easy. In squeeze, the deformable object is compressed, and simple deformation is included. The cylinder and sphere contain complex deformations and are the most challenging tasks. In addition, complex tasks can also be achieved through simple tasks. For example, position control can fulfill any task if a desired motion trajectory is given.
Training strategy: We exploit TD3 [4] as the basic RL policy. It is simple for TD3 to learn an effective policy for most tasks, but cylinder and sphere seem challenging. Considering that humans can achieve these tasks, we include human-designed motions as expert demonstrations and apply two strategies, pretraining and multi-task training, to learn from the demonstrations. The policy diagram is shown in Figure 6.
We propose a human-designed trajectory as the baseline. For the cylinder, the plasticine will be rubbed from left to right repeatedly. For the sphere, the gel layers are controlled to move along a trajectory surrounding the deformable object. The diagrams of sensor trajectory in cylinder and sphere baseline are shown in Figure 6-(A) and Figure 6-(B). Although it is not generated from an RL policy, we apply it as the baseline since it is a human-designed motion. Additionally, it will be the expert demonstration for each task.
TD3 is an actor-critic method, and the policy is shown in Figure 6-(C). The double Q network structure and delayed update strategy are not discussed further, as they are not the focus of this research. The actor obtains observations and generates actions. The critic receives samples from the replay buffer and updates the Q network for actor training. The actor training loss is defined as:
$$L_{\mathrm{TD3}} = -Q\left(s, \pi(s)\right) \tag{8}$$
where $\pi$ and $Q$ denote the actor and critic networks, and $s$ denotes the states.
Considering the challenging tasks, namely cylinder and sphere, we train the network to learn from expert demonstrations. In this case, we introduce learning from expert loss for actor training:
(9) |
where is the desired velocity in the state as shown in Figure 6-(A) for cylinder and Figure 6-(B) for sphere. Besides, to take advantage of exploring RL training policy, classical TD3 loss is still used. Two strategies, pretraining and multi-task training, are proposed for exploring and exploiting tradeoffs.
Pretraining denotes training a network with learning from expert loss first and TD3 loss in the latter training process. The loss is:
(10) |
where denotes the current training episode; denotes the maximal training episode number.
Multi-task training aims to achieve multiple goals in the whole training process. In our work, the multiple goals are reward improvement and trajectory following. The training loss is:
(11) |
In this case, the network aims to follow the trajectory at first and focuses on reward improvement in the following training process.
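The difference between the two strategies can be illustrated with scalar loss values. This is a hedged sketch, not the schedules used in the experiments: the mean-squared-error form of the expert loss, the switch point at half of the training episodes, and the linear blending weights are all assumptions made for illustration, and every function name is hypothetical.

```python
def expert_loss(action, desired_velocity):
    """Assumed expert loss: mean squared error between action and demonstration."""
    return sum((a - v) ** 2 for a, v in zip(action, desired_velocity)) / len(action)


def pretraining_loss(td3_loss, demo_loss, episode, max_episode, switch_fraction=0.5):
    """Hard switch: imitate the expert first, then train with the TD3 loss only.

    switch_fraction is an assumed hyperparameter.
    """
    return demo_loss if episode < switch_fraction * max_episode else td3_loss


def multitask_loss(td3_loss, demo_loss, episode, max_episode):
    """Smooth blend: the expert weight decays linearly to zero (assumed schedule)."""
    w = 1.0 - min(max(episode / max_episode, 0.0), 1.0)
    return w * demo_loss + (1.0 - w) * td3_loss


# Compare the two schedules for fixed example loss values.
for episode in (0, 100, 200, 300, 400):
    hard = pretraining_loss(2.0, 5.0, episode, 400)
    soft = multitask_loss(2.0, 5.0, episode, 400)
    print(episode, hard, soft)
```

The hard switch changes the training objective abruptly at one episode, whereas the blended loss moves gradually from trajectory following to reward improvement.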
V Experiments and Results
In this section, we present the experimental results for simulated deformable object manipulation and sim-to-real. We use the plasticine as the deformable object in all of the experiments. Plasticine is easily accessible and can be adjusted in size and hardness. Subsection V-A demonstrates that the MLS-MPM can simulate deformable object deformation with a proper yield stress parameter, and Subsection V-B presents the RL training performance for the manipulation benchmark. Learning from Demonstrations shows relatively high rewards on complex tasks. Subsection V-C introduces the result of the sim-to-real. In order to verify the robustness of the RL strategy, we chose plasticine with different hardnesses and sizes for the test in Subsection V-D.
V-A Plasticine Simulation
We control the gel layers to press against cubes with different yield stress parameters, which represent plasticity properties, to validate our contact simulation method graphically. Optical simulation is unsuitable for this work due to the lack of a reflective layer. To address this issue, we perform optical simulations using publicly available data sets [37]. The experimental results are shown in Figure 7.
In Figure 7-(A), the yield stress parameter is . This cube is totally compressed by two gel layers, and the tactile image illustrates that the cube applies a small force to the gel. These results show that this cube deforms plastically, and this parameter is suitable for some plastic materials such as sand and snow. In Figure 7-(C), the yield stress parameter is . This cube shows strong resistance in the x-z plane diagram, and the edges of the square in the tactile image are sharp and obvious. The results prove that this cube is elastic, and this parameter can be used for silicone gel and rubber.
The cube with the parameter shows elastoplasticity in Figure 7-(B). This cube maintains its initial shape partly and is compressed into a drum-like shape, different from elastic and plastic cubes. The tactile image reveals that the gel deforms due to cube resilience, but the trace shape is a circle rather than a square. Hence, this cube presents both elasticity and plasticity, similar to plasticine.
V-B Reinforcement Learning Results
TD3: For each task, the RL network is trained for 400 episodes, and each episode contains 100 timesteps except cylinder and sphere, which contain 400 timesteps due to their complexity. After every ten training episodes, we test the RL policy in ten environments with different random seeds. Their rewards and variances are shown in Figure 7. The manipulation videos are available in the supplementary materials.
According to the experiment videos and reward results, TD3 achieves most tasks except cylinder and sphere, because of their complexity. Based on human experience, these tasks require reciprocating and circling motions, which are challenging for TD3. We therefore employ learning-from-demonstration strategies, namely pretraining and multi-task training. To provide more information, we also employ the informative observation named squeezed area for the cylinder and sphere tasks.
Learning from expert demonstrations: Two demonstration learning strategies are applied to the complex tasks, cylinder and sphere. Four policies are compared: baseline, TD3, pretraining, and multi-task training. All the related training policies and observations are described in Section IV. The baseline is a human-designed motion, so its reward is a fixed value for each of the cylinder and sphere tasks. In this series of experiments, the maximal training episode number is 400. In Figure 8-(C) and Figure 8-(D), we compare the test results of these strategies. Table I shows the rewards and standard deviations for these two tasks with different learning strategies in the latter 200 episodes.
Observation | TD3 | pretraining | multi-task
---|---|---|---
Squeezed area | | |
Object contour | | |

Observation | TD3 | pretraining | multi-task
---|---|---|---
Squeezed area | | |
Object contour | | |
Figure 8-(C) and Figure 8-(D) show the rewards during baseline manipulation. Because the final states matter more than the whole process, we compare the average rewards of the different strategies over the last 130 timesteps for cylinder and the last 90 timesteps for sphere. In Figure 8-(C) and 8-(D), TD3 fails to manipulate correctly compared with the baseline due to the high resolution of the observations and the high DOF of deformation. On the other hand, pretraining and multi-task training achieve performance at least similar to the baseline.
Figure 8-(C) and 8-(D) also show that the cylinder reaches convergence faster than the sphere due to its motion properties. The sphere requires a circling motion, and the desired motion includes two directions for every state. The cylinder, however, requires a reciprocating motion in one direction. The use of expert demonstration learning strategies allows agents to realize this quickly. The agent quickly converges in one direction and only needs to consider the desired motion in the other direction.
Multi-task training and pretraining achieve similar rewards, but the former fluctuates more during training. The TD3 training stage of pretraining yields only limited further reward improvement. This difference reveals that a steady training loss transition performs better than an abrupt change.
Object contour yields more robust results than squeezed area. The rewards of pretraining and multi-task training are more robust under object contour, as can be seen from their standard deviations in Tables I and II. In addition, the model trained with object contour converges faster than the one trained with squeezed area. This is because object contour contains more information than squeezed area, which also leads to a longer training time.
All reinforcement learning experiments are implemented with Taichi 1.4.0, PyTorch 1.8, and Python 3.8. The hardware comprises an Intel Core i7-8750H processor, two 8 GB DDR4 memory modules, and one GPU (GeForce RTX 3080 Ti, 12 GB).
V-C Results of the Sim-to-real
To complete the sim-to-real transfer, we build an experimental platform as shown in Figure 1. The VBTS serves as the end-effector of the UR5 robot to perform the corresponding tasks. A silicone gasket is placed on the experimental platform to support the plasticine like a human hand, while the gel layer of the VBTS works as the other hand to manipulate the plasticine. This is consistent with our setup in the simulation. The result is shown in Figure 9.
In Figure 9-(A, B, C), the tasks of position control, squeeze, cylinder, and sphere use the RL strategy trained by TD3, and the observation is relative position. The UR5 moves the plasticine from the initial position to the target position in Figure 9-(A). Plasticine pieces of two different sizes are squeezed to the same thickness in Figure 9-(B). The sim-to-real results of position control and squeeze are consistent with the simulation. The desired motion of these two tasks is unidirectional and non-reciprocating, which makes it easier for the agent to control.
In contrast, the cylinder and sphere tasks are challenging, as the results in Figure 9-(C) show. The UR5 deviates drastically from the workspace and does not follow the desired motion. This is not surprising and confirms the need for the expert demonstration strategy. The desired motion of the cylinder and sphere is more complex: the sphere requires a circling motion, which involves moving in two directions, and the cylinder requires a reciprocating motion. This complexity makes the motion more difficult for agents to learn. For such complex tasks, the agents need expert demonstrations to help them converge quickly and learn the correct action.
To verify the reliability of the expert demonstration strategy, we conduct the corresponding experiments. The results are shown in Figure 9-(D) and Figure 9-(E). The cylinder and sphere tasks use RL strategies trained by pretraining and multi-task training, which are described in Subsection IV-C. For better results, we choose the more informative observations, squeezed area and object contour, which are described in Subsection IV-B. The manipulation videos are available in the supplementary materials.
According to Figure 9-(D), Figure 9-(E), and the manipulation videos, the expert demonstration strategies accomplish the cylinder and sphere tasks. The UR5 is controlled by the agent to perform reciprocating and circling motions, respectively. In the cylinder task, the plasticine is rubbed from a cube into a cylinder-like object. Similarly, in the sphere task, the plasticine is rubbed from a cube into a sphere-like object. In terms of object shape, our method performs better on cylinder than on sphere because fewer surfaces must be deformed. Two faces of the cylinder are left undeformed, so the cylinder task only needs to deform four surfaces, whereas the sphere task needs to deform all six. To make the plasticine rounder, the agent needs to perform more complex motions, such as multiple circling motions with different radii, or turning the plasticine over and continuing the circular motion. At the same time, it is difficult to determine when the plasticine has deformed into a sphere and the motion should end. All of these issues increase the difficulty of the control strategy significantly.
To evaluate the expert demonstration strategies, we perform a roundness evaluation. We repeat the cylinder and sphere tasks 10 times each and evaluate the roundness of the results. The ratio of the and of the cross-section is used as an evaluation criterion introduced in [38]. To obtain a rubric, we set two baselines, named Baseline and Satisfactory baseline. Five people are invited to conduct manipulation experiments. For the Baseline, they are asked to manipulate the plasticine cube in the same way as the expert demonstration strategy, and the mean of their roundness evaluation results is taken as the Baseline. For the Satisfactory baseline, in contrast, they may use any method to manipulate the plasticine cube, including observing with their eyes and adjusting the manipulation until they are satisfied; the mean of these roundness evaluation results is taken as the Satisfactory baseline. We evaluate roundness on the front surface of the cylinder results and on three surfaces (front, top, and side) of the sphere results. Sphere (Mean) is the mean roundness of those three surfaces in the sphere results. The roundness evaluation results are shown in Figure 9-(F) and Table III.
| | Our method | Baseline | Satisfactory baseline |
|---|---|---|---|
| Cylinder (Front) | | | |
| Sphere (Front) | | | |
| Sphere (Top) | | | |
| Sphere (Side) | | | |
| Sphere (Mean) | | | |
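As an illustration of how a roundness score can be computed from a cross-section contour, the sketch below uses a simple stand-in measure: the ratio of the smallest to the largest distance from the contour centroid, which equals 1 for a perfect circle and decreases as the shape departs from one. This proxy is an assumption made purely for illustration; it is not necessarily the criterion from [38] used in the evaluation, and the function name `roundness_proxy` is hypothetical.

```python
import math


def roundness_proxy(contour):
    """Ratio of min to max centroid distance for a list of (x, y) boundary points.

    Illustrative stand-in only; not necessarily the criterion of [38].
    """
    if len(contour) < 3:
        raise ValueError("need at least three contour points")
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    radii = [math.hypot(x - cx, y - cy) for x, y in contour]
    r_max = max(radii)
    if r_max == 0.0:
        raise ValueError("degenerate contour: all points coincide")
    return min(radii) / r_max


# A sampled unit circle scores close to 1; a square sampled at its corners
# and edge midpoints scores 1/sqrt(2).
circle = [(math.cos(2 * math.pi * k / 36), math.sin(2 * math.pi * k / 36))
          for k in range(36)]
square = [(1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1)]
print(roundness_proxy(circle), roundness_proxy(square))
```

Averaging such a score over the front, top, and side contours would give a per-object value analogous to the Sphere (Mean) row.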
According to Figure 9-(F) and Table III, the sphere task is achieved better than the cylinder task. Our method has a lower average roundness on the cylinder than both the Baseline and the Satisfactory baseline. On the sphere task, our method has a higher average roundness than the Baseline but a lower one than the Satisfactory baseline, both for the three individual faces and for the mean. The cylinder task is simpler than the sphere task, not only for the agent but also for humans: people can complete it with a reciprocating motion, which is a simple manipulation. In contrast, the sphere task involves more complex movement, as explained above. The results of the sphere task show that our method is more suitable for complex tasks: the agents trained by expert demonstration strategies outperform the people who were asked to manipulate the plasticine in the same way as the expert demonstration strategies. It should be noted that the Satisfactory baseline is established when humans are allowed to perform any manipulation until they are satisfied. However, the ratio of and still have an error of about . This indicates that it is difficult to manipulate plasticine from a cube into a perfect cylinder or sphere. On the other hand, our method's roundness curves come close to the Satisfactory baseline several times without adjusting the manipulation by external vision, demonstrating the considerable potential of our method in complex manipulation. We set a criterion that if the roundness of a manipulation result is within below the baseline, we consider the manipulation successful. Among our repeated tasks, we achieve a success rate of .
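The success criterion above can be expressed as a short computation. The sketch below assumes the margin is an absolute difference in roundness and leaves it as a parameter; the name `success_rate`, the `tolerance` argument, and all numeric values are hypothetical placeholders rather than the values used in the experiments.

```python
def success_rate(roundness_values, baseline, tolerance):
    """Fraction of trials whose roundness is at most `tolerance` below `baseline`.

    `tolerance` is treated as an absolute margin, which is an assumption.
    """
    if not roundness_values:
        raise ValueError("roundness_values must not be empty")
    threshold = baseline - tolerance
    successes = sum(1 for r in roundness_values if r >= threshold)
    return successes / len(roundness_values)


# Placeholder numbers for illustration only (not results from the paper).
trials = [0.9, 0.8, 0.7, 0.6]
print(success_rate(trials, baseline=0.85, tolerance=0.1))
```

If the margin is instead meant as a percentage of the baseline, the threshold line would become `baseline * (1 - tolerance)`.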
V-D Robustness experiments
In this subsection, we choose plasticine of different hardnesses and sizes to test the robustness of our method. The more difficult tasks, sphere and cylinder, are used in this experiment. All the RL strategies are trained with the expert demonstration strategy and the squeezed area observation. The results of the experiment are shown in Figure 10, and the manipulation videos are provided in the supplementary materials.
According to Figure 10 and the manipulation videos, the movement trajectories of the UR5 are as expected. The experiments with different sizes achieve similar results. The size of the object has no effect on its elasticity; it only changes the observed squeezed area. The RL strategies are sufficient to overcome this difference and control the UR5 to move along the desired motion. However, the hardness of a deformable object changes its elasticity, which the RL strategies cannot overcome. Objects of different hardness clearly do not deform similarly, even under the same desired motion. Object hardness therefore reduces the robustness of the RL strategies and makes it impossible to manipulate objects of different hardness with a single strategy. To achieve better results, different strategies need to be trained for objects of different hardness.
VI Conclusion
In this work, we build a system for manipulating deformable objects, including contact simulation, the design of the VBTS, simulation-based training, and sim-to-real transfer. Our aim is to design a contact simulation of deformable objects and simulation-based training to achieve the manipulation of deformable objects. We design a new simulation environment based on [21] to simulate the contact of deformable objects. The experiments in V-A show that the simulation environment can reliably simulate elastic, plastic, and elastoplastic deformation. To obtain transferable observations, we propose the use of a VBTS. The design of the VBTS is introduced in IV-A and the methods for obtaining the transferable observations in IV-B. We use TD3 and expert demonstration strategies for the simulation-based training. The experiments in V-B show the training results for Position Control, Squeeze, Cylinder, and Sphere, all of which achieve the desired results in simulation under the control strategies. For sim-to-real transfer, we build an experimental platform and achieve the transfer of all the tasks. The sim-to-real transfer results are shown in V-C and the supplementary materials. Finally, we test the robustness of our method using plasticine of different hardnesses and sizes. Our method is robust to objects of different sizes but performs poorly for objects of different hardness.
This work mainly provides ideas and methods for the contact simulation and manipulation of deformable objects. Users of planar optical tactile sensors can choose different observations to follow our work. A VBTS designed the same as ours can use our observations to accomplish sim-to-real transfer. A VBTS designed like GelSight [10] can use the observation named squeezed area. It is worth noting that our simulation method provides freedom of choice and is not limited to optical tactile sensors. Through our contact simulation environment, you can obtain information such as the speed, position, and force of the objects. Regardless of the method you use to align the information between simulation and reality, you can attempt sim-to-real transfer learning with our environment.
In our future work, we aim to enhance our research outcomes through several improvements. Firstly, we plan to provide our agents with more comprehensive information, including the three-dimensional shape of objects, enabling them to refine their manipulation in a way similar to humans. Our ultimate goal is to empower our agents to adjust their actions based on a broader range of information, enhancing their performance in manipulation tasks involving objects such as cylinders and spheres and surpassing the current Satisfactory baseline. Additionally, we aspire to develop a versatile simulation platform for soft robots, utilizing the contact simulation environment introduced in this paper. This platform would serve as a valuable tool akin to existing frameworks such as PyBullet and MuJoCo.
References
- [1] J. Zhu, A. Cherubini, C. Dune, D. Navarro-Alarcon, F. Alambeigi, D. Berenson, F. Ficuciello, K. Harada, J. Kober, X. Li et al., “Challenges and outlook in robotic manipulation of deformable objects,” IEEE Robotics & Automation Magazine, vol. 29, no. 3, pp. 67–77, 2022.
- [2] F. Liu, F. Sun, B. Fang, X. Li, S. Sun, and H. Liu, “Hybrid robotic grasping with a soft multimodal gripper and a deep multistage learning scheme,” IEEE Transactions on Robotics, vol. 39, no. 3, pp. 2379–2399, 2023.
- [3] R. S. Sutton, A. G. Barto et al., “Introduction to reinforcement learning,” 1998.
- [4] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International conference on machine learning. PMLR, 2018, pp. 1587–1596.
- [5] C. Lu, J. Wang, and S. Luo, “Surface following using deep reinforcement learning and a gelsight tactile sensor,” arXiv preprint arXiv:1912.00745, 2019.
- [6] A. Church, J. Lloyd, R. Hadsell, and N. F. Lepora, “Deep reinforcement learning for tactile robotics: Learning to type on a braille keyboard,” IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 6145–6152, 2020.
- [7] T. Bi, C. Sferrazza, and R. D’Andrea, “Zero-shot sim-to-real transfer of tactile control policies for aggressive swing-up manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5761–5768, 2021.
- [8] Z. Huang, Y. Hu, T. Du, S. Zhou, H. Su, J. B. Tenenbaum, and C. Gan, “Plasticinelab: A soft-body manipulation benchmark with differentiable physics,” arXiv preprint arXiv:2104.03311, 2021.
- [9] S. Zhang, Z. Chen, Y. Gao, W. Wan, J. Shan, H. Xue, F. Sun, Y. Yang, and B. Fang, “Hardware technology of vision-based tactile sensor: A review,” IEEE Sensors Journal, 2022.
- [10] W. Yuan, S. Dong, and E. H. Adelson, “Gelsight: High-resolution robot tactile sensors for estimating geometry and force,” Sensors, vol. 17, no. 12, p. 2762, 2017.
- [11] B. Ward-Cherrier, N. Pestell, L. Cramphorn, B. Winstone, M. E. Giannaccini, J. Rossiter, and N. F. Lepora, “The tactip family: Soft optical tactile sensors with 3d-printed biomimetic morphologies,” Soft robotics, vol. 5, no. 2, pp. 216–227, 2018.
- [12] B. Fang, H. Xue, F. Sun, Y. Yang, and R. Zhu, “A cross-modal tactile sensor design for measuring robotic grasping forces,” Industrial Robot: the international journal of robotics research and application, 2019.
- [13] S. Zhang, Y. Yang, J. Shan, F. Sun, and B. Fang, “A novel vision-based tactile sensor using lamination and gilding process for improvement of outdoor detection and maintainability,” IEEE Sensors Journal, 2023.
- [14] S. Luo, W. Yuan, E. Adelson, A. G. Cohn, and R. Fuentes, “Vitac: Feature sharing between vision and tactile sensing for cloth texture recognition,” in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 2722–2727.
- [15] Y. Chen, J. Lin, X. Du, B. Fang, and S. Li, “Non-destructive fruit firmness evaluation using vision-based tactile information,” in 2022 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 2303–2309.
- [16] J. Lloyd and N. F. Lepora, “Goal-driven robotic pushing using tactile and proprioceptive feedback,” IEEE Transactions on Robotics, 2021.
- [17] Y. She, S. Wang, S. Dong, N. Sunil, A. Rodriguez, and E. Adelson, “Cable manipulation with a tactile-reactive gripper,” The International Journal of Robotics Research, vol. 40, no. 12-14, pp. 1385–1401, 2021.
- [18] C. Sferrazza, A. Wahlsten, C. Trueeb, and R. D’Andrea, “Ground truth force distribution for learning-based tactile sensing: A finite element approach,” IEEE Access, vol. 7, pp. 173 438–173 449, 2019.
- [19] Z. Hu, T. Han, P. Sun, J. Pan, and D. Manocha, “3-d deformable object manipulation using deep neural networks,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4255–4261, 2019.
- [20] Y. Hu, Y. Fang, Z. Ge, Z. Qu, Y. Zhu, A. Pradhana, and C. Jiang, “A moving least squares material point method with displacement discontinuity and two-way rigid body coupling,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, pp. 1–14, 2018.
- [21] Z. Chen, S. Zhang, S. Luo, F. Sun, and B. Fang, “Tacchi: A pluggable and low computational cost elastomer deformation simulator for optical tactile sensors,” IEEE Robotics and Automation Letters, vol. 8, no. 3, pp. 1239–1246, 2023.
- [22] Z. Chen, S. Zhang, Y. Sun, S. Luo, F. Sun, and B. Fang, “Plasticine manipulation simulation with optical tactile sensing,” in ICRA ViTac Workshop, 2023.
- [23] M. Gao, A. P. Tampubolon, C. Jiang, and E. Sifakis, “An adaptive generalized interpolation material point method for simulating elastoplastic materials,” ACM Transactions on Graphics (TOG), vol. 36, no. 6, pp. 1–12, 2017.
- [24] S. Li, Z. Huang, T. Du, H. Su, J. B. Tenenbaum, and C. Gan, “Contact points discovery for soft-body manipulations with differentiable physics,” arXiv preprint arXiv:2205.02835, 2022.
- [25] C. Matl and R. Bajcsy, “Deformable elasto-plastic object shaping using an elastic hand and model-based reinforcement learning,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 3955–3962.
- [26] A. Church, J. Lloyd, and N. F. Lepora, “Tactile sim-to-real policy transfer via real-to-sim image translation,” in Conference on Robot Learning. PMLR, 2022, pp. 1645–1654.
- [27] Z. Zhao and Z. Lu, “Multi-purpose tactile perception based on deep learning in a new tendon-driven optical tactile sensor,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 2099–2104.
- [28] Y. Wu, W. Yan, T. Kurutach, L. Pinto, and P. Abbeel, “Learning to manipulate deformable objects without demonstrations,” arXiv preprint arXiv:1910.13439, 2019.
- [29] Z. Si, Z. Zhu, A. Agarwal, S. Anderson, and W. Yuan, “Grasp stability prediction with sim-to-real transfer from tactile sensing,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 7809–7816.
- [30] M. L. Preti, M. Totaro, E. Falotico, M. Crepaldi, and L. Beccai, “Online pressure map reconstruction in a multitouch soft optical waveguide skin,” IEEE/ASME Transactions on Mechatronics, 2022.
- [31] H. Wang, M. Totaro, and L. Beccai, “Development of fully shielded soft inductive tactile sensors,” in 2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE, 2019, pp. 246–249.
- [32] D. F. Gomes, Z. Lin, and S. Luo, “Geltip: A finger-shaped optical tactile sensor for robotic manipulation,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 9903–9909.
- [33] Y. Hu, T.-M. Li, L. Anderson, J. Ragan-Kelley, and F. Durand, “Taichi: a language for high-performance computation on spatially sparse data structures,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, p. 201, 2019.
- [34] Y. Wang, W. Huang, B. Fang, F. Sun, and C. Li, “Elastic tactile simulation towards tactile-visual perception,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 2690–2698.
- [35] S. Zhang, Y. Sun, J. Shan, Z. Chen, F. Sun, Y. Yang, and B. Fang, “Tirgel: A visuo-tactile sensor with total internal reflection mechanism for external observation and contact detection,” IEEE Robotics and Automation Letters, 2023.
- [36] K. Shimonomura, “Tactile image sensors employing camera: A review,” Sensors, vol. 19, no. 18, p. 3933, 2019.
- [37] D. F. Gomes, P. Paoletti, and S. Luo, “Generation of gelsight tactile images for sim2real learning,” IEEE Robot. Automat. Lett., vol. 6, no. 2, pp. 4177–4184, 2021.
- [38] W. Sui and D. Zhang, “Four methods for roundness evaluation,” Physics Procedia, vol. 24, pp. 2159–2164, 2012.