Search | arXiv e-print repository

Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Authors: Dionis Totsila, Quentin Rouxel, Jean-Baptiste Mouret, Serena Ivaldi

Abstract: This paper presents Words2Contact, a language-guided multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the robots where to place their support contacts before whole-body reaching or manipulation using natural language. Words2Contact transforms the verbal instructions of a human operator into contact placement predictions; it also deals with iterative corrections, until the human is satisfied with the contact location identified in the robot's field of view. We benchmark state-of-the-art LLMs and VLMs for size and performance in contact prediction. We demonstrate the effectiveness of the iterative correction process, showing that users, even naive, quickly learn how to instruct the system to obtain accurate locations. Finally, we validate Words2Contact in real-world experiments with the Talos humanoid robot, instructed by human operators to place support contacts on different locations and surfaces to avoid falling when reaching for distant objects.

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.12381 [pdf, other]

Flow Matching Imitation Learning for Multi-Support Manipulation

Authors: Quentin Rouxel, Andrea Ferrari, Serena Ivaldi, Jean-Baptiste Mouret

Abstract: Humanoid robots could benefit from using their upper bodies for support contacts, enhancing their workspace, stability, and ability to perform contact-rich and pushing tasks. In this paper, we propose a unified approach that combines an optimization-based multi-contact whole-body controller with Flow Matching, a recently introduced method capable of generating multi-modal trajectory distributions for imitation learning. In simulation, we show that Flow Matching is more appropriate for robotics than Diffusion and traditional behavior cloning. On a real full-size humanoid robot (Talos), we demonstrate that our approach can learn a whole-body non-prehensile box-pushing task and that the robot can close dishwasher drawers by adding contacts with its free hand when needed for balance. We also introduce a shared autonomy mode for assisted teleoperation, providing automatic contact placement for tasks not covered in the demonstrations. Full experimental videos are available at: https://hucebot.github.io/flow_multisupport_website/

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2401.05290 [pdf, other]

Analysis and Perspectives on the ANA Avatar XPRIZE Competition

Authors: Kris Hauser, Eleanor Watson, Joonbum Bae, Josh Bankston, Sven Behnke, Bill Borgia, Manuel G. Catalano, Stefano Dafarra, Jan B. F. van Erp, Thomas Ferris, Jeremy Fishel, Guy Hoffman, Serena Ivaldi, Fumio Kanehiro, Abderrahmane Kheddar, Gaelle Lannuzel, Jacqueline Ford Morie, Patrick Naughton, Steve NGuyen, Paul Oh, Taskin Padir, Jim Pippine, Jaeheung Park, Daniele Pucci, Jean Vaz, et al. (3 additional authors not shown)

Abstract: The ANA Avatar XPRIZE was a four-year competition to develop a robotic "avatar" system to allow a human operator to sense, communicate, and act in a remote environment as though physically present. The competition featured a unique requirement that judges would operate the avatars after less than one hour of training on the human-machine interfaces, and avatar systems were judged on both objective and subjective scoring metrics. This paper presents a unified summary and analysis of the competition from technical, judging, and organizational perspectives. We study the use of telerobotics technologies and innovations pursued by the competing teams in their avatar systems, and correlate the use of these technologies with judges' task performance and subjective survey ratings. It also summarizes perspectives from team leads, judges, and organizers about the competition's execution and impact to inform the future development of telerobotics and telepresence.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 26 pages, preprint of article appearing in International Journal of Social Robotics

arXiv:2312.16465 [pdf, other]

doi 10.1109/LRA.2024.3396094

Multi-Contact Whole-Body Force Control for Position-Controlled Robots

Authors: Quentin Rouxel, Serena Ivaldi, Jean-Baptiste Mouret

Abstract: Many humanoid and multi-legged robots are controlled in positions rather than in torques, which prevents direct control of contact forces, and hampers their ability to create multiple contacts to enhance their balance, such as placing a hand on a wall or a handrail. This letter introduces the SEIKO (Sequential Equilibrium Inverse Kinematic Optimization) pipeline, and proposes a unified formulation that exploits an explicit model of flexibility to indirectly control contact forces on traditional position-controlled robots. SEIKO formulates whole-body retargeting from Cartesian commands and admittance control using two quadratic programs solved in real-time. Our pipeline is validated with experiments on the real, full-scale humanoid robot Talos in various multi-contact scenarios, including pushing tasks, far-reaching tasks, stair climbing, and stepping on sloped surfaces. Code and videos are available at: https://hucebot.github.io/seiko_controller_website/

Submitted 22 May, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Journal ref: IEEE Robotics and Automation Letters, 2024, 9 (6), pp.5639-5646

arXiv:2308.03479 [pdf, other]

doi 10.1109/tmech.2022.3152844/mm2

Feasibility Retargeting for Multi-contact Teleoperation and Physical Interaction

Authors: Quentin Rouxel, Ruoshi Wen, Zhibin Li, Carlo Tiseo, Jean-Baptiste Mouret, Serena Ivaldi

Abstract: This short paper outlines two recent works on multi-contact teleoperation and the development of the SEIKO (Sequential Equilibrium Inverse Kinematic Optimization) framework. SEIKO adapts commands from the operator in real-time and ensures that the reference configuration sent to the underlying controller is feasible. Additionally, an admittance scheme is used to implement physical interaction, which is then combined with the operator's command and retargeted. SEIKO has been applied in simulations on various robots, including humanoid and quadruped robots designed for loco-manipulation. Furthermore, SEIKO has been tested on real hardware for bimanual heavy object carrying tasks.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 2nd Workshop Toward Robot Avatars, 2023 IEEE International Conference on Robotics and Automation (ICRA), Jun 2023, London, United Kingdom

arXiv:2303.07655 [pdf, other]

doi 10.1109/Humanoids53995.2022.10000122

Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors

Authors: Kourosh Darvish, Serena Ivaldi, Daniele Pucci

Abstract: This paper presents a novel approach to solve simultaneously the problems of human activity recognition and whole-body motion and dynamics prediction for real-time applications. Starting from the dynamics of human motion and motor system theory, the notion of mixture of experts from deep learning has been extended to address this problem. In the proposed approach, experts are modelled as a sequence-to-sequence recurrent neural network (RNN) architecture. Experiments show the results of 66-DoF real-world human motion prediction and action recognition during different tasks like walking and rotating. The code associated with this paper is available at: github.com/ami-iit/paper_darvish_2022_humanoids_action-kindyn-predicition

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2301.04317 [pdf, other]

Teleoperation of Humanoid Robots: A Survey

Authors: Kourosh Darvish, Luigi Penco, Joao Ramos, Rafael Cisneros, Jerry Pratt, Eiichi Yoshida, Serena Ivaldi, Daniele Pucci

Abstract: Teleoperation of humanoid robots enables the integration of the cognitive skills and domain expertise of humans with the physical capabilities of humanoid robots. The operational versatility of humanoid robots makes them the ideal platform for a wide range of applications when teleoperating in a remote environment. However, the complexity of humanoid robots imposes challenges for teleoperation, particularly in unstructured dynamic environments with limited communication. Many advancements have been achieved in the last decades in this area, but a comprehensive overview is still missing. This survey paper gives an extensive overview of humanoid robot teleoperation, presenting the general architecture of a teleoperation system and analyzing the different components. We also discuss different aspects of the topic, including technological and methodological advances, as well as potential applications. A web-based version of the paper can be found at https://humanoid-teleoperation.github.io/.

Submitted 11 January, 2023; originally announced January 2023.

arXiv:2203.00384 [pdf, other]

Data-efficient learning of object-centric grasp preferences

Authors: Yoann Fleytoux, Anji Ma, Serena Ivaldi, Jean-Baptiste Mouret

Abstract: Grasping has made impressive progress during the last few years thanks to deep learning. However, there are many objects for which it is not possible to choose a grasp by only looking at an RGB-D image, whether for physical reasons (e.g., a hammer with uneven mass distribution) or task constraints (e.g., food that should not be spoiled). In such situations, the preferences of experts need to be taken into account. In this paper, we introduce a data-efficient grasping pipeline (Latent Space GP Selector -- LGPS) that learns grasp preferences with only a few labels per object (typically 1 to 4) and generalizes to new views of this object. Our pipeline is based on learning a latent space of grasps with a dataset generated with any state-of-the-art grasp generator (e.g., Dex-Net). This latent space is then used as a low-dimensional input for a Gaussian process classifier that selects the preferred grasp among those proposed by the generator. The results show that our method outperforms both GR-ConvNet and GG-CNN (two state-of-the-art methods that are also based on labeled grasps) on the Cornell dataset, especially when only a few labels are used: only 80 labels are enough to correctly choose 80% of the grasps (885 scenes, 244 objects). Results are similar on our dataset (91 scenes, 28 objects).

Submitted 1 March, 2022; originally announced March 2022.

Comments: Video: https://youtu.be/dJ1fkcught4

arXiv:2203.00316 [pdf, other]

doi 10.1109/LRA.2022.3188884

First do not fall: learning to exploit a wall with a damaged humanoid robot

Authors: Timothée Anne, Eloïse Dalin, Ivan Bergonzani, Serena Ivaldi, Jean-Baptiste Mouret

Abstract: Humanoid robots could replace humans in hazardous situations but most of such situations are equally dangerous for them, which means that they have a high chance of being damaged and falling. We hypothesize that humanoid robots would be mostly used in buildings, which makes them likely to be close to a wall. To avoid a fall, they can therefore lean on the closest wall, as a human would do, provided that they find in a few milliseconds where to put the hand(s). This article introduces a method, called D-Reflex, that learns a neural network that chooses this contact position given the wall orientation, the wall distance, and the posture of the robot. This contact position is then used by a whole-body controller to reach a stable posture. We show that D-Reflex allows a simulated TALOS robot (1.75m, 100kg, 30 degrees of freedom) to avoid more than 75% of the avoidable falls and can work on the real robot.

Submitted 28 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: Accepted in IEEE Robotics and Automation Letters, June 2022. Video presenting the results: https://youtu.be/hbuWr-ZNAtg

arXiv:2109.12694 [pdf, other]

VP-GO: a "light" action-conditioned visual prediction model

Authors: Anji Ma, Yoann Fleytoux, Jean-Baptiste Mouret, Serena Ivaldi

Abstract: Visual prediction models are a promising solution for visual-based robotic grasping of cluttered, unknown soft objects. Previous models from the literature are computationally greedy, which limits reproducibility; although some consider stochasticity in the prediction model, it is often too weak to catch the reality of robotics experiments involving grasping such objects. Furthermore, previous work focused on elementary movements, which are not efficient for reasoning in terms of more complex semantic actions. To address these limitations, we propose VP-GO, a "light" stochastic action-conditioned visual prediction model. We propose a hierarchical decomposition of semantic grasping and manipulation actions into elementary end-effector movements, to ensure compatibility with existing models and datasets for visual prediction of robotic actions such as RoboNet. We also record and release a new open dataset for visual prediction of object grasping, called PandaGrasp. Our model can be pre-trained on RoboNet and fine-tuned on PandaGrasp, and performs similarly to more complex models in terms of signal prediction metrics. Qualitatively, it outperforms them when predicting the outcome of complex grasps performed by our robot.

Submitted 26 September, 2021; originally announced September 2021.

arXiv:2107.01281 [pdf, other]

Prescient teleoperation of humanoid robots

Authors: Luigi Penco, Jean-Baptiste Mouret, Serena Ivaldi

Abstract: Humanoid robots could be versatile and intuitive human avatars that operate remotely in inaccessible places: the robot could reproduce in the remote location the movements of an operator equipped with a wearable motion capture device while sending visual feedback to the operator. While substantial progress has been made on transferring ("retargeting") human motions to humanoid robots, a major problem preventing the deployment of such systems in real applications is the presence of communication delays between the human input and the feedback from the robot: even a few hundred milliseconds of delay can irreversibly disturb the operator, let alone a few seconds. To overcome these delays, we introduce a system in which a humanoid robot executes commands before it actually receives them, so that the visual feedback appears to be synchronized to the operator, whereas the robot executed the commands in the past. To do so, the robot continuously predicts future commands by querying a machine learning model that is trained on past trajectories and conditioned on the last received commands. In our experiments, an operator was able to successfully control a humanoid robot (32 degrees of freedom) with stochastic delays up to 2 seconds in several whole-body manipulation tasks, including reaching different targets, picking up a bottle, and placing a box at distinct locations.

Submitted 28 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: Video: https://www.youtube.com/watch?v=N3u4ot3aIyQ

arXiv:2102.08760 [pdf]

Using exoskeletons to assist medical staff during prone positioning of mechanically ventilated COVID-19 patients: a pilot study

Authors: Serena Ivaldi, Pauline Maurice, Waldez Gomes, Jean Theurel, Liên Wioland, Jean-Jacques Atain-Kouadio, Laurent Claudon, Hind Hani, Antoine Kimmoun, Jean-Marc Sellal, Bruno Levy, Jean Paysant, Sergueï Malikov, Bruno Chenuel, Nicla Settembre

Abstract: We conducted a pilot study to evaluate the potential and feasibility of back-support exoskeletons to help the caregivers in the Intensive Care Unit (ICU) of the University Hospital of Nancy (France) executing Prone Positioning (PP) maneuvers on patients suffering from severe COVID-19-related Acute Respiratory Distress Syndrome. After comparing four commercial exoskeletons, the Laevo passive exoskeleton was selected and used in the ICU in April 2020. The first volunteers using the Laevo reported very positive feedback and reduction of effort, confirmed by EMG and ECG analysis. Laevo has since been used to physically assist during PP in the ICU of the Hospital of Nancy, following the recrudescence of COVID-19, with overall positive feedback.

Submitted 11 February, 2021; originally announced February 2021.

arXiv:1807.02350 [pdf, other]

A Variational Time Series Feature Extractor for Action Prediction

Authors: Maxime Chaveroche, Adrien Malaisé, Francis Colas, François Charpillet, Serena Ivaldi

Abstract: We propose a Variational Time Series Feature Extractor (VTSFE), inspired by the VAE-DMP model of Chen et al., to be used for action recognition and prediction. Our method is based on variational autoencoders. It improves VAE-DMP in that it has a better noise inference model, a simpler transition model constraining the acceleration in the trajectories of the latent space, and a tighter lower bound for the variational inference. We apply the method for classification and prediction of whole-body movements on a dataset with 7 tasks and 10 demonstrations per task, recorded with a wearable motion capture suit. The comparison with VAE and VAE-DMP suggests the better performance of our method for feature extraction. An open-source software implementation of each method with TensorFlow is also provided. In addition, a more detailed version of this work can be found in the indicated code repository. Although it was meant to, the VTSFE hasn't been tested for action prediction, due to a lack of time in the context of Maxime Chaveroche's Master thesis at INRIA.

Submitted 26 September, 2018; v1 submitted 6 July, 2018; originally announced July 2018.

arXiv:1510.03678 [pdf]

Trust as indicator of robot functional and social acceptance. An experimental study on user conformation to the iCub's answers

Authors: Ilaria Gaudiello, Elisabetta Zibetti, Sebastien Lefort, Mohamed Chetouani, Serena Ivaldi

Abstract: To investigate the functional and social acceptance of a humanoid robot, we carried out an experimental study with 56 adult participants and the iCub robot. Trust in the robot has been considered as a main indicator of acceptance in decision-making tasks characterized by perceptual uncertainty (e.g., evaluating the weight of two objects) and socio-cognitive uncertainty (e.g., evaluating which is the most suitable item in a specific context), and measured by the participants' conformation to the iCub's answers to specific questions. In particular, we were interested in understanding whether specific (i) user-related features (i.e. desire for control), (ii) robot-related features (i.e., attitude towards social influence of robots), and (iii) context-related features (i.e., collaborative vs. competitive scenario), may influence their trust towards the iCub robot. We found that participants conformed more to the iCub's answers when their decisions were about functional issues than when they were about social issues. Moreover, the few participants conforming to the iCub's answers for social issues also conformed less for functional issues. Trust in the robot's functional savvy does not thus seem to be a pre-requisite for trust in its social savvy. Finally, desire for control, attitude towards social influence of robots and type of interaction scenario did not influence the trust in iCub. Results are discussed with relation to methodology of HRI research.

Submitted 13 October, 2015; originally announced October 2015.

Comments: 49 pages, under review

arXiv:1508.04603 [pdf, other]

Towards engagement models that consider individual factors in HRI: on the relation of extroversion and negative attitude towards robots to gaze and speech during a human-robot assembly task

Authors: Serena Ivaldi, Sebastien Lefort, Jan Peters, Mohamed Chetouani, Joelle Provasi, Elisabetta Zibetti

Abstract: Estimating engagement is critical for human-robot interaction. Engagement measures typically rely on the dynamics of the social signals exchanged by the partners, especially speech and gaze. However, the dynamics of these signals are likely to be influenced by individual and social factors, such as personality traits, as it is well documented that they critically influence how two humans interact with each other. Here, we assess the influence of two factors, namely extroversion and negative attitude toward robots, on speech and gaze during a cooperative task, where a human must physically manipulate a robot to assemble an object. We evaluate whether the scores of extroversion and negative attitude towards robots covary with the duration and frequency of gaze and speech cues. The experiments were carried out with the humanoid robot iCub and N=56 adult participants. We found that the more extroverted people are, the more and longer they tend to talk with the robot; and the more people have a negative attitude towards robots, the less they will look at the robot's face and the more they will look at the robot's hands, where the assembly and the contacts occur. Our results confirm and provide evidence that the engagement models classically used in human-robot interaction should take into account attitudes and personality traits.

Submitted 19 August, 2015; originally announced August 2015.

Comments: 24 pages, submitted to IJSR

arXiv:1402.7050 [pdf, other]

Tools for dynamics simulation of robots: a survey based on user feedback

Authors: Serena Ivaldi, Vincent Padois, Francesco Nori

Abstract: The number of tools for dynamics simulation has grown in recent years. The robotics community needs criteria for weighing which of the available tools is best for their research. As a complement to an objective and quantitative comparison, which is difficult to obtain since not all the tools are open-source, user feedback is one element of evaluation. With this goal in mind, we created an online survey about the use of dynamical simulation in robotics. This paper reports the analysis of the participants' answers and a descriptive information sheet for the most relevant tools. We believe this report will help roboticists choose the best simulation tool for their research.

Submitted 27 February, 2014; originally announced February 2014.

Comments: 15 pages

Showing 1–16 of 16 results for author: Ivaldi, S