ExTraCT - Explainable trajectory corrections for language-based human-robot interaction using textual feature descriptions

Front Robot AI. 2024 Sep 23:11:1345693. doi: 10.3389/frobt.2024.1345693. eCollection 2024.

Abstract

Introduction: In human-robot interaction (HRI), understanding human intent is crucial for robots to perform tasks that align with user preferences. Traditional methods that aim to modify robot trajectories based on language corrections often require extensive training to generalize across diverse objects, initial trajectories, and scenarios. This work presents ExTraCT, a modular framework designed to modify robot trajectories (and behaviour) using natural language input.

Methods: Unlike traditional end-to-end learning approaches, ExTraCT separates language understanding from trajectory modification. This allows robots to apply language corrections to new tasks (including those with complex motions such as scooping), as well as to various initial trajectories and object configurations, without additional end-to-end training. ExTraCT leverages Large Language Models (LLMs) to semantically match language corrections to predefined trajectory modification functions, allowing the robot to make the necessary adjustments to its path. This modular approach overcomes the limitations of relying on pre-collected training datasets and offers versatility across various applications.
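The matching step described above (a correction is mapped to one of several predefined modification functions, which is then applied to the trajectory) can be illustrated with a small sketch. This is not the authors' implementation. ExTraCT uses an LLM for semantic matching, whereas this sketch swaps in simple Jaccard token overlap as a stand-in. The function library, its textual descriptions, and the helper names (`shift_height`, `scale_distance`, `match`, `correct`) are all hypothetical.

```python
import math
import re
from typing import Callable

Point = tuple[float, float, float]
Trajectory = list[Point]


def shift_height(traj: Trajectory, dz: float) -> Trajectory:
    """Raise (dz > 0) or lower (dz < 0) every waypoint by dz."""
    return [(x, y, z + dz) for x, y, z in traj]


def scale_distance(traj: Trajectory, obj: Point, alpha: float) -> Trajectory:
    """Move each waypoint a fraction alpha of the way toward obj (negative alpha moves away)."""
    return [tuple(p_i + alpha * (o_i - p_i) for p_i, o_i in zip(p, obj)) for p in traj]


# Hypothetical library of modification functions, each paired with a textual description.
MODIFICATIONS: list[tuple[str, Callable[[Trajectory, Point], Trajectory]]] = [
    ("go higher increase height", lambda t, o: shift_height(t, 0.1)),
    ("go lower decrease height", lambda t, o: shift_height(t, -0.1)),
    ("move closer nearer to the object", lambda t, o: scale_distance(t, o, 0.5)),
    ("stay further away from the object", lambda t, o: scale_distance(t, o, -0.5)),
]


def tokens(text: str) -> set[str]:
    """Lower-cased alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard token overlap: a crude stand-in for LLM-based semantic matching (assumption)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match(correction: str) -> str:
    """Return the description of the best-matching modification function."""
    return max(MODIFICATIONS, key=lambda m: similarity(correction, m[0]))[0]


def correct(traj: Trajectory, correction: str, obj: Point) -> Trajectory:
    """Apply the modification whose description best matches the correction."""
    fn = dict(MODIFICATIONS)[match(correction)]
    return fn(traj, obj)


if __name__ == "__main__":
    cup = (1.0, 0.0, 0.0)
    print(match("move a bit closer to the cup"))  # move closer nearer to the object
    print(correct([(0.0, 0.0, 0.0), (0.5, 0.5, 0.2)], "go higher please", cup))
```

Keeping the modification functions separate from the matcher mirrors the modular idea in the abstract: the matcher could be replaced (for example by an LLM prompt) without retraining or changing the trajectory-editing code.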

Results: Comprehensive user studies conducted in simulation and with a physical robot arm demonstrated that ExTraCT's trajectory corrections are more accurate and preferred by users in 80% of cases compared to the baseline.

Discussion: ExTraCT offers a more explainable approach to understanding language corrections, which could facilitate learning human preferences. We also demonstrated the adaptability and effectiveness of ExTraCT in complex scenarios such as assistive feeding, presenting it as a versatile solution across various HRI applications.

Keywords: assistive robots; foundational models; human-robot interaction; language in robotics; large language models; natural language processing.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The research was supported by the Rehabilitation Research Institute of Singapore and the National Research Foundation Singapore (NRF) under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.