\section{Introduction}

The advent of low-cost mobile robotic platforms has led to a surge in the use of robots in our daily surroundings. The utility of mobile robots has expanded from industry to healthcare, shop floors, offices, homes, etc. Additionally, the use of modern robots has expanded from in-house operations to remote operations. As a result, a diverse set of applications has emerged in which a remote user operates a mobile robotic platform to interact with the environment where the robot is located. These applications can be broadly categorized as telepresence and teleoperation. In the first category, the robot is essentially a mobile platform that moves within the environment based on the remote user's commands or intent. In the second category, the robot performs a set of real-time actuations based on the remote user's instructions. Of course, there can be a hybrid category in which a mobile teleoperation robot is also used for telepresence. In this article, we primarily focus on a telepresence robotic system in the context of remote caregiving.

\begin{figure}
\includegraphics{figs/intro_use_case.png}
\caption{Illustration of `Teledrive' as a use case in an isolation center.}
\end{figure}

Motivation. The prevailing pandemic situation demands `social distancing' as the new normal. Yet patients in isolation must be monitored carefully without risking the lives of caregivers. Even without the pandemic, there is a shortage of caregivers in different parts of the world, and the pandemic has exposed how acute this shortage is. Caregiver services must be made available in a democratized manner so that individual care is possible for geographically distant individuals. A telepresence robot can address part of this issue. However, the major hindrance to wider deployment of telepresence systems is ease of use, particularly for a non-expert user. Existing telepresence systems provide a manual navigation capability, which is often cumbersome for a user in an unfamiliar environment. Moreover, manual navigation requires continuous user intervention to move the remote robot from place to place within the remote environment. Additionally, existing telepresence systems are co-developed with the robot hardware, which makes enhancement difficult, particularly for third-party developers. This co-development is mainly a consequence of the resource constraints of the robot hardware. Hardware-independent software must therefore also be agnostic of resource constraints.

Problem statement. A telepresence system typically maintains a real-time connection with an application at the caregiver's\footnote{`Caregiver' is used here in a remote healthcare context; in a general sense, it simply means a remote user.} end and acts as an Avatar of the caregiver at the patient's premises. The caregiver must navigate the patient's premises in real time through the Avatar, based on the audio-visual feedback that is part of the ongoing real-time multimedia chat (Fig.~\ref{fig:areanav_intro}). In most systems, the remote caregiver maneuvers the robot Avatar through manual instructions using on-screen navigation buttons, a joystick, a keyboard, etc.
However, in unknown premises (as in the case of a tele-doctor), it would be too tedious for the caregiver to manually navigate the robot to the patient's location. Hence, we conceive a system in which the caregiver can give a remote verbal instruction to the robot to navigate near a desired location inside the premises (e.g., `bedroom'). This can be further extended to wards in isolation centers. Speech-based human-robot interaction (HRI) increases the usability and acceptability of the robot. Recent developments in semantic map building and speech-based HRI can be used as building blocks for a telepresence robot software system. However, modern neural network-based systems provide highly accurate outcomes at the expense of higher computing and memory resources, which become scarce when multiple neural network (NN) models are to run on the robot for different tasks. Thus, a software architecture is required that allows various NN models to be integrated with the system without compromising on accuracy.

Approach. Based on in-situ analysis of the semantic map derived from the live captured frames, the robot moves to a position near the intended location. Once the robot reaches that position, the caregiver can take manual control and perform finer control through on-screen navigation buttons. The scenario is depicted in Fig.~1. The robot is connected to the caregiver's PDA over the Internet through a WebRTC-based communication protocol. Initially, the robot is at the entrance of the patient's premises and the elderly patient is in the bedroom. The caregiver verbally instructs the robot to navigate to the bedroom. Once the robot has located itself around the bedroom, the caregiver can manually lead the robot to the bed where the patient is waiting. This motivates us to develop a navigation capability along with a software architecture that makes it platform-independent.

Contributions. The prime contributions of this article are twofold.
Firstly, we describe the cognitive navigation problem, especially in the light of `AreaGoal' navigation. We demonstrate the efficacy of our `AreaGoal' navigation system through results from benchmark experiments. Secondly, we present a unique software system architecture comprising speech-based HRI, the navigation module, and a real-time WebRTC-based communication framework. The software architecture supports the addition of further AI-based software modules. The communication framework holds the entire system together to serve multiple use cases such as the `caregiver' telepresence scenario.
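
The hand-over described in the approach above (a verbal instruction triggers autonomous navigation towards a named area, after which the caregiver resumes manual control) can be illustrated with a minimal sketch. This is not the system's actual implementation: the area vocabulary, the class and method names, the keyword-matching step, and the choice to ignore manual commands during autonomous navigation are all assumptions made purely for illustration. Speech recognition and the semantic-mapping module are not shown and are represented only by the strings passed in.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional

# Hypothetical area vocabulary; the text only gives 'bedroom' as an example.
KNOWN_AREAS = ("bedroom", "kitchen", "bathroom", "living room")


class Mode(Enum):
    MANUAL = auto()     # caregiver drives with on-screen navigation buttons
    AREA_GOAL = auto()  # robot navigates autonomously towards a named area


def extract_area(utterance: str) -> Optional[str]:
    """Return the first known area mentioned in a transcribed instruction."""
    text = utterance.lower()
    for area in KNOWN_AREAS:
        if area in text:
            return area
    return None


@dataclass
class TelepresenceSession:
    """Tracks who is in control of the robot (illustrative only)."""
    mode: Mode = Mode.MANUAL
    target_area: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def on_speech(self, utterance: str) -> bool:
        """Start area-goal navigation if the instruction names a known area."""
        area = extract_area(utterance)
        if area is None:
            self.log.append("no known area in instruction")
            return False
        self.mode = Mode.AREA_GOAL
        self.target_area = area
        self.log.append(f"navigating towards {area}")
        return True

    def on_area_reached(self, detected_area: str) -> None:
        """Hand control back once the robot reports it is near the target."""
        if self.mode is Mode.AREA_GOAL and detected_area == self.target_area:
            self.mode = Mode.MANUAL
            self.target_area = None
            self.log.append(f"near {detected_area}; manual control restored")

    def on_manual_command(self, command: str) -> bool:
        """Accept a manual command only in MANUAL mode (an assumption)."""
        if self.mode is not Mode.MANUAL:
            return False
        self.log.append(f"manual command: {command}")
        return True


if __name__ == "__main__":
    session = TelepresenceSession()
    session.on_speech("Please go to the bedroom")
    session.on_area_reached("bedroom")
    session.on_manual_command("forward")
    print("\n".join(session.log))
```

In this sketch the area goal is only a coarse target: once the robot reports that it is near the requested area, finer positioning is left to the caregiver, mirroring the two-stage scenario described above.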

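\begin{table}
\caption{Comparison of features in existing telepresence systems.}
\begin{tabular}{l*{14}{c}}
\hline
Features & \rotatebox{90}{Teledrive} & \rotatebox{90}{Double 3} & \rotatebox{90}{ENRICH ME} & \rotatebox{90}{Amy A3} & \rotatebox{90}{Kuby} & \rotatebox{90}{Ava 500} & \rotatebox{90}{PadBot P3} & \rotatebox{90}{Vgo} & \rotatebox{90}{Ohmni SuperCam} & \rotatebox{90}{BotEyes Pad} & \rotatebox{90}{Giraff} & \rotatebox{90}{Beam Pro} & \rotatebox{90}{FURo-I home} & \rotatebox{90}{Temi} \\
\hline
\nohyphens{Platform independence} & y & n & n & n & n & n & n & n & n & n & n & n & n & n \\
\nohyphens{Browser GUI} & y & y & n & n & n & n & n & y & y & y & n & n & n & n \\
\nohyphens{Manual navigation} & y & y & y & y & n & y & y & y & y & y & y & y & y & n \\
\nohyphens{AR goal based navigation} & y & y & n & y & n & n & n & n & n & n & n & n & n & y \\
\nohyphens{Map based navigation} & y & n & y & n & n & y & n & n & n & n & n & n & n & y \\
\nohyphens{Area goal based navigation} & y & n & n & n & n & n & n & n & n & n & n & n & n & n \\
\nohyphens{Speech based navigation} & y & n & n & n & n & n & n & n & n & n & n & n & n & y \\
\nohyphens{Information mashup} & y & n & n & n & n & n & n & n & n & n & n & n & n & n \\
\nohyphens{Speaker localization} & y & n & n & n & y & n & n & n & n & n & n & n & n & n \\
\nohyphens{Face identification} & y & n & y & y & n & n & y & n & n & n & n & n & y & y \\
\nohyphens{Person following} & y & n & n & n & n & n & n & n & n & n & n & n & n & y \\
\nohyphens{Multi-party federated control} & y & n & n & n & n & n & n & n & n & n & n & n & n & n \\
\nohyphens{Automatic map generation} & y & n & y & n & n & n & n & n & n & n & n & n & n & y \\
\nohyphens{Dialogue based disambiguation} & y & n & n & n & n & n & n & n & n & n & n & n & n & n \\
\hline
\end{tabular}
\end{table}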

The overall organization of the paper is as follows. Section~\ref{related_work} surveys state-of-the-art telepresence systems along with different evaluation metrics for a cognitive navigation system, and then describes the different types of cognitive navigation components. In Section LABEL:, we present the software system architecture with a description of each major sub-component. Section LABEL: describes the experimental evaluation of `AreaGoal' navigation. Section LABEL: describes a practical implementation of the system on commercial telepresence robot hardware. Finally, we conclude the article in Section LABEL: and outline future research directions.