Search | arXiv e-print repository

How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey

Authors: Zhonghao Shi, Ellen Landrum, Amy O' Connell, Mina Kian, Leticia Pinto-Alva, Kaleen Shrestha, Xiaoyuan Zhu, Maja J Matarić

Abstract: Socially assistive robots (SARs) have shown great success in providing personalized cognitive-affective support for user populations with special needs such as older adults, children with autism spectrum disorder (ASD), and individuals with mental health challenges. The large body of work on SAR demonstrates its potential to provide at-home support that complements clinic-based interventions deliv… ▽ More Socially assistive robots (SARs) have shown great success in providing personalized cognitive-affective support for user populations with special needs such as older adults, children with autism spectrum disorder (ASD), and individuals with mental health challenges. The large body of work on SAR demonstrates its potential to provide at-home support that complements clinic-based interventions delivered by mental health professionals, making these interventions more effective and accessible. However, there are still several major technical challenges that hinder SAR-mediated interactions and interventions from reaching human-level social intelligence and efficacy. With the recent advances in large language models (LLMs), there is an increased potential for novel applications within the field of SAR that can significantly expand the current capabilities of SARs. However, incorporating LLMs introduces new risks and ethical concerns that have not yet been encountered, and must be carefully be addressed to safely deploy these more advanced systems. In this work, we aim to conduct a brief survey on the use of LLMs in SAR technologies, and discuss the potentials and risks of applying LLMs to the following three major technical challenges of SAR: 1) natural language dialog; 2) multimodal understanding; 3) LLMs as robot policies. △ Less

Submitted 5 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 2 pages, accepted to the Proceedings of the AAAI Symposium Series, 2024

arXiv:2403.14814 [pdf, other]

doi 10.2196/59479

The opportunities and risks of large language models in mental health

Authors: Hannah R. Lawrence, Renee A. Schneider, Susan B. Rubin, Maja J. Mataric, Daniel J. McDuff, Megan Jones Bell

Abstract: Global rates of mental health concerns are rising, and there is increasing realization that existing models of mental health care will not adequately expand to meet the demand. With the emergence of large language models (LLMs) has come great optimism regarding their promise to create novel, large-scale solutions to support mental health. Despite their nascence, LLMs have already been applied to m… ▽ More Global rates of mental health concerns are rising, and there is increasing realization that existing models of mental health care will not adequately expand to meet the demand. With the emergence of large language models (LLMs) has come great optimism regarding their promise to create novel, large-scale solutions to support mental health. Despite their nascence, LLMs have already been applied to mental health related tasks. In this paper, we summarize the extant literature on efforts to use LLMs to provide mental health education, assessment, and intervention and highlight key opportunities for positive impact in each area. We then highlight risks associated with LLMs' application to mental health and encourage the adoption of strategies to mitigate these risks. The urgent need for mental health support must be balanced with responsible development, testing, and deployment of mental health LLMs. It is especially critical to ensure that mental health LLMs are fine-tuned for mental health, enhance mental health equity, and adhere to ethical standards and that people, including those with lived experience with mental health concerns, are involved in all stages from development through deployment. Prioritizing these efforts will minimize potential harms to mental health and maximize the likelihood that LLMs will positively impact mental health globally. △ Less

Submitted 1 August, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 15 pages, 2 tables, 4 figures

Journal ref: JMIR Ment Health 2024;11:e59479

arXiv:2402.17937 [pdf, other]

Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students

Authors: Mina J. Kian, Mingyu Zong, Katrin Fischer, Abhyuday Singh, Anna-Maria Velentza, Pau Sang, Shriya Upadhyay, Anika Gupta, Misha A. Faruki, Wallace Browning, Sebastien M. R. Arnold, Bhaskar Krishnamachari, Maja J. Mataric

Abstract: Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performa… ▽ More Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performance of the SAR through a 15-day study with 38 university students randomly assigned to interact daily with the robot or a chatbot (using the same LLM), or complete traditional CBT worksheets throughout the duration of the study. We measured weekly therapeutic outcomes, changes in pre-/post-session anxiety measures, and adherence to completing CBT exercises. We found that self-reported measures of general psychological distress significantly decreased over the study period in the robot and worksheet conditions but not the chatbot condition. Furthermore, the SAR enabled significant single-session improvements for more sessions than the other two conditions combined. Our findings suggest that SAR-guided LLM-powered CBT may be as effective as traditional worksheet methods in supporting therapeutic progress from the beginning to the end of the study and superior in decreasing user anxiety immediately after completing the CBT exercise. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.15513 [pdf, other]

doi 10.1109/BIBM58861.2023.10385292

Investigating the Generalizability of Physiological Characteristics of Anxiety

Authors: Emily Zhou, Mohammad Soleymani, Maja J. Matarić

Abstract: Recent works have demonstrated the effectiveness of machine learning (ML) techniques in detecting anxiety and stress using physiological signals, but it is unclear whether ML models are learning physiological features specific to stress. To address this ambiguity, we evaluated the generalizability of physiological features that have been shown to be correlated with anxiety and stress to high-arous… ▽ More Recent works have demonstrated the effectiveness of machine learning (ML) techniques in detecting anxiety and stress using physiological signals, but it is unclear whether ML models are learning physiological features specific to stress. To address this ambiguity, we evaluated the generalizability of physiological features that have been shown to be correlated with anxiety and stress to high-arousal emotions. Specifically, we examine features extracted from electrocardiogram (ECG) and electrodermal (EDA) signals from the following three datasets: Anxiety Phases Dataset (APD), Wearable Stress and Affect Detection (WESAD), and the Continuously Annotated Signals of Emotion (CASE) dataset. We aim to understand whether these features are specific to anxiety or general to other high-arousal emotions through a statistical regression analysis, in addition to a within-corpus, cross-corpus, and leave-one-corpus-out cross-validation across instances of stress and arousal. We used the following classifiers: Support Vector Machines, LightGBM, Random Forest, XGBoost, and an ensemble of the aforementioned models. We found that models trained on an arousal dataset perform relatively well on a previously unseen stress dataset, and vice versa. Our experimental results suggest that the evaluated models may be identifying emotional arousal instead of stress. This work is the first cross-corpus evaluation across stress and arousal from ECG and EDA signals, contributing new findings about the generalizability of stress detection. △ Less

Submitted 23 January, 2024; originally announced February 2024.

Journal ref: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 4848-4855

arXiv:2402.01647 [pdf, other]

Build Your Own Robot Friend: An Open-Source Learning Module for Accessible and Engaging AI Education

Authors: Zhonghao Shi, Allison O'Connell, Zongjian Li, Siqi Liu, Jennifer Ayissi, Guy Hoffman, Mohammad Soleymani, Maja J. Matarić

Abstract: As artificial intelligence (AI) is playing an increasingly important role in our society and global economy, AI education and literacy have become necessary components in college and K-12 education to prepare students for an AI-powered society. However, current AI curricula have not yet been made accessible and engaging enough for students and schools from all socio-economic backgrounds with diffe… ▽ More As artificial intelligence (AI) is playing an increasingly important role in our society and global economy, AI education and literacy have become necessary components in college and K-12 education to prepare students for an AI-powered society. However, current AI curricula have not yet been made accessible and engaging enough for students and schools from all socio-economic backgrounds with different educational goals. In this work, we developed an open-source learning module for college and high school students, which allows students to build their own robot companion from the ground up. This open platform can be used to provide hands-on experience and introductory knowledge about various aspects of AI, including robotics, machine learning (ML), software engineering, and mechanical engineering. Because of the social and personal nature of a socially assistive robot companion, this module also puts a special emphasis on human-centered AI, enabling students to develop a better understanding of human-AI interaction and AI ethics through hands-on learning activities. With open-source documentation, assembling manuals and affordable materials, students from different socio-economic backgrounds can personalize their learning experience based on their individual educational goals. To evaluate the student-perceived quality of our module, we conducted a usability testing workshop with 15 college students recruited from a minority-serving institution. Our results indicate that our AI module is effective, easy-to-follow, and engaging, and it increases student interest in studying AI/ML and robotics in the future. We hope that this work will contribute toward accessible and engaging AI education in human-AI interaction for college and high school students. △ Less

Submitted 6 January, 2024; originally announced February 2024.

Comments: Accepted to the Proceedings of the AAAI Conference on Artificial Intelligence (2024)

arXiv:2401.06974 [pdf, other]

doi 10.1126/scirobotics.adf7723

A metric for characterizing the arm nonuse workspace in poststroke individuals using a robot arm

Authors: Nathaniel Dennler, Amelia Cain, Erica De Guzman, Claudia Chiu, Carolee J. Winstein, Stefanos Nikolaidis, Maja J. Matarić

Abstract: An over-reliance on the less-affected limb for functional tasks at the expense of the paretic limb and in spite of recovered capacity is an often-observed phenomenon in survivors of hemispheric stroke. The difference between capacity for use and actual spontaneous use is referred to as arm nonuse. Obtaining an ecologically valid evaluation of arm nonuse is challenging because it requires the obser… ▽ More An over-reliance on the less-affected limb for functional tasks at the expense of the paretic limb and in spite of recovered capacity is an often-observed phenomenon in survivors of hemispheric stroke. The difference between capacity for use and actual spontaneous use is referred to as arm nonuse. Obtaining an ecologically valid evaluation of arm nonuse is challenging because it requires the observation of spontaneous arm choice for different tasks, which can easily be influenced by instructions, presumed expectations, and awareness that one is being tested. To better quantify arm nonuse, we developed the Bimanual Arm Reaching Test with a Robot (BARTR) for quantitatively assessing arm nonuse in chronic stroke survivors. The BARTR is an instrument that utilizes a robot arm as a means of remote and unbiased data collection of nuanced spatial data for clinical evaluations of arm nonuse. This approach shows promise for determining the efficacy of interventions designed to reduce paretic arm nonuse and enhance functional recovery after stroke. We show that the BARTR satisfies the criteria of an appropriate metric for neurorehabilitative contexts: it is valid, reliable, and simple to use. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to Science Robotics at https://www.science.org/doi/10.1126/scirobotics.adf7723 on November 15th, 2023

Journal ref: Science Robotics 8, eadf7723(2023)

arXiv:2401.03329 [pdf, other]

doi 10.1007/978-3-030-90525-5_38

Designing a Socially Assistive Robot to Support Older Adults with Low Vision

Authors: Emily Zhou, Zhonghao Shi, Xiaoyang Qiao, Maja J Matarić, Ava K Bittner

Abstract: Socially assistive robots (SARs) have shown great promise in supplementing and augmenting interventions to support the physical and mental well-being of older adults. However, past work has not yet explored the potential of applying SAR to lower the barriers of long-term low vision rehabilitation (LVR) interventions for older adults. In this work, we present a user-informed design process to valid… ▽ More Socially assistive robots (SARs) have shown great promise in supplementing and augmenting interventions to support the physical and mental well-being of older adults. However, past work has not yet explored the potential of applying SAR to lower the barriers of long-term low vision rehabilitation (LVR) interventions for older adults. In this work, we present a user-informed design process to validate the motivation and identify major design principles for developing SAR for long-term LVR. To evaluate user-perceived usefulness and acceptance of SAR in this novel domain, we performed a two-phase study through user surveys. First, a group (n=38) of older adults with LV completed a mailed-in survey. Next, a new group (n=13) of older adults with LV saw an in-clinic SAR demo and then completed the survey. The study participants reported that SARs would be useful, trustworthy, easy to use, and enjoyable while providing socio-emotional support to augment LVR interventions. The in-clinic demo group reported significantly more positive opinions of the SAR's capabilities than did the baseline survey group that used mailed-in forms without the SAR demo. △ Less

Submitted 6 January, 2024; originally announced January 2024.

Comments: Published in Social Robotics: 13th International Conference, ICSR 2021. Springer International Publishing

arXiv:2312.14369 [pdf, other]

Quality-Diversity Generative Sampling for Learning with Synthetic Data

Authors: Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis

Abstract: Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, desp… ▽ More Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, despite the data coming from a biased generator. QDGS is a model-agnostic framework that uses prompt guidance to optimize a quality objective across measures of diversity for synthetically generated data, without fine-tuning the generative model. Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. Code available at: https://github.com/Cylumn/qd-generative-sampling. △ Less

Submitted 27 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted at AAAI 2024; 7 pages main, 12 pages total, 9 figures

arXiv:2305.10827 [pdf, other]

Expanding the Role of Affective Phenomena in Multimodal Interaction Research

Authors: Leena Mathur, Maja J Matarić, Louis-Philippe Morency

Abstract: In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing t… ▽ More In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing trends and identifying understudied areas. We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing: ACM International Conference on Multimodal Interaction, AAAC International Conference on Affective Computing and Intelligent Interaction, Annual Meeting of the Association for Computational Linguistics, and Conference on Empirical Methods in Natural Language Processing. We identified 910 affect-related papers and present our analysis of the role of affective phenomena in these papers. We find that this body of research has primarily focused on enabling machines to recognize and express affect and emotion. However, we find limited research on how affect and emotion predictions might be used by AI systems to enhance machine understanding of human social behaviors and cognitive states. Based on our analysis, we discuss directions to expand the role of affective phenomena in multimodal interaction research. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: 4 pages, 4 figures

arXiv:2209.03496 [pdf]

Evaluating Temporal Patterns in Applied Infant Affect Recognition

Authors: Allen Chang, Lauren Klein, Marcelo R. Rosales, Weiyang Deng, Beth A. Smith, Maja J. Matarić

Abstract: Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of… ▽ More Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants' affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions. △ Less

Submitted 7 September, 2022; originally announced September 2022.

Comments: 8 pages, 6 figures, 10th International Conference on Affective Computing and Intelligent Interaction (ACII 2022)

arXiv:2208.00344 [pdf, other]

Towards Intercultural Affect Recognition: Audio-Visual Affect Recognition in the Wild Across Six Cultures

Authors: Leena Mathur, Ralph Adolphs, Maja J Matarić

Abstract: In our multicultural world, affect-aware AI systems that support humans need the ability to perceive affect across variations in emotion expression patterns across cultures. These systems must perform well in cultural contexts without annotated affect datasets available for training models. A standard assumption in affective computing is that affect recognition models trained and used within the s… ▽ More In our multicultural world, affect-aware AI systems that support humans need the ability to perceive affect across variations in emotion expression patterns across cultures. These systems must perform well in cultural contexts without annotated affect datasets available for training models. A standard assumption in affective computing is that affect recognition models trained and used within the same culture (intracultural) will perform better than models trained on one culture and used on different cultures (intercultural). We test this assumption and present the first systematic study of intercultural affect recognition models using videos of real-world dyadic interactions from six cultures. We develop an attention-based feature selection approach under temporal causal discovery to identify behavioral cues that can be leveraged in intercultural affect recognition models. Across all six cultures, our findings demonstrate that intercultural affect recognition models were as effective or more effective than intracultural models. We identify and contribute useful behavioral features for intercultural affect recognition; facial features from the visual modality were more useful than the audio modality in this study's context. Our paper presents a proof-of-concept and motivation for the future development of intercultural affect recognition systems, especially those deployed in low-resource situations without annotated data. △ Less

Submitted 31 October, 2022; v1 submitted 30 July, 2022; originally announced August 2022.

Comments: Accepted at IEEE International Conference on Automatic Face and Gesture Recognition (FG 2023), publication and presentation at refereed IEEE workshop

arXiv:2205.06747 [pdf, other]

Augmented Reality Appendages for Robots: Design Considerations and Recommendations for Maximizing Social and Functional Perception

Authors: Ipek Goktan, Karen Ly, Thomas R. Groechel, Maja J. Mataric

Abstract: In order to address the limitations of gestural capabilities in physical robots, researchers in Virtual, Augmented, Mixed Reality Human-Robot Interaction (VAM-HRI) have been using augmented-reality visualizations that increase robot expressivity and improve user perception (e.g., social presence). While a multitude of virtual robot deictic gestures (e.g., pointing to an object) have been implement… ▽ More In order to address the limitations of gestural capabilities in physical robots, researchers in Virtual, Augmented, Mixed Reality Human-Robot Interaction (VAM-HRI) have been using augmented-reality visualizations that increase robot expressivity and improve user perception (e.g., social presence). While a multitude of virtual robot deictic gestures (e.g., pointing to an object) have been implemented to improve interactions within VAM-HRI, such systems are often reported to have tradeoffs between functional and social user perceptions of robots, creating a need for a unified approach that considers both attributes. We performed a literature analysis that selected factors that were noted to significantly influence either user perception or task efficiency and propose a set of design considerations and recommendations that address those factors by combining anthropomorphic and non-anthropomorphic virtual gestures based on the motivation of the interaction, visibility of the target and robot, salience of the target, and distance between the target and robot. The proposed recommendations provide the VAM-HRI community with starting points for selecting appropriate gesture types for a multitude of interaction contexts. △ Less

Submitted 13 May, 2022; originally announced May 2022.

Comments: In Refereed ACM/IEEE International Conference on Human Robot Interaction (HRI) Workshop on Virtual, Augmented, and Mixed Reality for Human-Robot Interaction (VAM-HRI 2022)

arXiv:2201.09114 [pdf, other]

What and How Are We Reporting in HRI? A Review and Recommendations for Reporting Recruitment, Compensation, and Gender

Authors: Julia R. Cordero, Thomas R. Groechel, Maja J. Matarić

Abstract: Study reproducibility and generalizability of results to broadly inclusive populations is crucial in any research. Previous meta-analyses in HRI have focused on the consistency of reported information from papers in various categories. However, members of the HRI community have noted that much of the information needed for reproducible and generalizable studies is not found in published papers. We… ▽ More Study reproducibility and generalizability of results to broadly inclusive populations is crucial in any research. Previous meta-analyses in HRI have focused on the consistency of reported information from papers in various categories. However, members of the HRI community have noted that much of the information needed for reproducible and generalizable studies is not found in published papers. We address this issue by surveying the reported study metadata over the past three years (2019 through 2021) of the main proceedings of the International Conference on Human-Robot Interaction (HRI) as well as alt.HRI. Based on the analysis results, we propose a set of recommendations for the HRI community that follow the longer-standing reporting guidelines from human-computer interaction (HCI), psychology, and other fields most related to HRI. Finally, we examine three key areas for user study reproducibility: recruitment details, participant compensation, and participant gender. We find a lack of reporting within each of these study metadata categories: of the 236 studies, 139 studies failed to report recruitment method, 118 studies failed to report compensation, and 62 studies failed to report gender data. This analysis therefore provides guidance about specific types of needed reporting improvements for HRI. △ Less

Submitted 3 March, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

Comments: 7 pages, 5 figures, 1 table

arXiv:2108.10456 [pdf, other]

doi 10.1145/3462244.3479893

Long-Term, in-the-Wild Study of Feedback about Speech Intelligibility for K-12 Students Attending Class via a Telepresence Robot

Authors: Matthew Rueben, Mohammad Syed, Emily London, Mark Camarena, Eunsook Shin, Yulun Zhang, Timothy S. Wang, Thomas R. Groechel, Rhianna Lee, Maja J. Matarić

Abstract: Telepresence robots offer presence, embodiment, and mobility to remote users, making them promising options for homebound K-12 students. It is difficult, however, for robot operators to know how well they are being heard in remote and noisy classroom environments. One solution is to estimate the operator's speech intelligibility to their listeners in order to provide feedback about it to the opera… ▽ More Telepresence robots offer presence, embodiment, and mobility to remote users, making them promising options for homebound K-12 students. It is difficult, however, for robot operators to know how well they are being heard in remote and noisy classroom environments. One solution is to estimate the operator's speech intelligibility to their listeners in order to provide feedback about it to the operator. This work contributes the first evaluation of a speech intelligibility feedback system for homebound K-12 students attending class remotely. In our four long-term, in-the-wild deployments we found that students speak at different volumes instead of adjusting the robot's volume, and that detailed audio calibration and network latency feedback are needed. We also contribute the first findings about the types and frequencies of multimodal comprehension cues given to homebound students by listeners in the classroom. By annotating and categorizing over 700 cues, we found that the most common cue modalities were conversation turn timing and verbal content. Conversation turn timing cues occurred more frequently overall, whereas verbal content cues contained more information and might be the most frequent modality for negative cues. Our work provides recommendations for telepresence systems that could intervene to ensure that remote users are being heard. △ Less

Submitted 23 August, 2021; originally announced August 2021.

Journal ref: Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI '21), October 18-22, 2021, Montreal, QC, Canada. ACM, New York, NY, USA, 10 pages

arXiv:2108.07897 [pdf, other]

Affect-Aware Deep Belief Network Representations for Multimodal Unsupervised Deception Detection

Authors: Leena Mathur, Maja J Matarić

Abstract: Automated systems that detect the social behavior of deception can enhance human well-being across medical, social work, and legal domains. Labeled datasets to train supervised deception detection models can rarely be collected for real-world, high-stakes contexts. To address this challenge, we propose the first unsupervised approach for detecting real-world, high-stakes deception in videos withou… ▽ More Automated systems that detect the social behavior of deception can enhance human well-being across medical, social work, and legal domains. Labeled datasets to train supervised deception detection models can rarely be collected for real-world, high-stakes contexts. To address this challenge, we propose the first unsupervised approach for detecting real-world, high-stakes deception in videos without requiring labels. This paper presents our novel approach for affect-aware unsupervised Deep Belief Networks (DBN) to learn discriminative representations of deceptive and truthful behavior. Drawing on psychology theories that link affect and deception, we experimented with unimodal and multimodal DBN-based approaches trained on facial valence, facial arousal, audio, and visual features. In addition to using facial affect as a feature on which DBN models are trained, we also introduce a DBN training procedure that uses facial affect as an aligner of audio-visual representations. We conducted classification experiments with unsupervised Gaussian Mixture Model clustering to evaluate our approaches. Our best unsupervised approach (trained on facial valence and visual features) achieved an AUC of 80%, outperforming human ability and performing comparably to fully-supervised models. Our results motivate future work on unsupervised, affect-aware computational approaches for detecting deception and other social behaviors in the wild. △ Less

Submitted 8 November, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

arXiv:2107.14345 [pdf, other]

Modeling User Empathy Elicited by a Robot Storyteller

Authors: Leena Mathur, Micol Spitale, Hao Xi, Jieyun Li, Maja J Matarić

Abstract: Virtual and robotic agents capable of perceiving human empathy have the potential to participate in engaging and meaningful human-machine interactions that support human well-being. Prior research in computational empathy has focused on designing empathic agents that use verbal and nonverbal behaviors to simulate empathy and attempt to elicit empathic responses from humans. The challenge of develo… ▽ More Virtual and robotic agents capable of perceiving human empathy have the potential to participate in engaging and meaningful human-machine interactions that support human well-being. Prior research in computational empathy has focused on designing empathic agents that use verbal and nonverbal behaviors to simulate empathy and attempt to elicit empathic responses from humans. The challenge of developing agents with the ability to automatically perceive elicited empathy in humans remains largely unexplored. Our paper presents the first approach to modeling user empathy elicited during interactions with a robotic agent. We collected a new dataset from the novel interaction context of participants listening to a robot storyteller (46 participants, 6.9 hours of video). After each storytelling interaction, participants answered a questionnaire that assessed their level of elicited empathy during the interaction with the robot. We conducted experiments with 8 classical machine learning models and 2 deep learning models (long short-term memory networks and temporal convolutional networks) to detect empathy by leveraging patterns in participants' visual behaviors while they were listening to the robot storyteller. Our highest-performing approach, based on XGBoost, achieved an accuracy of 69% and AUC of 72% when detecting empathy in videos. We contribute insights regarding modeling approaches and visual features for automated empathy detection. Our research informs and motivates future development of empathy perception models that can be leveraged by virtual and robotic agents during human-machine interactions. △ Less

Submitted 29 July, 2021; originally announced July 2021.

Comments: Accepted for publication and oral presentation at the International Conference on Affective Computing and Intelligent Interaction (ACII 2021)

arXiv:2103.15256 [pdf, other]

Personalized Affect-Aware Socially Assistive Robot Tutors Aimed at Fostering Social Grit in Children with Autism

Authors: Zhonghao Shi, Manwei Cao, Sophia Pei, Xiaoyang Qiao, Thomas R Groechel, Maja J Matarić

Abstract: Affect-aware socially assistive robotics (SAR) tutors have great potential to augment and democratize professional therapeutic interventions for children with autism spectrum disorders (ASD) from different socioeconomic backgrounds. However, the majority of research on SAR for ASD has been on teaching cognitive and/or social skills, not on addressing users' emotional needs for real-world social si… ▽ More Affect-aware socially assistive robotics (SAR) tutors have great potential to augment and democratize professional therapeutic interventions for children with autism spectrum disorders (ASD) from different socioeconomic backgrounds. However, the majority of research on SAR for ASD has been on teaching cognitive and/or social skills, not on addressing users' emotional needs for real-world social situations. To bridge that gap, this work aims to develop personalized affect-aware SAR tutors to help alleviate social anxiety and foster social grit-the growth mindset for social skill development-in children with ASD. We propose a novel paradigm to incorporate clinically validated Acceptance and Commitment Training (ACT) with personalized SAR interventions. This work paves the way toward developing personalized affect-aware SAR interventions to support the unique and diverse socio-emotional needs and challenges of children with ASD. △ Less

Submitted 28 March, 2021; originally announced March 2021.

Comments: Accepted to ACM/IEEE International Conference on Human-Robot Interaction Workshop on Child-Robot Interaction and Child's Fundamental Rights

arXiv:2103.03079 [pdf, other]

Toward Automated Generation of Affective Gestures from Text:A Theory-Driven Approach

Authors: Micol Spitale, Maja J Matarić

Abstract: Communication in both human-human and human-robot interac-tion (HRI) contexts consists of verbal (speech-based) and non-verbal(facial expressions, eye gaze, gesture, body pose, etc.) components.The verbal component contains semantic and affective information;accordingly, HRI work on the gesture component so far has focusedon rule-based (mapping words to gestures) and data-driven (deep-learning) ap… ▽ More Communication in both human-human and human-robot interac-tion (HRI) contexts consists of verbal (speech-based) and non-verbal(facial expressions, eye gaze, gesture, body pose, etc.) components.The verbal component contains semantic and affective information;accordingly, HRI work on the gesture component so far has focusedon rule-based (mapping words to gestures) and data-driven (deep-learning) approaches to generating speech-paired gestures basedon either semantics or the affective state. Consequently, most ges-ture systems are confined to producing either semantically-linkedor affect-based gesticures. This paper introduces an approach forenabling human-robot communication based on a theory-drivenapproach to generate speech-paired robot gestures using both se-mantic and affective information. Our model takes as input textand sentiment analysis, and generates robot gestures in terms oftheir shape, intensity, and speed. △ Less

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2102.03673 [pdf, other]

doi 10.1109/ICASSP39728.2021.9413550

Unsupervised Audio-Visual Subspace Alignment for High-Stakes Deception Detection

Authors: Leena Mathur, Maja J Matarić

Abstract: Automated systems that detect deception in high-stakes situations can enhance societal well-being across medical, social work, and legal domains. Existing models for detecting high-stakes deception in videos have been supervised, but labeled datasets to train models can rarely be collected for most real-world applications. To address this problem, we propose the first multimodal unsupervised trans… ▽ More Automated systems that detect deception in high-stakes situations can enhance societal well-being across medical, social work, and legal domains. Existing models for detecting high-stakes deception in videos have been supervised, but labeled datasets to train models can rarely be collected for most real-world applications. To address this problem, we propose the first multimodal unsupervised transfer learning approach that detects real-world, high-stakes deception in videos without using high-stakes labels. Our subspace-alignment (SA) approach adapts audio-visual representations of deception in lab-controlled low-stakes scenarios to detect deception in real-world, high-stakes situations. Our best unsupervised SA models outperform models without SA, outperform human ability, and perform comparably to a number of existing supervised models. Our research demonstrates the potential for introducing subspace-based transfer learning to model high-stakes deception and other social behaviors in real-world contexts with a scarcity of labeled behavioral data. △ Less

Submitted 6 February, 2021; originally announced February 2021.

Comments: Accepted at ICASSP 2021 \c{opyright} 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of copyrighted components of this work

arXiv:2101.10580 [pdf, other]

Toward Personalized Affect-Aware Socially Assistive Robot Tutors in Long-Term Interventions for Children with Autism

Authors: Zhonghao Shi, Thomas R Groechel, Shomik Jain, Kourtney Chima, Ognjen Rudovic, Maja J Matarić

Abstract: Affect-aware socially assistive robotics (SAR) has shown great potential for augmenting interventions for children with autism spectrum disorders (ASD). However, current SAR cannot yet perceive the unique and diverse set of atypical cognitive-affective behaviors from children with ASD in an automatic and personalized fashion in long-term (multi-session) real-world interactions. To bridge this gap,… ▽ More Affect-aware socially assistive robotics (SAR) has shown great potential for augmenting interventions for children with autism spectrum disorders (ASD). However, current SAR cannot yet perceive the unique and diverse set of atypical cognitive-affective behaviors from children with ASD in an automatic and personalized fashion in long-term (multi-session) real-world interactions. To bridge this gap, this work designed and validated personalized models of arousal and valence for children with ASD using a multi-session in-home dataset of SAR interventions. By training machine learning (ML) algorithms with supervised domain adaptation (s-DA), the personalized models were able to trade off between the limited individual data and the more abundant less personal data pooled from other study participants. We evaluated the effects of personalization on a long-term multimodal dataset consisting of 4 children with ASD with a total of 19 sessions, and derived inter-rater reliability (IR) scores for binary arousal (IR = 83%) and valence (IR = 81%) labels between human annotators. Our results show that personalized Gradient Boosted Decision Trees (XGBoost) models with s-DA outperformed two non-personalized individualized and generic model baselines not only on the weighted average of all sessions, but also statistically (p < .05) across individual sessions. This work paves the way for the development of personalized autonomous SAR systems tailored toward individuals with atypical cognitive-affective and socio-emotional needs. △ Less

Submitted 29 January, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

arXiv:2008.13369 [pdf]

doi 10.1145/3382507.3418864

Introducing Representations of Facial Affect in Automated Multimodal Deception Detection

Authors: Leena Mathur, Maja J Matarić

Abstract: Automated deception detection systems can enhance health, justice, and security in society by helping humans detect deceivers in high-stakes situations across medical and legal domains, among others. This paper presents a novel analysis of the discriminative power of dimensional representations of facial affect for automated deception detection, along with interpretable features from visual, vocal… ▽ More Automated deception detection systems can enhance health, justice, and security in society by helping humans detect deceivers in high-stakes situations across medical and legal domains, among others. This paper presents a novel analysis of the discriminative power of dimensional representations of facial affect for automated deception detection, along with interpretable features from visual, vocal, and verbal modalities. We used a video dataset of people communicating truthfully or deceptively in real-world, high-stakes courtroom situations. We leveraged recent advances in automated emotion recognition in-the-wild by implementing a state-of-the-art deep neural network trained on the Aff-Wild database to extract continuous representations of facial valence and facial arousal from speakers. We experimented with unimodal Support Vector Machines (SVM) and SVM-based multimodal fusion methods to identify effective features, modalities, and modeling approaches for detecting deception. Unimodal models trained on facial affect achieved an AUC of 80%, and facial affect contributed towards the highest-performing multimodal approach (adaptive boosting) that achieved an AUC of 91% when tested on speakers who were not part of training sets. This approach achieved a higher AUC than existing automated machine learning approaches that used interpretable visual, vocal, and verbal features to detect deception in this dataset, but did not use facial affect. Across all videos, deceptive and truthful speakers exhibited significant differences in facial valence and facial arousal, contributing computational support to existing psychological theories on affect and deception. The demonstrated importance of facial affect in our models informs and motivates the future development of automated, affect-aware machine learning approaches for modeling and detecting deception and other social behaviors in-the-wild. △ Less

Submitted 31 August, 2020; originally announced August 2020.

Comments: 10 pages, Accepted at ACM International Conference on Multimodal Interaction (ICMI), October 2020

Journal ref: Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI)

arXiv:2002.04671 [pdf, other]

Can I Trust You? A User Study of Robot Mediation of a Support Group

Authors: Chris Birmingham, Zijian Hu, Kartik Mahajan, Eli Reber, Maja J Mataric

Abstract: Socially assistive robots have the potential to improve group dynamics when interacting with groups of people in social settings. This work contributes to the understanding of those dynamics through a user study of trust dynamics in the novel context of a robot mediated support group. For this study, a novel framework for robot mediation of a support group was developed and validated. To evaluate… ▽ More Socially assistive robots have the potential to improve group dynamics when interacting with groups of people in social settings. This work contributes to the understanding of those dynamics through a user study of trust dynamics in the novel context of a robot mediated support group. For this study, a novel framework for robot mediation of a support group was developed and validated. To evaluate interpersonal trust in the multi-party setting, a dyadic trust scale was implemented and found to be uni-factorial, validating it as an appropriate measure of general trust. The results of this study demonstrate a significant increase in average interpersonal trust after the group interaction session, and qualitative post-session interview data report that participants found the interaction helpful and successfully supported and learned from one other. The results of the study validate that a robot-mediated support group can improve trust among strangers and allow them to share and receive support for their academic stress. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Comments: 6 pages, 4 figures, accepted for publication in ICRA 2020

Journal ref: IEEE International Conference on Robotics and Automation (ICRA 2020)

arXiv:2002.02453 [pdf, other]

doi 10.1126/scirobotics.aaz3791

Modeling Engagement in Long-Term, In-Home Socially Assistive Robot Interventions for Children with Autism Spectrum Disorders

Authors: Shomik Jain, Balasubramanian Thiagarajan, Zhonghao Shi, Caitlyn Clabaugh, Maja J. Matarić

Abstract: Socially assistive robotics (SAR) has great potential to provide accessible, affordable, and personalized therapeutic interventions for children with autism spectrum disorders (ASD). However, human-robot interaction (HRI) methods are still limited in their ability to autonomously recognize and respond to behavioral cues, especially in atypical users and everyday settings. This work applies supervi… ▽ More Socially assistive robotics (SAR) has great potential to provide accessible, affordable, and personalized therapeutic interventions for children with autism spectrum disorders (ASD). However, human-robot interaction (HRI) methods are still limited in their ability to autonomously recognize and respond to behavioral cues, especially in atypical users and everyday settings. This work applies supervised machine learning algorithms to model user engagement in the context of long-term, in-home SAR interventions for children with ASD. Specifically, we present two types of engagement models for each user: (i) generalized models trained on data from different users; and (ii) individualized models trained on an early subset of the user's data. The models achieved approximately 90% accuracy (AUROC) for post hoc binary classification of engagement, despite the high variance in data observed across users, sessions, and engagement states. Moreover, temporal patterns in model predictions could be used to reliably initiate re-engagement actions at appropriate times. These results validate the feasibility and challenges of recognition and response to user disengagement in long-term, real-world HRI settings. The contributions of this work also inform the design of engaging and personalized HRI, especially for the ASD community. △ Less

Submitted 10 April, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: This manuscript was published in Science Robotics on February 26, 2020

ACM Class: I.2.9

Journal ref: Sci. Robot. 5, eaaz3791 (2020)

arXiv:2001.09981 [pdf, other]

Designing a Socially Assistive Robot for Long-Term In-Home Use for Children with Autism Spectrum Disorders

Authors: Roxanna Pakkar, Caitlyn Clabaugh, Rhianna Lee, Eric Deng, Maja J Mataric

Abstract: Socially assistive robotics (SAR) research has shown great potential for supplementing and augmenting therapy for children with autism spectrum disorders (ASD). However, the vast majority of SAR research has been limited to short-term studies in highly controlled environments. The design and development of a SAR system capable of interacting autonomously {\it in situ} for long periods of time invo… ▽ More Socially assistive robotics (SAR) research has shown great potential for supplementing and augmenting therapy for children with autism spectrum disorders (ASD). However, the vast majority of SAR research has been limited to short-term studies in highly controlled environments. The design and development of a SAR system capable of interacting autonomously {\it in situ} for long periods of time involves many engineering and computing challenges. This paper presents the design of a fully autonomous SAR system for long-term, in-home use with children with ASD. We address design decisions based on robustness and adaptability needs, discuss the development of the robot's character and interactions, and provide insights from the month-long, in-home data collections with children with ASD. This work contributes to a larger research program that is exploring how SAR can be used for enhancing the social and cognitive development of children with ASD. △ Less

Submitted 21 January, 2020; originally announced January 2020.

Comments: 7 pages, 9 figures, In 2019 IEEE International Symposium on Robot and Human Interactactive Communication (RO-MAN '19), New Delhi, India, Oct-2019

arXiv:1911.09713 [pdf, other]

Using Socially Expressive Mixed Reality Arms for Enhancing Low-Expressivity Robots

Authors: Thomas R. Groechel, Zhonghao Shi, Roxanna Pakkar, Maja J. Matarić

Abstract: Expressivity--the use of multiple modalities to convey internal state and intent of a robot--is critical for interaction. Yet, due to cost, safety, and other constraints, many robots lack high degrees of physical expressivity. This paper explores using mixed reality to enhance a robot with limited expressivity by adding virtual arms that extend the robot's expressiveness. The arms, capable of a ra… ▽ More Expressivity--the use of multiple modalities to convey internal state and intent of a robot--is critical for interaction. Yet, due to cost, safety, and other constraints, many robots lack high degrees of physical expressivity. This paper explores using mixed reality to enhance a robot with limited expressivity by adding virtual arms that extend the robot's expressiveness. The arms, capable of a range of non-physically-constrained gestures, were evaluated in a between-subject study ($n=34$) where participants engaged in a mixed reality mathematics task with a socially assistive robot. The study results indicate that the virtual arms added a higher degree of perceived emotion, helpfulness, and physical presence to the robot. Users who reported a higher perceived physical presence also found the robot to have a higher degree of social presence, ease of use, usefulness, and had a positive attitude toward using the robot with mixed reality. The results also demonstrate the users' ability to distinguish the virtual gestures' valence and intent. △ Less

Submitted 30 November, 2019; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: 8 pages, 9 figures, 3 tables, In 2019 IEEE International Symposium on Robot and Human Interactactive Communication (RO-MAN '19), New Delhi, India, Oct-2019

arXiv:cs/0604111 [pdf, ps, other]

Analysis of Dynamic Task Allocation in Multi-Robot Systems

Authors: Kristina Lerman, Chris Jones, Aram Galstyan, Maja J Mataric

Abstract: Dynamic task allocation is an essential requirement for multi-robot systems operating in unknown dynamic environments. It allows robots to change their behavior in response to environmental changes or actions of other robots in order to improve overall system performance. Emergent coordination algorithms for task allocation that use only local sensing and no direct communication between robots a… ▽ More Dynamic task allocation is an essential requirement for multi-robot systems operating in unknown dynamic environments. It allows robots to change their behavior in response to environmental changes or actions of other robots in order to improve overall system performance. Emergent coordination algorithms for task allocation that use only local sensing and no direct communication between robots are attractive because they are robust and scalable. However, a lack of formal analysis tools makes emergent coordination algorithms difficult to design. In this paper we present a mathematical model of a general dynamic task allocation mechanism. Robots using this mechanism have to choose between two types of task, and the goal is to achieve a desired task division in the absence of explicit communication and global knowledge. Robots estimate the state of the environment from repeated local observations and decide which task to choose based on these observations. We model the robots and observations as stochastic processes and study the dynamics of the collective behavior. Specifically, we analyze the effect that the number of observations and the choice of the decision function have on the performance of the system. The mathematical models are validated in a multi-robot multi-foraging scenario. The model's predictions agree very closely with experimental results from sensor-based simulations. △ Less

Submitted 27 April, 2006; originally announced April 2006.

Comments: Preprint version of the paper published in International Journal of Robotics, March 2006, Volume 25, pp. 225-242

Showing 1–26 of 26 results for author: Matarić, M J