Background and objectives: In the current global health landscape, demand is growing for rapid and accurate assessment of mental states. Traditional assessment methods typically rely on face-to-face interaction, which is not only time-consuming but also highly subjective. To address this issue, this study aims to develop a client-server-based, non-contact multimodal emotion and behavior recognition system that improves the efficiency and accuracy of mental state assessments.
Methods: This study designed and implemented a multimodal assessment system integrating voice, text, facial expressions, and body movements. Built on a client-server architecture, the system improves diagnostic efficiency and decision-making accuracy through an intuitive visual interface. Its effectiveness was validated and tested in real hospital settings.
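The abstract does not disclose implementation details, so the following Python sketch is purely illustrative of the client-server pattern described above: a minimal server that accepts per-modality emotion scores from a client and combines them by weighted late fusion. The endpoint, the JSON score format, and the fusion weights are all assumptions for illustration, not the authors' method.

```python
"""Minimal sketch of a client-server multimodal assessment service.

All names, the score format, the fusion weights, and the late-fusion
strategy are illustrative assumptions; the paper does not specify them.
"""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-modality weights for late fusion (not from the paper).
FUSION_WEIGHTS = {"voice": 0.3, "text": 0.2, "face": 0.3, "body": 0.2}


def fuse_scores(scores: dict[str, float]) -> float:
    """Weighted late fusion of per-modality emotion scores in [0, 1]."""
    present = [m for m in scores if m in FUSION_WEIGHTS]
    total = sum(FUSION_WEIGHTS[m] * scores[m] for m in present)
    weight = sum(FUSION_WEIGHTS[m] for m in present)
    return total / weight if weight else 0.0


class AssessmentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The client posts JSON such as {"voice": 0.8, "face": 0.7, ...}.
        length = int(self.headers.get("Content-Length", 0))
        scores = json.loads(self.rfile.read(length))
        body = json.dumps({"fused_score": fuse_scores(scores)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serve assessment requests on localhost; clients run the capture
    # and per-modality recognition, then submit scores for fusion.
    HTTPServer(("localhost", 8000), AssessmentHandler).serve_forever()
```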
Results: In multimodal emotion and behavior recognition, the system achieved a voice recognition accuracy of 92.01%, a facial expression recognition accuracy of 91.3%, a behavior analysis accuracy of 94.5%, and an overall multimodal assessment accuracy of 77.9%.
Conclusions: The multimodal assessment system developed in this study significantly improves the accuracy and efficiency of mental state assessments, meeting clinicians' need for precise and rapid diagnostics in real-world settings.
Keywords: Contactless; End-to-end; Mental state assessment; Multimodal.