Keypoints-Based Multi-Cue Feature Fusion Network (MF-Net) for Action Recognition of ADHD Children in TOVA Assessment

Wanyu Tang; Chao Shi; Yuanyuan Li; Zhonglan Tang; Gang Yang; Jing Zhang; Ling He

doi:10.3390/bioengineering11121210

Bioengineering (Basel). 2024 Nov 29;11(12):1210. doi: 10.3390/bioengineering11121210.

Authors

Wanyu Tang¹, Chao Shi¹, Yuanyuan Li², Zhonglan Tang¹, Gang Yang¹, Jing Zhang¹, Ling He¹

Affiliations

¹ College of Biomedical Engineering, Sichuan University, Chengdu 610065, China.
² Mental Health Center, West China School of Medicine, Sichuan University, Chengdu 610041, China.

PMID: 39768028
DOI: 10.3390/bioengineering11121210

Abstract

Attention deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder among children and adolescents. Behavioral detection and analysis play a crucial role in ADHD diagnosis and assessment by objectively quantifying hyperactivity and impulsivity symptoms. Existing video-based action recognition algorithms focus on object or interpersonal interactions and may therefore overlook ADHD-specific behaviors. Current keypoints-based algorithms, although effective in attenuating environmental interference, struggle to accurately model the sudden and irregular movements characteristic of children with ADHD. This work proposes a novel keypoints-based system, the Multi-cue Feature Fusion Network (MF-Net), for recognizing the actions and behaviors of children with ADHD during the Test of Variables of Attention (TOVA). The system aims to assess ADHD symptoms as described in the DSM-V by extracting features from human body and facial keypoints. For human body keypoints, we introduce the Multi-scale Features and Frame-Attention Adaptive Graph Convolutional Network (MSF-AGCN) to extract irregular and impulsive motion features. For facial keypoints, we transform the data into images and employ MobileViTv2 with transfer learning to capture facial and head movement features. Finally, a feature fusion module fuses the features from both branches to yield the final action category prediction. Evaluated on 3801 video samples of children with ADHD, the system achieves 90.6% top-1 accuracy and 97.6% top-2 accuracy across six action categories. Additional validation experiments on the public datasets NW-UCLA, NTU-2D, and AFEW-VA verify the network's performance.

Keywords: attention deficit hyperactivity disorder; graph neural network; keypoints-based action recognition; multi-cue feature fusion.
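
The abstract reports top-1 and top-2 accuracy over six action categories. As a reading aid, the minimal sketch below shows how a top-k accuracy figure is typically computed from per-class scores. It is not the authors' evaluation code: the function name top_k_accuracy, the score values, and the labels are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Return the fraction of samples whose true label is among the
    k highest-scoring classes (hypothetical helper, for illustration)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: -row[c])
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Hypothetical scores for four samples over six action categories.
scores = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # true class 0 is ranked first
    [0.30, 0.40, 0.10, 0.10, 0.05, 0.05],  # true class 0 is ranked second
    [0.10, 0.10, 0.50, 0.20, 0.05, 0.05],  # true class 3 is ranked second
    [0.05, 0.05, 0.10, 0.10, 0.30, 0.40],  # true class 2 is outside the top two
]
labels = [0, 0, 3, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Top-2 accuracy can never be lower than top-1 accuracy on the same predictions, because every top-1 hit is also a top-2 hit; the gap between the two indicates how often the correct category is the model's second choice.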
