Attention deficit hyperactivity disorder (ADHD) is a prevalent neurodevelopmental disorder among children and adolescents. Behavioral detection and analysis play a crucial role in ADHD diagnosis and assessment because they objectively quantify hyperactivity and impulsivity symptoms. Existing video-based action recognition algorithms focus on interactions with objects or between people, so they may overlook ADHD-specific behaviors. Current keypoint-based algorithms, although effective at attenuating environmental interference, struggle to accurately model the sudden and irregular movements characteristic of children with ADHD. This work proposes a novel keypoint-based system, the Multi-cue Feature Fusion Network (MF-Net), for recognizing the actions and behaviors of children with ADHD during the Test of Variables of Attention (TOVA). The system aims to assess the ADHD symptoms described in the DSM-V by extracting features from human body and facial keypoints. For body keypoints, we introduce the Multi-scale Features and Frame-Attention Adaptive Graph Convolutional Network (MSF-AGCN) to extract irregular and impulsive motion features. For facial keypoints, we transform the keypoint data into images and apply transfer learning with MobileViTv2 to capture facial and head movement features. Finally, a feature fusion module combines the features from both branches to produce the final action category prediction. Evaluated on 3801 video samples of children with ADHD, the system achieves 90.6% top-1 accuracy and 97.6% top-2 accuracy across six action categories. Additional experiments on the public datasets NW-UCLA, NTU-2D, and AFEW-VA further validate the network's performance.
Keywords: attention deficit hyperactivity disorder; graph neural network; keypoint-based action recognition; multi-cue feature fusion.
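The reported results use top-1 and top-2 accuracy over per-class scores produced after the two branches are fused. The sketch below is a minimal illustration under stated assumptions and is not MF-Net's fusion module. It uses a hypothetical concatenation-plus-linear-classifier fusion (`fuse_and_score`) and a generic top-k accuracy function (`top_k_accuracy`). Both names and the fusion design are assumptions made for illustration.

```python
from typing import List, Sequence


def fuse_and_score(body_feat: Sequence[float], face_feat: Sequence[float],
                   weights: List[List[float]], bias: List[float]) -> List[float]:
    """Hypothetical fusion: concatenate branch features, apply a linear layer.

    This is an assumed, simplified stand-in, not the paper's fusion module.
    `weights` holds one row per action class, each row of length
    len(body_feat) + len(face_feat); `bias` holds one value per class.
    """
    fused = list(body_feat) + list(face_feat)
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]


def top_k_accuracy(scores: List[List[float]], labels: List[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores; sorted() is stable, so ties
        # keep the lower class index first.
        top_k = sorted(range(len(row)), key=lambda i: -row[i])[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    # Toy per-class scores for three samples over three classes.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 samples correct
    print(top_k_accuracy(scores, labels, k=2))  # 2 of 3 samples correct
```

Top-2 accuracy counts a prediction as correct when the true class is either of the two highest-scoring classes. For that reason it is always at least as high as top-1 accuracy, as with the 97.6% versus 90.6% reported above.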