With the rapid spreading of in-vehicle information systems such as smartphones, navigation systems, and radios, the number of traffic accidents caused by driver distractions shows an increasing trend. Timely identification and warning are deemed to be crucial for distracted driving and the establishment of driver assistance systems is of great value. However, almost all research on the recognition of the driver's distracted actions using computer vision methods neglected the importance of temporal information for action recognition. This paper proposes a hybrid deep learning model for recognizing the actions of distracted drivers. Specifically, we used OpenPose to obtain skeleton information of the human body and then constructed the vector angle and modulus ratio of the human body structure as features to describe the driver's actions, thereby realizing the fusion of deep network features and artificial features, which improve the information density of spatial features. The K-means clustering algorithm was used to preselect the original frames, and the method of inter-frame comparison was used to obtain the final keyframe sequence by comparing the Euclidean distance between manually constructed vectors representing frames and the vector representing the cluster center. Finally, we constructed a two-layer long short-term memory neural network to obtain more effective spatiotemporal features, and one softmax layer to identify the distracted driver's action. The experimental results based on the collected dataset prove the effectiveness of this framework, and it can provide a theoretical basis for the establishment of vehicle distraction warning systems.
Keywords: LSTM; OpenPose; action recognition; driver distraction; keyframe sequences; nested cross-validation.