Learning Temporal-Spatial Contextual Adaptation for Three-Dimensional Human Pose Estimation

Hexin Wang; Wei Quan; Runjing Zhao; Miaomiao Zhang; Na Jiang

doi:10.3390/s24134422

Sensors (Basel). 2024 Jul 8;24(13):4422. doi: 10.3390/s24134422.

Authors

Hexin Wang¹, Wei Quan¹, Runjing Zhao¹, Miaomiao Zhang¹, Na Jiang¹

Affiliation

¹ College of Information Engineering, Capital Normal University, Beijing 100048, China.

Abstract

Three-dimensional human pose estimation aims to generate 3D pose sequences from 2D videos. It has enormous potential in human-robot interaction, remote sensing, virtual reality, and computer vision. Existing leading methods primarily explore spatial or temporal encoding to achieve 3D pose inference. However, many architectures exploit the independent effects of spatial and temporal cues on 3D pose estimation while neglecting their synergistic influence. To address this issue, this paper proposes a novel 3D pose estimation method that combines a dual-adaptive spatial-temporal former (DASTFormer) with additional supervised training. The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which enhance 2D-to-3D pose inference by adaptively learning spatial-temporal effects, considering both their cooperative and independent influences. This work also proposes an additional supervised training strategy with a batch variance loss. Unlike the common training strategy, it performs a two-round parameter update on the same batch of data. This strategy not only better explores the potential relationship between spatial-temporal encoding and 3D poses, but also alleviates the batch size limitations that graphics cards impose on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on the Human3.6M and HumanEva datasets.

Keywords: 3D human pose estimation; batch variance loss; dual-adaptive spatial-temporal model; one-more supervised training.
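
The two-round parameter update described in the abstract can be illustrated with a minimal sketch. The abstract does not define the batch variance loss, so this example assumes it is the variance of the per-sample squared errors within one batch. A toy linear model stands in for the DASTFormer, and finite-difference gradients stand in for backpropagation. All function names, the learning rate `lr`, and the weight `lam` are hypothetical and do not come from the paper.

```python
from typing import Callable, List

Vector = List[float]


def predict(w: Vector, x: Vector) -> float:
    """Toy linear model standing in for the pose network (assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def per_sample_errors(w: Vector, xs: List[Vector], ys: Vector) -> Vector:
    """Squared error of every sample in the batch."""
    return [(predict(w, x) - y) ** 2 for x, y in zip(xs, ys)]


def mean_loss(w: Vector, xs: List[Vector], ys: Vector) -> float:
    """Ordinary mean squared error over the batch."""
    errors = per_sample_errors(w, xs, ys)
    return sum(errors) / len(errors)


def batch_variance_loss(w: Vector, xs: List[Vector], ys: Vector) -> float:
    """Assumed definition: variance of the per-sample errors in the batch."""
    errors = per_sample_errors(w, xs, ys)
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / len(errors)


def numerical_grad(loss: Callable[[Vector], float], w: Vector,
                   eps: float = 1e-6) -> Vector:
    """Central-difference gradient; a stand-in for backpropagation."""
    grad = []
    for j in range(len(w)):
        up, down = w[:], w[:]
        up[j] += eps
        down[j] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


def sgd_step(loss: Callable[[Vector], float], w: Vector, lr: float) -> Vector:
    """One plain gradient-descent update."""
    grad = numerical_grad(loss, w)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def two_round_update(w: Vector, xs: List[Vector], ys: Vector,
                     lr: float = 0.05, lam: float = 0.1) -> Vector:
    """Update the parameters twice on the same batch."""
    # Round 1: ordinary update on the mean error.
    w = sgd_step(lambda p: mean_loss(p, xs, ys), w, lr)
    # Round 2: same batch, re-evaluated with the updated parameters and
    # with the assumed batch variance term added to the objective.
    w = sgd_step(
        lambda p: mean_loss(p, xs, ys) + lam * batch_variance_loss(p, xs, ys),
        w, lr)
    return w


# Toy batch generated by the weights [1.0, 2.0].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [1.0, 2.0, 3.0, 4.0]
w0 = [0.0, 0.0]
w1 = two_round_update(w0, xs, ys)
print(mean_loss(w0, xs, ys), mean_loss(w1, xs, ys))
```

In this sketch the second round costs an extra forward and backward pass but loads no new data, which is one plausible reading of how a two-round update could extract more signal from a batch whose size is capped by GPU memory.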

MeSH terms

Algorithms*
Humans
Imaging, Three-Dimensional* / methods
Posture / physiology
Robotics / methods
