TRI-POSE-Net: Adaptive 3D human pose estimation through selective kernel networks and self-supervision with trifocal tensors

PLoS One. 2024 Dec 5;19(12):e0310831. doi: 10.1371/journal.pone.0310831. eCollection 2024.

Abstract

Accurate and flexible 3D pose estimation for virtual entities is a challenging task in computer vision. Conventional methods struggle to capture realistic movements, so solutions that can handle the complexities of genuine avatar interactions in dynamic virtual environments are needed. To address precise 3D pose estimation, this work introduces TRI-POSE-Net, a model intended for scenarios with limited supervision. The proposed technique is based on ResNet-50 with integrated Selective Kernel Network (SKNet) blocks and has proven efficient for feature extraction customised to pose estimation. Furthermore, trifocal tensors and their three-view geometry allow 3D ground truth poses to be generated from 2D poses, resulting in more refined triangulations. With the proposed approach, 3D poses can be estimated from a single 2D RGB image. The approach was evaluated on the HumanEva-I dataset, yielding a Mean-Per-Joint-Position-Error (MPJPE) of 47.6 under self-supervision and 29.9 under full supervision. Compared with other works, the proposed approach performs well in the self-supervision paradigm.
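
The MPJPE figures quoted above are averages of the Euclidean distance between each predicted joint and its reference joint. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `mpjpe`, the list-of-tuples pose layout, and the toy two-joint pose are assumptions made for illustration, and any alignment step an evaluation protocol might apply before measuring is omitted. The result is in whatever unit the input coordinates use.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error for a single pose.

    pred, target: sequences of (x, y, z) joint coordinates, in the same
    joint order and the same unit (hypothetical layout for this sketch).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("poses must be non-empty and have the same joint count")
    # Euclidean distance per joint, then the mean over all joints.
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Toy pose with two joints: the first is off by 5 units, the second is exact.
target_pose = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
pred_pose = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
print(mpjpe(pred_pose, target_pose))  # 2.5
```

Averaging this value over every pose in a test set gives a dataset-level score of the kind reported in the abstract.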

MeSH terms

  • Algorithms
  • Humans
  • Imaging, Three-Dimensional* / methods
  • Neural Networks, Computer
  • Posture / physiology

Grants and funding

This study was supported by Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R348), Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia. Nisreen Innab would like to express sincere gratitude to AlMaarefa University, Riyadh, Saudi Arabia, for supporting this research.