TRI-POSE-Net: Adaptive 3D human pose estimation through selective kernel networks and self-supervision with trifocal tensors

Nabeel Ahmed Khan; Aisha Ahmed Alarfaj; Ebtisam Abdullah Alabdulqader; Nuha Zamzami; Muhammad Umer; Nisreen Innab; Tai-Hoon Kim

doi:10.1371/journal.pone.0310831

PLoS One. 2024 Dec 5;19(12):e0310831. doi: 10.1371/journal.pone.0310831. eCollection 2024.

Authors

Nabeel Ahmed Khan¹, Aisha Ahmed Alarfaj², Ebtisam Abdullah Alabdulqader³, Nuha Zamzami⁴, Muhammad Umer⁵, Nisreen Innab⁶, Tai-Hoon Kim⁷

Affiliations

¹ Center For AI and Big Data, Namal University, Mianwali, Pakistan.
² Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
³ Department of Information Technology, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia.
⁴ Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia.
⁵ Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan.
⁶ Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Diriyah, Riyadh, Saudi Arabia.
⁷ School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, Yeosu-si, Jeollanam-do, Republic of Korea.

Abstract

Accurate and flexible 3D pose estimation for virtual entities is a challenging task in computer vision. Conventional methods struggle to capture realistic movements, so solutions are needed that can handle the complexities of genuine avatar interactions in dynamic virtual environments. To address precise 3D pose estimation, this work introduces TRI-POSE-Net, a model intended for scenarios with limited supervision. The proposed technique is based on ResNet-50 with integrated Selective Kernel Network (SKNet) blocks, which proved efficient for extracting features tailored to pose estimation. Furthermore, trifocal tensors and their three-view geometry allow 3D ground-truth poses to be generated from 2D poses, resulting in more refined triangulations. With the proposed approach, 3D poses can be estimated from a single 2D RGB image. Evaluated on the HumanEva-I dataset, the approach yields a Mean Per Joint Position Error (MPJPE) of 47.6 under self-supervision and 29.9 under full supervision. Compared with other works, the proposed approach performs well in the self-supervision paradigm.
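
For readers unfamiliar with the reported metric, MPJPE is commonly defined as the mean Euclidean distance between predicted and ground-truth joint positions. The following is a minimal sketch of that standard definition for a single pose, not the authors' evaluation code. The function name `mpjpe`, the three-joint toy pose, and its coordinate values are hypothetical and chosen only for illustration. The abstract does not state the unit of the reported errors or whether poses are aligned before scoring, so neither is assumed here.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # (x, y, z) in any consistent length unit


def mpjpe(predicted: Sequence[Joint], target: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding joints of one pose."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    return sum(math.dist(p, t) for p, t in zip(predicted, target)) / len(predicted)


# Toy 3-joint pose with hypothetical values (not data from the paper).
target = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
predicted = [(3.0, 4.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 12.0)]

# Per-joint errors are 5, 0 and 12, so the mean is 17 / 3.
error = mpjpe(predicted, target)
```

A dataset-level score would typically average this per-pose value over all evaluated frames.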

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

MeSH terms

Algorithms
Humans
Imaging, Three-Dimensional* / methods
Neural Networks, Computer
Posture / physiology

Grants and funding

This study was supported by Princess Nourah Bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R348), Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia. Nisreen Innab would like to express sincere gratitude to AlMaarefa University, Riyadh, Saudi Arabia, for supporting this research.