Accurate and flexible 3D pose estimation for virtual entities remains a challenging task in computer vision. Conventional methods struggle to capture realistic movements, so approaches that can handle the complexities of genuine avatar interactions in dynamic virtual environments are needed. To address precise 3D pose estimation, this work introduces TRI-POSE-Net, a model designed for scenarios with limited supervision. The proposed architecture builds on ResNet-50 with integrated Selective Kernel Network (SKNet) blocks, providing feature extraction tailored to pose estimation. Furthermore, trifocal tensors and the underlying three-view geometry are used to generate 3D ground-truth poses from 2D poses, resulting in more refined triangulations. At inference time, the proposed approach estimates 3D poses from a single 2D RGB image. The approach was evaluated on the HumanEva-I dataset, yielding a Mean Per Joint Position Error (MPJPE) of 47.6 under self-supervision and 29.9 under full supervision. Compared with related work, the proposed method performs favourably in the self-supervised paradigm.
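To make the two quantities referenced above concrete, the sketch below illustrates (a) multi-view triangulation of a joint from 2D observations and (b) the MPJPE metric. This is a minimal illustration only: it uses plain linear (DLT) triangulation rather than the paper's trifocal-tensor formulation, and the camera matrices, joint counts, and noise levels are synthetic assumptions, not values from the paper.

```python
import numpy as np

def triangulate_point(projs, points_2d):
    """Linear (DLT) triangulation of one joint from several views.

    projs     : list of 3x4 camera projection matrices (one per view).
    points_2d : list of (x, y) image coordinates (one per view).
    Returns the 3D point in world coordinates.
    """
    rows = []
    for P, (x, y) in zip(projs, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenise

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean distance over joints,
    here after aligning both skeletons at the root joint (index 0)."""
    pred = pred - pred[:1]
    gt = gt - gt[:1]
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy example: three synthetic cameras observing one joint.
    X_true = np.array([0.3, 1.2, 4.0])
    projs, pts = [], []
    for _ in range(3):
        P = np.hstack([np.eye(3), rng.normal(size=(3, 1))])  # toy projection matrix
        x = P @ np.append(X_true, 1.0)
        projs.append(P)
        pts.append(x[:2] / x[2])
    print("triangulated joint:", triangulate_point(projs, pts))

    # Toy MPJPE on a 15-joint skeleton with small prediction noise.
    gt_joints = rng.normal(size=(15, 3))
    pred_joints = gt_joints + rng.normal(scale=0.02, size=(15, 3))
    print("MPJPE:", mpjpe(pred_joints, gt_joints))
```

In practice the triangulated 3D joints would serve as pseudo ground truth for training, and MPJPE would be reported in millimetres on the evaluation set.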