HandFI: Multilevel Interacting Hand Reconstruction Based on Multilevel Feature Fusion in RGB Images

Sensors (Basel). 2024 Dec 27;25(1):88. doi: 10.3390/s25010088.

Abstract

Interacting hand reconstruction offers significant potential for a wide range of applications. However, it currently faces challenges such as the difficulty of distinguishing the features of the two hands, misalignment of the reconstructed hand meshes with the input image, and modeling the complex spatial relationships between interacting hands. In this paper, we propose a multilevel feature fusion interactive network for hand reconstruction (HandFI). Within this network, the hand feature separation module uses attention mechanisms and positional encoding to distinguish left-hand from right-hand features while preserving their spatial relationships. The hand fusion and attention module promotes the alignment of hand vertices with the image by integrating multi-scale hand features, and it introduces cross-attention to capture the complex spatial relationships between interacting hands, thereby improving the accuracy of two-hand reconstruction. We evaluated our method against existing approaches on the InterHand2.6M, RGB2Hands, and EgoHands datasets. Extensive experimental results demonstrated that our method outperforms other representative methods, achieving 9.38 mm MPJPE and 9.61 mm MPVPE. Results obtained in real-world scenes further validated the generalization capability of our method.
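
The cross-attention fusion described above can be illustrated with a minimal PyTorch sketch, in which each hand's feature tokens attend to the other hand's tokens. The module name CrossHandAttention, the feature dimension of 256, and the token counts are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class CrossHandAttention(nn.Module):
    # Hypothetical sketch: one hand's features (query) attend to the other
    # hand's features (key/value), modeling the spatial relationship
    # between interacting hands. Shapes and dimensions are assumptions.
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats: torch.Tensor, other_feats: torch.Tensor) -> torch.Tensor:
        # query_feats, other_feats: (batch, tokens, dim) feature sequences
        attended, _ = self.attn(query_feats, other_feats, other_feats)
        return self.norm(query_feats + attended)  # residual connection + layer norm

if __name__ == "__main__":
    left = torch.randn(2, 64, 256)   # left-hand feature tokens
    right = torch.randn(2, 64, 256)  # right-hand feature tokens
    cross = CrossHandAttention()
    left_fused = cross(left, right)   # left attends to right
    right_fused = cross(right, left)  # right attends to left
    print(left_fused.shape, right_fused.shape)  # (2, 64, 256) each

Applying the same module in both directions, as in the usage above, is one plausible way to let each hand condition on the other; the paper's actual fusion design may differ.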

Keywords: MANO; feature fusion; interacting hand reconstruction.