Cross-modal human pose estimation has a wide range of applications. Traditional image-based pose estimation does not work well in poor light or darkness. Therefore, sensing modalities such as LiDAR or radio frequency (RF) signals are now used to estimate human pose. However, these methods require expensive professional equipment, which limits their application. To address this limitation, we propose a new WiFi-based pose estimation method. Based on the Channel State Information (CSI) of WiFi, we propose a novel architecture, CSI-former, which integrates multi-head attention into a WiFi-based pose estimation network. To evaluate the performance of CSI-former, we establish a brand-new dataset, Wi-Pose. This dataset consists of 5 GHz WiFi CSI, the corresponding images, and skeleton point annotations. The experimental results on Wi-Pose demonstrate that CSI-former significantly improves performance in wireless pose estimation and achieves better performance than traditional image-based pose estimation. To benefit future research on WiFi-based pose estimation, Wi-Pose has been made publicly available.
Keywords: CSI; WiFi; multi-head attention; pose estimation.