MAEMOT: Pretrained MAE-Based Antiocclusion 3-D Multiobject Tracking for Autonomous Driving

IEEE Trans Neural Netw Learn Syst. 2024 Oct 31:PP. doi: 10.1109/TNNLS.2024.3480148. Online ahead of print.

Abstract

Existing 3-D multiobject tracking (MOT) methods suffer from object occlusion in real-world traffic scenes, and previous works have struggled to answer the fundamental question: "How can the perception data loss caused by occlusion be overcome?" This article attempts to provide such a solution by developing a novel pretrained movement-constrained masked autoencoder (M-MAE) for antiocclusion 3-D MOT, called MAEMOT. Specifically, the pretrained M-MAE adopts an efficient multistage transformer (MST) encoder and a spatiotemporal motion decoder to predict and reconstruct occluded point cloud data, following the properties of object motion. Afterward, the well-trained M-MAE model extracts the global features of occluded objects, keeping the features of the same object as consistent as possible across frames throughout the spatiotemporal sequence. Next, a proposal-based geometric graph aggregation (PG²A) module extracts and fuses the spatial features of each proposal, producing refined region-of-interest (RoI) components. Finally, this article designs an object association module that combines geometric and corner affinities, which helps to match predicted occluded objects more robustly. Extensive evaluation shows that the proposed MAEMOT method effectively overcomes the interference of occlusion and improves 3-D MOT performance under challenging conditions.
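
To make the pretraining idea concrete, the following is a minimal sketch of a masked-autoencoder step over point-cloud patches, assuming random patch masking and a symmetric Chamfer reconstruction loss; `encoder` and `decoder` are stand-ins for the paper's MST encoder and spatiotemporal motion decoder, whose actual architectures and masking strategy are described in the article, not here.

```python
# Hypothetical sketch of masked point-patch pretraining in the spirit of M-MAE.
# The masking scheme, loss, and module interfaces are illustrative assumptions.
import torch

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between point sets of shape (B, N, 3) and (B, M, 3)."""
    d = torch.cdist(pred, target)                      # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def mae_pretrain_step(encoder, decoder, patches, mask_ratio=0.6):
    """One pretraining step: mask point patches, encode the visible ones,
    and reconstruct the masked ones. `patches` has shape (B, P, K, 3):
    B sequences, P patches per sequence, K points per patch."""
    B, P, K, _ = patches.shape
    n_keep = int(P * (1 - mask_ratio))
    perm = torch.randperm(P)
    visible_idx, masked_idx = perm[:n_keep], perm[n_keep:]
    latent = encoder(patches[:, visible_idx])          # encode only visible patches
    pred = decoder(latent, masked_idx)                 # predict points of masked patches
    target = patches[:, masked_idx].reshape(B, -1, 3)
    return chamfer_distance(pred.reshape(B, -1, 3), target)
```

Likewise, the association module can be illustrated with a simple fused-affinity matcher. This is a sketch under stated assumptions: geometric affinity is taken here as an exponential of the box-center distance and corner affinity as an exponential of the mean distance between corresponding 3-D box corners, fused with an illustrative weight `alpha`; the paper's exact affinity definitions and fusion may differ, and real boxes would also carry a heading angle omitted below.

```python
# Hypothetical sketch of associating predicted (possibly occluded) tracks with
# detections by fusing geometric and corner affinities, then solving a bipartite
# matching with the Hungarian algorithm. Boxes are axis-aligned (x, y, z, l, w, h).
import numpy as np
from scipy.optimize import linear_sum_assignment

def box_corners(box):
    """Return the 8 corners of an axis-aligned 3-D box (x, y, z, l, w, h)."""
    center, half = np.asarray(box[:3]), np.asarray(box[3:]) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                      for sy in (-1, 1) for sz in (-1, 1)])
    return center + signs * half

def affinity_matrix(tracks, detections, alpha=0.5, scale=4.0):
    """Fuse a center-distance (geometric) affinity with a corner-distance affinity."""
    A = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            center_dist = np.linalg.norm(np.asarray(t[:3]) - np.asarray(d[:3]))
            corner_dist = np.mean(np.linalg.norm(box_corners(t) - box_corners(d), axis=1))
            geo_aff = np.exp(-center_dist / scale)     # in (0, 1], higher means closer
            cor_aff = np.exp(-corner_dist / scale)
            A[i, j] = alpha * geo_aff + (1.0 - alpha) * cor_aff
    return A

def associate(tracks, detections, min_affinity=0.3):
    """Match tracks to detections by maximizing total fused affinity."""
    A = affinity_matrix(tracks, detections)
    rows, cols = linear_sum_assignment(-A)             # Hungarian matching
    return [(i, j) for i, j in zip(rows, cols) if A[i, j] >= min_affinity]
```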
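
Combining two complementary affinities, as sketched above, hedges against failure of either cue alone: when occlusion corrupts the box extent (hurting the corner term), the center-distance term can still anchor the match, and vice versa.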