MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

Z Yang, M Liu, J Xie, Y Zhang, C Shen, W Shao, J Jiao, T Xing, R Hu, P Xu
arXiv preprint arXiv:2406.10125, 2024 — arxiv.org
Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and use multi-task heads to delineate road centerlines, boundary lines, pedestrian crossings, and other areas. However, these algorithms perform poorly at the far end of roads and struggle when the primary subject in the image is occluded. Therefore, in this competition, we not only used multi-perspective images as input but also incorporated SD maps to address this issue. We employed map-encoder pre-training to enhance the network's geometric encoding capabilities and utilized YOLOX to improve traffic-element detection precision. Additionally, for area detection, we introduced LDTR together with auxiliary tasks to achieve higher precision. As a result, our final OLUS score is 0.58.
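The abstract describes fusing SD-map information with camera-derived BEV features, but not the exact representation. A minimal sketch of one common approach — rasterizing SD-map road polylines into the BEV grid and concatenating them channel-wise with the image BEV features — is shown below. All function names, the grid resolution, and the rasterized encoding are illustrative assumptions, not the report's actual implementation.

```python
import numpy as np

def rasterize_sd_map(polylines, bev_size=200, bev_range=50.0):
    """Rasterize SD-map road polylines into a single-channel BEV mask.

    polylines: list of (N, 2) arrays of ego-frame (x, y) points in meters,
    with the ego vehicle at the grid center. This binary-mask encoding is a
    hypothetical stand-in; the report does not specify its SD-map format.
    """
    grid = np.zeros((1, bev_size, bev_size), dtype=np.float32)
    scale = bev_size / (2.0 * bev_range)  # cells per meter
    for line in polylines:
        # Densify each segment so the rasterized line stays connected.
        for (x0, y0), (x1, y1) in zip(line[:-1], line[1:]):
            n = max(2, int(np.hypot(x1 - x0, y1 - y0) * scale) + 1)
            xs = np.linspace(x0, x1, n)
            ys = np.linspace(y0, y1, n)
            cols = ((xs + bev_range) * scale).astype(int)
            rows = ((ys + bev_range) * scale).astype(int)
            ok = (cols >= 0) & (cols < bev_size) & (rows >= 0) & (rows < bev_size)
            grid[0, rows[ok], cols[ok]] = 1.0
    return grid

def fuse_bev(camera_bev, sd_map_bev):
    """Channel-wise concatenation of camera BEV features with the SD-map raster."""
    return np.concatenate([camera_bev, sd_map_bev], axis=0)

# Stand-in for BEV features lifted from multi-perspective images (C, H, W).
camera_bev = np.random.randn(64, 200, 200).astype(np.float32)
# One straight road centerline running ahead of and behind the ego vehicle.
sd_bev = rasterize_sd_map([np.array([[0.0, -40.0], [0.0, 40.0]])])
fused = fuse_bev(camera_bev, sd_bev)  # shape (65, 200, 200)
```

In a full pipeline the fused tensor would feed the multi-task heads (centerlines, boundaries, pedestrian crossings); a learned map encoder, as the report's pre-training suggests, would replace the raw binary raster.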