High-definition (HD) maps provide abundant and precise environmental information about the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving systems. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling each map element as a point set with a group of equivalent permutations, which accurately describes the shape of the map element and stabilizes the learning process. We design a hierarchical query embedding scheme to flexibly encode structured map information and perform hierarchical bipartite matching for map element learning. With camera input only, MapTR achieves the best performance and efficiency among existing vectorized map construction approaches on the nuScenes dataset. In particular, MapTR-nano runs at real-time inference speed ($25.1$ FPS) on an RTX 3090, $8\times$ faster than the existing state-of-the-art camera-based method while achieving $5.0$ higher mAP. Even compared with the existing state-of-the-art multi-modality method, MapTR-nano achieves $0.7$ higher mAP, and MapTR-tiny achieves $13.5$ higher mAP with $3\times$ faster inference. Abundant qualitative results show that MapTR maintains stable and robust map construction quality in complex and diverse driving scenes. MapTR is of great application value in autonomous driving. Code and more demos are available at \url{https://github.com/hustvl/MapTR}.
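To make the permutation-equivalent modeling concrete, the following is a minimal sketch rather than the released implementation. The abstract does not spell out the permutation groups, so the sketch assumes that an open polyline has two equivalent orderings (forward and reversed) and that a closed polygon with $N$ vertices has $2N$ (any start vertex, either direction). The function names and the L1 point distance are illustrative choices, not part of the MapTR code base.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: Sequence[Point], closed: bool) -> List[List[Point]]:
    """Enumerate orderings of `points` that describe the same shape (assumed groups)."""
    pts = list(points)
    if not closed:
        # Open polyline: the two traversal directions are treated as equivalent.
        return [pts, pts[::-1]]
    # Closed polygon: any start vertex, traversed in either direction.
    perms = []
    for seq in (pts, pts[::-1]):
        for shift in range(len(seq)):
            perms.append(seq[shift:] + seq[:shift])
    return perms


def l1_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of per-point L1 distances between two equally long point sequences."""
    return sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in zip(a, b))


def permutation_invariant_cost(pred: Sequence[Point], gt: Sequence[Point], closed: bool) -> float:
    """Cost of `pred` against the best-matching equivalent ordering of `gt`."""
    return min(l1_distance(pred, perm) for perm in equivalent_permutations(gt, closed))


# A lane divider predicted in the opposite direction incurs no cost.
lane = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reversed_lane = lane[::-1]

# A pedestrian crossing predicted from a different start vertex incurs no cost.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted_square = square[2:] + square[:2]
```

Under a single fixed ordering, `reversed_lane` would be penalized even though it traces the same geometry; taking the minimum over the equivalent orderings removes that ambiguity, which is one plausible reading of why the modeling "stabilizes the learning process".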