Estimating local semantics from sensory inputs is a central component of high-definition map construction for autonomous driving. However, traditional pipelines require vast amounts of human effort and resources to annotate and maintain the semantics in the map, which limits their scalability. In this paper, we introduce the problem of local semantic map learning, which dynamically constructs vectorized semantics based on onboard sensor observations. We also introduce a local semantic map learning method, dubbed HDMapNet. HDMapNet encodes image features from surrounding cameras and/or point clouds from LiDAR, and predicts vectorized map elements in the bird's-eye view. We benchmark HDMapNet on the nuScenes dataset and show that it outperforms baseline methods in all settings. Notably, our fusion-based HDMapNet surpasses existing methods by more than 50% in all metrics. In addition, we develop semantic-level and instance-level metrics to evaluate map learning performance. Finally, we showcase that our method is capable of predicting a locally consistent map. By introducing the method and metrics, we invite the community to study this novel map learning problem. Code and the evaluation kit will be released to facilitate future development.
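The semantic-level metric mentioned above is, at its core, a per-class intersection-over-union computed on rasterized bird's-eye-view maps. A minimal sketch of such a metric is shown below; the function name and signature are our own illustration, not the released evaluation kit.

```python
import numpy as np

def semantic_iou(pred, gt, num_classes):
    """Per-class IoU between two rasterized BEV semantic maps.

    pred, gt: integer arrays of class ids with identical shape.
    Returns a list of IoU values, NaN for classes absent from both maps.
    Illustrative sketch only, not the paper's official evaluation code.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(float(inter) / float(union) if union > 0 else float("nan"))
    return ious
```

An instance-level metric would additionally match predicted map-element instances (e.g., individual lane dividers) to ground-truth instances before scoring, which this per-pixel sketch does not capture.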