Using deep learning, 3D semantic segmentation for autonomous driving has become a well-studied subject, with methods that reach very high performance. Nonetheless, because of the limited size of the training datasets, these models cannot see every type of object and scene found in real-world applications. The ability to remain reliable in such unknown environments is called domain generalization. Despite its importance, domain generalization is relatively unexplored in the case of 3D semantic segmentation for autonomous driving. To fill this gap, this paper presents the first benchmark for this application by testing state-of-the-art methods and discussing the difficulty of tackling Laser Imaging Detection and Ranging (LiDAR) domain shifts. We also propose the first method designed to address this domain generalization, which we call 3DLabelProp. This method leverages the geometry and sequentiality of LiDAR data to enhance its generalization performance by working on partially accumulated point clouds. It reaches a mean Intersection over Union (mIoU) of 50.4% on SemanticPOSS and of 55.2% on PandaSet solid-state LiDAR while being trained only on SemanticKITTI, making it the state-of-the-art method for generalization (+5% and +33% better, respectively, than the second-best method). The code for this method will be available on GitHub.
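The mIoU figures reported above are the standard metric for semantic segmentation. As a minimal illustrative sketch (not the paper's code), mIoU can be computed from a per-class confusion matrix, averaging the per-class IoU over the classes present in the data:

```python
import numpy as np

def mean_iou(conf):
    """Mean Intersection over Union from a KxK confusion matrix.

    conf[i, j] = number of points with ground-truth class i
    predicted as class j. (Illustrative helper, not the paper's code.)
    """
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)              # true positives per class
    fp = conf.sum(axis=0) - tp      # false positives per class
    fn = conf.sum(axis=1) - tp      # false negatives per class
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1.0), 0.0)
    return iou[denom > 0].mean()    # average over classes present

# Toy 2-class example: IoU_0 = 8/(8+1+2), IoU_1 = 4/(4+2+1)
conf = [[8, 2],
        [1, 4]]
print(round(mean_iou(conf), 3))  # → 0.649
```

Classes absent from both ground truth and predictions are excluded from the average, a common convention when evaluating across datasets with different label sets.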