Understanding and interpreting a 3D environment is a key challenge for autonomous vehicles. Semantic segmentation of 3D point clouds combines 3D information with semantics and thereby makes a valuable contribution to this task. In many real-world applications, point clouds are generated consecutively by lidar sensors. Working with a time series instead of single, independent frames makes it possible to exploit temporal information. We therefore propose a recurrent segmentation architecture (RNN) that takes a single range image frame as input and exploits recursively aggregated temporal information. An alignment strategy, which we call Temporal Memory Alignment, uses ego motion to temporally align the memory between consecutive frames in feature space. A residual network and a ConvGRU are investigated for the memory update. We demonstrate the benefits of the presented approach on two large-scale datasets and compare it to several state-of-the-art methods. Our approach ranks first on the SemanticKITTI multiple-scan benchmark and achieves state-of-the-art performance on the single-scan benchmark. In addition, the evaluation shows that exploiting temporal information significantly improves segmentation results compared to a single-frame approach.
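To make the idea of aligning a range-image memory with ego motion more concrete, the following is a minimal NumPy sketch, not the implementation evaluated in this work. It assumes that the previous frame provides a per-pixel 3D point for each memory cell, that the relative pose between the two sensor frames is known as a 4x4 matrix, and that the range image uses a spherical projection. The function names (`spherical_project`, `align_memory`), the field-of-view values, and the nearest-point-wins rule for resolving collisions are all illustrative assumptions.

```python
import numpy as np


def spherical_project(points, height, width, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map 3D points (N, 3) to range-image rows, columns, and ranges.

    The field-of-view defaults are assumed values for illustration only.
    """
    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    rng = np.linalg.norm(points, axis=1)
    safe_rng = np.maximum(rng, 1e-8)  # avoid division by zero
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(np.clip(points[:, 2] / safe_rng, -1.0, 1.0))
    u = 0.5 * (1.0 - yaw / np.pi) * width                # horizontal coordinate
    v = (fov_up - pitch) / (fov_up - fov_down) * height  # vertical coordinate
    cols = np.floor(u).astype(np.int64) % width          # wrap around 360 degrees
    rows = np.floor(v).astype(np.int64)                  # may fall outside [0, height)
    return rows, cols, rng


def align_memory(memory, points_prev, valid_prev, T_cur_from_prev):
    """Warp a feature memory (H, W, C) from the previous into the current frame.

    points_prev:     (H, W, 3) 3D point behind each memory cell (previous frame).
    valid_prev:      (H, W) boolean mask of cells that hold a real measurement.
    T_cur_from_prev: (4, 4) rigid transform taking previous-frame points
                     into the current sensor frame (the ego motion).
    """
    H, W, C = memory.shape
    pts = points_prev[valid_prev]  # (N, 3)
    feats = memory[valid_prev]     # (N, C)

    # Apply the ego motion, then re-project into the current range image.
    pts_cur = pts @ T_cur_from_prev[:3, :3].T + T_cur_from_prev[:3, 3]
    rows, cols, rng = spherical_project(pts_cur, H, W)

    # Drop points that leave the vertical field of view.
    inside = (rows >= 0) & (rows < H) & (rng > 0)
    rows, cols, rng, feats = rows[inside], cols[inside], rng[inside], feats[inside]

    # If several points land in one pixel, keep the nearest one:
    # sort by range, then take the first occurrence of each pixel index.
    flat = rows * W + cols
    order = np.argsort(rng, kind="stable")
    uniq, first = np.unique(flat[order], return_index=True)
    keep = order[first]

    aligned = np.zeros((H * W, C), dtype=memory.dtype)
    aligned[uniq] = feats[keep]  # cells without a source point stay zero
    return aligned.reshape(H, W, C)
```

In a recurrent setup, the output of `align_memory` would be passed to the memory update (for example a residual block or a ConvGRU cell) together with the features of the current frame. Cells that receive no point after warping stay zero here; how such holes are treated in practice is a design choice this sketch does not address.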