Semantic segmentation of point clouds in autonomous driving datasets requires techniques that can process large numbers of points over large fields of view. Today, most deep networks designed for this task rely on 3D sparse convolutions to reduce memory and computational loads. The best methods additionally exploit specificities of rotating lidar sampling patterns to improve performance, e.g., cylindrical voxels, or range images (for feature fusion from multiple point cloud representations). In contrast, we show that one can build a well-performing point-based backbone free of these specialized tools. This backbone, WaffleIron, relies heavily on generic MLPs and dense 2D convolutions, making it easy to implement, and has only a few parameters, all easy to tune. Despite its simplicity, our experiments on SemanticKITTI and nuScenes show that WaffleIron competes with the best methods designed specifically for these autonomous driving datasets. Hence, WaffleIron is a strong, easy-to-implement baseline for semantic segmentation of sparse outdoor point clouds.
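The abstract does not spell out the architecture, so the following is only a minimal NumPy sketch of one plausible reading of "generic MLPs and dense 2D convolutions". It assumes that point features are averaged onto a dense 2D grid, mixed by a per-channel 3x3 convolution, copied back to the points, and then passed through a shared per-point MLP, with residual connections around both steps. All names (`project_to_grid`, `depthwise_conv3x3`, `mixing_block`, `params`, `grid_size`, `extent`) are hypothetical and are not taken from the WaffleIron code.

```python
import numpy as np


def project_to_grid(xy, feats, grid_size, extent):
    """Average point features into the cells of a dense 2D grid.

    xy: (N, 2) coordinates, assumed to lie roughly in [-extent, extent).
    feats: (N, C) point features.
    Returns the (grid_size, grid_size, C) grid and each point's flat cell index.
    """
    idx = np.floor((xy + extent) / (2 * extent) * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)  # points outside the extent go to border cells
    flat = idx[:, 0] * grid_size + idx[:, 1]
    channels = feats.shape[1]
    sums = np.zeros((grid_size * grid_size, channels))
    counts = np.zeros(grid_size * grid_size)
    np.add.at(sums, flat, feats)    # unbuffered scatter-add of features
    np.add.at(counts, flat, 1.0)    # number of points per cell
    grid = sums / np.maximum(counts, 1.0)[:, None]  # empty cells stay at zero
    return grid.reshape(grid_size, grid_size, channels), flat


def depthwise_conv3x3(grid, kernels):
    """Dense per-channel 3x3 cross-correlation with zero padding.

    grid: (H, W, C); kernels: (3, 3, C), one 3x3 filter per channel.
    """
    h, w, _ = grid.shape
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for di in range(3):
        for dj in range(3):
            out += padded[di:di + h, dj:dj + w, :] * kernels[di, dj]
    return out


def mixing_block(xyz, feats, params, grid_size=8, extent=50.0, axes=(0, 1)):
    """One hypothetical block: 2D spatial mixing, then a per-point MLP."""
    # Spatial mixing: points -> dense 2D grid -> 2D conv -> back to points.
    grid, flat = project_to_grid(xyz[:, list(axes)], feats, grid_size, extent)
    grid = depthwise_conv3x3(grid, params["conv"])
    feats = feats + grid.reshape(-1, feats.shape[1])[flat]
    # Channel mixing: the same two-layer MLP (with ReLU) applied to every point.
    hidden = np.maximum(feats @ params["w1"] + params["b1"], 0.0)
    return feats + hidden @ params["w2"] + params["b2"]
```

In this sketch the only quantities to tune are the grid resolution, the projection plane (`axes`), and the feature widths, which is consistent with the abstract's claim of few parameters, though the actual choices in WaffleIron may differ.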