In this work, we address the problem of unsupervised moving object segmentation (MOS) in 4D LiDAR data recorded from a stationary sensor, without any ground-truth annotations. State-of-the-art deep learning methods for LiDAR MOS depend strongly on annotated ground-truth data, which is scarce and expensive to obtain. To close this gap in the stationary setting, we propose a novel 4D LiDAR representation based on multivariate time series that recasts unsupervised MOS as a time series clustering problem. More specifically, we model the change in occupancy of a voxel by a multivariate occupancy time series (MOTS), which captures spatio-temporal occupancy changes at the voxel level and in its surrounding neighborhood. To perform unsupervised MOS, we train a neural network in a self-supervised manner to encode MOTS into voxel-level feature representations, which a clustering algorithm can then partition into moving and stationary. Experiments on stationary scenes from the Raw KITTI dataset show that our fully unsupervised approach achieves performance comparable to that of supervised state-of-the-art approaches.
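To make the MOTS idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a boolean occupancy grid per scan, a cubic neighborhood of radius 1, and, in place of the self-supervised encoder, a hand-crafted "flip rate" feature (how often occupancy changes over time) followed by a simple two-cluster k-means. All function names (`voxelize`, `build_mots`, `two_means`, `segment_moving`) and parameters are hypothetical and chosen for illustration only.

```python
import numpy as np

def voxelize(points, origin, voxel_size, grid_shape):
    """Turn one scan (N x 3 points) into a boolean occupancy grid."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid

def build_mots(occupancy, radius=1):
    """occupancy: (T, X, Y, Z) bool -> MOTS of shape (X, Y, Z, C, T).

    Each of the C = (2*radius + 1)**3 channels is the occupancy time
    series of one voxel in the neighborhood (assumed cubic here).
    """
    r = radius
    X, Y, Z = occupancy.shape[1:]
    padded = np.pad(occupancy, ((0, 0), (r, r), (r, r), (r, r)))
    channels = [
        padded[:, dx:dx + X, dy:dy + Y, dz:dz + Z]
        for dx in range(2 * r + 1)
        for dy in range(2 * r + 1)
        for dz in range(2 * r + 1)
    ]
    mots = np.stack(channels, axis=0)              # (C, T, X, Y, Z)
    return mots.transpose(2, 3, 4, 0, 1).astype(np.float32)

def two_means(features, iters=20):
    """Tiny 2-cluster k-means; label 1 starts at the most active sample."""
    score = features.sum(axis=1)
    centroids = np.stack([features[score.argmin()], features[score.argmax()]])
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return labels

def segment_moving(occupancy, radius=1):
    """Return an (X, Y, Z) bool grid marking voxels clustered as moving."""
    mots = build_mots(occupancy, radius)                      # (X, Y, Z, C, T)
    flip_rate = np.abs(np.diff(mots, axis=-1)).mean(axis=-1)  # (X, Y, Z, C)
    center = flip_rate[..., flip_rate.shape[-1] // 2]         # voxel's own series
    # Stand-in for the learned encoder: two hand-crafted features per voxel.
    features = np.stack([center, flip_rate.mean(axis=-1)], axis=-1)
    seen = occupancy.any(axis=0)          # only cluster voxels ever occupied
    moving = np.zeros(seen.shape, dtype=bool)
    moving[seen] = two_means(features[seen]) == 1
    return moving

# Synthetic scene: two static voxels and one object cycling along x.
T, shape = 8, (6, 8, 3)
occ = np.zeros((T, *shape), dtype=bool)
occ[:, 1, 1, 1] = True
occ[:, 4, 1, 1] = True
for t in range(T):
    occ[t, t % 4, 5, 1] = True
moving_mask = segment_moving(occ)
```

In this toy scene the always-occupied voxels have a flip rate of zero, while voxels along the object's path toggle repeatedly, so the two clusters separate cleanly. Real scans are noisier, which is presumably where a learned encoder over the full MOTS earns its keep compared with a fixed statistic like this one.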