Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations. Recently, there have been efforts to compute the scene flow from 3D point clouds. A common approach is to train a regression model that consumes source and target point clouds and outputs the per-point translation vector. An alternative is to learn point matches between the point clouds concurrently with regressing a refinement of the initial correspondence flow. In both cases, the learning task is very challenging since the flow regression is done in the free 3D space, and a typical solution is to resort to a large annotated synthetic dataset. We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision. In contrast to previous work, we train a pure correspondence model focused on learning point feature representation and initialize the flow as the difference between a source point and its softly corresponding target point. Then, in the run-time phase, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent and accurate flow field between the point clouds. Experiments on widely used datasets demonstrate the performance gains achieved by our method compared to existing leading techniques while using a fraction of the training data. Our code is publicly available at https://github.com/itailang/SCOOP.
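The flow initialization described above can be illustrated with a minimal sketch: given learned per-point features, soft correspondence weights are obtained by a softmax over feature similarities, and the initial flow of each source point is the difference between its weighted average target point and itself. The function name and the `temperature` hyperparameter are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_correspondence_flow(src_feat, tgt_feat, src_pts, tgt_pts, temperature=0.03):
    """Sketch of correspondence-based flow initialization (assumed API, not the
    authors' implementation).

    src_feat: (N, D) features of source points, tgt_feat: (M, D) features of
    target points, src_pts: (N, 3), tgt_pts: (M, 3). Returns (N, 3) flow.
    """
    sim = src_feat @ tgt_feat.T                 # (N, M) feature similarity
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability for softmax
    w = np.exp(sim / temperature)
    w /= w.sum(axis=1, keepdims=True)           # soft correspondence weights, rows sum to 1
    soft_tgt = w @ tgt_pts                      # (N, 3) softly corresponding target points
    return soft_tgt - src_pts                   # initial per-point flow
```

With a low temperature the softmax approaches a hard nearest-neighbor match in feature space; the run-time refinement stage would then adjust this initial flow with a self-supervised objective.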