Scene flow represents the motion of points in 3D space, the counterpart of optical flow, which represents the motion of pixels in a 2D image. However, ground-truth scene flow is difficult to obtain for real scenes, so recent studies rely on synthetic data for training. Training a scene flow network in an unsupervised manner on real-world data is therefore of crucial importance. This paper proposes a novel unsupervised learning method for scene flow that trains on pairs of consecutive frames captured by a monocular camera, without ground-truth scene flow. Our method enables training the scene flow network on real-world data, which bridges the gap between training data and test data and broadens the scope of data available for training. The unsupervised learning of scene flow in this paper consists of two parts: (i) depth and camera pose estimation, and (ii) scene flow estimation based on four loss functions. Depth and camera pose estimation produce the depth maps and the camera pose between two consecutive frames, which provide the information needed for the subsequent scene flow estimation. We then use a depth consistency loss, a dynamic-static consistency loss, a Chamfer loss, and a Laplacian regularization loss to train the scene flow network in an unsupervised manner. To our knowledge, this is the first work to realize unsupervised learning of 3D scene flow from a monocular camera. Experimental results on KITTI show that our method achieves strong performance compared with the traditional methods Iterative Closest Point (ICP) and Fast Global Registration (FGR). The source code is available at: https://github.com/IRMVLab/3DUnMonoFlow.
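For concreteness, the two purely geometric losses named above are sketched below in their standard forms; the notation and the nearest-neighbor definition are illustrative assumptions, since the abstract does not give the exact formulas. Here P denotes the source point cloud, f(p) the predicted scene flow at point p, P' = {p + f(p)} the warped cloud, Q the target point cloud, and N(p) the k nearest neighbors of p.

```latex
% Assumed standard forms of the Chamfer and Laplacian regularization losses.
% P: source point cloud; f(p): predicted scene flow at point p;
% P' = {p + f(p)}: warped cloud; Q: target cloud; N(p): k nearest neighbors of p.
\mathcal{L}_{\mathrm{Chamfer}} =
    \sum_{p' \in P'} \min_{q \in Q} \lVert p' - q \rVert_2^2
  + \sum_{q \in Q} \min_{p' \in P'} \lVert q - p' \rVert_2^2
\qquad
\mathcal{L}_{\mathrm{Laplacian}} =
    \sum_{p \in P} \Bigl\lVert f(p)
      - \frac{1}{|\mathcal{N}(p)|} \sum_{p'' \in \mathcal{N}(p)} f(p'') \Bigr\rVert_2^2
```

The Chamfer term pulls the warped source cloud onto the target cloud without requiring point correspondences, while the Laplacian term encourages neighboring points to move with similar flow; how these are weighted against the depth and dynamic-static consistency losses is described in the paper itself.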