Real-scale scene flow estimation has become increasingly important for 3D computer vision. Some works successfully estimate real-scale 3D scene flow with LiDAR. However, these expensive sensors are still unlikely to be widely equipped in real-world applications. Other works use monocular images to estimate scene flow, but their estimates are normalized with scale ambiguity, and additional depth or point-cloud ground truth is required to recover the real scale. Although they perform well in 2D, these works do not provide accurate and reliable 3D estimates. We present MonoPLFlowNet, a deep learning architecture on the permutohedral lattice. Different from all previous works, MonoPLFlowNet is the first to take only two consecutive monocular images as input while estimating both depth and 3D scene flow in real scale. Our real-scale scene flow estimation outperforms all state-of-the-art monocular-image-based works (whose outputs are recovered to real scale using ground truth) and is comparable to LiDAR-based approaches. As a by-product, our real-scale depth estimation also outperforms other state-of-the-art works.