We design a multiscopic vision system that uses a low-cost monocular RGB camera to acquire accurate depth estimates. Unlike multi-view stereo, which works with images captured at unconstrained camera poses, the proposed system controls the motion of a camera to capture a sequence of images at horizontally or vertically aligned positions with the same parallax. Within this system, we propose a new heuristic method and a robust learning-based method to fuse the multiple cost volumes computed between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset of multiscopic images. Experiments on the real-world Middlebury dataset and a real-robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at https://sites.google.com/view/multiscopic.
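The abstract does not detail how the cost volumes from the surrounding images are fused; the snippet below is only a minimal illustrative sketch under assumed simplifications (grayscale images, a sum-of-absolute-differences matching cost, and element-wise minimum fusion of the left- and right-neighbor cost volumes), not the paper's actual heuristic or learning-based method.

```python
import numpy as np

def sad_cost_volume(ref, src, max_disp, sign):
    """Absolute-difference cost volume between a reference image and a source
    image captured at a horizontally shifted position.
    sign = +1 if the source camera is to the right of the reference
    (the scene shifts left in the source, so we roll it back to the right),
    sign = -1 if the source camera is to the left.
    Wrap-around artifacts of np.roll at the image border are ignored here."""
    ref = ref.astype(np.float32)
    src = src.astype(np.float32)
    d_levels, h, w = max_disp, *ref.shape
    volume = np.empty((d_levels, h, w), dtype=np.float32)
    for d in range(d_levels):
        shifted = np.roll(src, sign * d, axis=1)  # align candidate disparity d
        volume[d] = np.abs(ref - shifted)
    return volume

def fuse_and_estimate(center, left, right, max_disp=64):
    """Fuse the cost volumes from the left and right neighbors of the center
    (reference) image by an element-wise minimum, so a pixel occluded in one
    neighbor can still be matched in the other, then take the
    winner-take-all disparity (illustrative heuristic only)."""
    cost_left = sad_cost_volume(center, left, max_disp, sign=-1)
    cost_right = sad_cost_volume(center, right, max_disp, sign=+1)
    fused = np.minimum(cost_left, cost_right)
    return np.argmin(fused, axis=0)  # per-pixel disparity estimate
```

Because the camera positions share the same parallax on both sides of the reference view, the two cost volumes are directly comparable at each disparity level, which is what makes a simple element-wise fusion such as the one sketched above plausible.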