We design a multiscopic vision system that uses a low-cost monocular RGB camera to obtain accurate depth estimates. Unlike multi-view stereo, where images are captured at unconstrained camera poses, the proposed system controls the motion of the camera to capture a sequence of images at horizontally or vertically aligned positions with the same parallax. For this system, we propose a new heuristic method and a robust learning-based method to fuse multiple cost volumes computed between the reference image and its surrounding images. To obtain training data, we build a synthetic dataset of multiscopic images. Experiments on the real-world Middlebury dataset and a real-robot demonstration show that our multiscopic vision system outperforms traditional two-frame stereo matching methods in depth estimation. Our code and dataset are available at \url{https://sites.google.com/view/multiscopic}.
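To make the idea of fusing cost volumes around a reference image concrete, the following is a minimal sketch and not the heuristic or learning-based method proposed here. It assumes three horizontally aligned grayscale views (left, reference, right) separated by the same baseline, uses a per-pixel absolute-difference cost, fuses by averaging over the views in which a candidate match falls inside the image, and picks the disparity by winner-takes-all. The function names `cost_volume` and `fuse_and_select` and all parameters are hypothetical.

```python
import numpy as np

def cost_volume(ref, other, max_disp, direction):
    """Cost |ref[y, x] - other[y, x + direction * d]| for d = 0..max_disp.

    direction = +1 for the view taken to the left of the reference,
    direction = -1 for the view taken to the right (assumed convention).
    Candidates that fall outside the image get an infinite cost.
    """
    ref = np.asarray(ref, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    h, w = ref.shape
    if max_disp >= w:
        raise ValueError("max_disp must be smaller than the image width")
    vol = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        shift = direction * d
        if shift >= 0:
            vol[d, :, :w - shift] = np.abs(ref[:, :w - shift] - other[:, shift:])
        else:
            vol[d, :, -shift:] = np.abs(ref[:, -shift:] - other[:, :w + shift])
    return vol

def fuse_and_select(volumes):
    """Average the costs over valid views, then take the per-pixel argmin."""
    stack = np.stack(volumes)                 # (views, disp, h, w)
    valid = np.isfinite(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    fused = np.where(count > 0, total / np.maximum(count, 1), np.inf)
    return fused.argmin(axis=0)               # (h, w) integer disparity map

# Synthetic fronto-parallel scene with a constant true disparity of 3 pixels.
rng = np.random.default_rng(0)
true_d, h, w = 3, 8, 32
texture = rng.random((h, w + 2 * true_d))
left = texture[:, :w]
ref = texture[:, true_d:true_d + w]
right = texture[:, 2 * true_d:2 * true_d + w]

disparity = fuse_and_select([
    cost_volume(ref, left, max_disp=6, direction=+1),
    cost_volume(ref, right, max_disp=6, direction=-1),
])
```

Because both side views share the same parallax, one disparity index refers to the same depth hypothesis in every volume, which is what allows the volumes to be combined element-wise. A practical system would also aggregate costs over a window and treat occlusions more carefully than this sketch does.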