Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. State-of-the-art approaches typically assume accurate camera poses as input, which can be difficult to obtain in realistic settings. In this paper, we present FvOR, a learning-based object reconstruction method that predicts accurate 3D models from a few images with noisy input poses. The core of our approach is a fast and robust multi-view reconstruction algorithm that jointly refines 3D geometry and camera pose estimates using learnable neural network modules. We provide a thorough benchmark of state-of-the-art approaches to this problem on ShapeNet. Our approach achieves best-in-class results and is two orders of magnitude faster than the recent optimization-based approach IDR. Our code is released at \url{https://github.com/zhenpeiyang/FvOR/}.