We present a framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned category-specific image collection. The 3D shapes are generated implicitly as deformations to a category-specific signed distance field and are learned in an unsupervised manner solely from unaligned image collections and their poses, without any 3D supervision. Image collections on the internet generally contain substantial intra-category geometric and topological variation; different chairs, for example, can have different topologies. This makes joint shape and correspondence estimation much more challenging. As a result, prior works either learn each 3D object shape individually without modeling cross-instance correspondences, or perform joint shape and correspondence estimation only on categories with minimal intra-category topological variation. We overcome these restrictions by learning a topologically aware implicit deformation field that maps a 3D point in the object space to a higher-dimensional point in the category-specific canonical space. At inference time, given a single image, we reconstruct the underlying 3D shape by first implicitly deforming each 3D point in the object space to the learned category-specific canonical space using the topologically aware deformation field, and then reconstructing the 3D shape as a canonical signed distance field. Both the canonical shape and the deformation field are learned end-to-end in an inverse-graphics fashion, using a learned recurrent ray marcher (SRN) as a differentiable rendering module. Our approach, dubbed TARS, achieves state-of-the-art reconstruction fidelity on several datasets: ShapeNet, Pascal3D+, CUB, and Pix3D chairs. Result videos and code are available at https://shivamduggal4.github.io/tars-3D/
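To illustrate why mapping to a higher-dimensional canonical space can accommodate topological variation, the following is a minimal toy sketch, not the TARS networks. It assumes a hand-written canonical field over R^4 (three spatial coordinates plus one extra coordinate `w`) and a constant per-object code in place of a learned, image-conditioned deformation field. The names `canonical_sdf`, `deform`, `object_sdf`, and `CENTERS` are hypothetical and chosen only for this example.

```python
import math

# Assumption: the canonical space is R^4 = (x, y, z) plus one extra coordinate w.
# Two fixed sphere centers define a toy "category" shape.
CENTERS = ((1.0, 0.0, 0.0), (-1.0, 0.0, 0.0))


def canonical_sdf(q):
    """Signed-distance-style field at a canonical point q = (x, y, z, w).

    Union of two spheres whose radius grows with w: at w = 0 the spheres are
    disjoint, and for w > 0.5 they overlap and form a single component.
    """
    x, y, z, w = q
    radius = 0.5 + w
    return min(math.dist((x, y, z), c) for c in CENTERS) - radius


def deform(p, code):
    """Map an object-space point p = (x, y, z) to the 4D canonical space.

    Assumption: code = (tx, ty, tz, w) is a constant translation plus the
    extra coordinate. A learned deformation field would instead predict the
    canonical point per input point, conditioned on an image latent code.
    """
    tx, ty, tz, w = code
    return (p[0] + tx, p[1] + ty, p[2] + tz, w)


def object_sdf(p, code):
    """Object-space field: deform first, then query the canonical field."""
    return canonical_sdf(deform(p, code))


# Two instances that differ only in the extra coordinate have different topology.
two_parts = (0.0, 0.0, 0.0, 0.0)
one_part = (0.0, 0.0, 0.0, 0.75)
origin = (0.0, 0.0, 0.0)
print(object_sdf(origin, two_parts))  # 0.5: the origin lies outside both spheres
print(object_sdf(origin, one_part))   # -0.25: the spheres have merged over the origin
```

In this sketch, object-space points from different instances that land on the same spatial canonical coordinates can be treated as corresponding, while the extra coordinate selects which slice of the shared canonical field, and hence which topology, each instance uses.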