We present a new framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned, category-specific image collection. The 3D shapes are generated implicitly as deformations of a category-specific signed distance field and are learned without any 3D supervision. Image collections on the internet generally contain several intra-category geometric and topological variations (for example, different chairs can have different topologies), which makes joint shape and correspondence estimation much more challenging. Because of this, prior works either learn each 3D object shape individually without modeling cross-instance correspondences, or perform joint shape and correspondence estimation only on categories with minimal intra-category topological variation. We overcome these restrictions by learning a topologically-aware implicit deformation field that maps a 3D point in the object space to a higher-dimensional point in the category-specific canonical space. At inference time, given a single image, we reconstruct the underlying 3D shape by first implicitly deforming each 3D point in the object space to the learned canonical space using this deformation field, and then reconstructing the 3D shape as a canonical signed distance field. Both the canonical shape and the deformation field are learned end-to-end in an inverse-graphics fashion, using a learned recurrent ray marcher (SRN) as the differentiable rendering module. Our approach, dubbed TARS, achieves state-of-the-art reconstruction fidelity on several datasets: ShapeNet, Pascal3D+, CUB, and Pix3D chairs. Result videos and code are available at https://shivamduggal4.github.io/tars-3D/
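The two-stage evaluation described above (deform an object-space point into a higher-dimensional canonical space, then query a canonical signed distance field) can be illustrated with a minimal sketch. This is not the TARS implementation: the weights are random and untrained, and the names `LATENT_DIM`, `EXTRA_DIMS`, `deform_to_canonical`, and `object_sdf`, the layer sizes, and the split of the deformation output into a 3D offset plus extra coordinates are all assumptions made for illustration. Plain Python is used only to keep the example dependency-free.

```python
import random

random.seed(0)

LATENT_DIM = 8   # hypothetical size of the per-image shape code
EXTRA_DIMS = 2   # hypothetical number of extra canonical coordinates


def init_mlp(sizes):
    """Random (untrained) weights: one (weights, biases) pair per layer."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp(layers, x):
    """Fully connected network: ReLU on hidden layers, linear output layer."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(v, 0.0) for v in x]
    return x


# Deformation field: (object-space point, shape code) -> 3D offset + extra coords.
deform_layers = init_mlp([3 + LATENT_DIM, 32, 3 + EXTRA_DIMS])
# Canonical SDF: higher-dimensional canonical point -> one signed distance.
sdf_layers = init_mlp([3 + EXTRA_DIMS, 32, 1])


def deform_to_canonical(point, shape_code):
    """Map a 3D object-space point to a (3 + EXTRA_DIMS)-D canonical point."""
    out = mlp(deform_layers, list(point) + list(shape_code))
    moved = [p + d for p, d in zip(point, out[:3])]  # displaced 3D position
    return moved + out[3:]                           # append extra coordinates


def object_sdf(point, shape_code):
    """Signed distance at an object-space point, evaluated via the canonical SDF."""
    return mlp(sdf_layers, deform_to_canonical(point, shape_code))[0]


shape_code = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
point = (0.1, -0.3, 0.5)
canonical_point = deform_to_canonical(point, shape_code)
sdf_value = object_sdf(point, shape_code)
print(len(canonical_point), sdf_value)
```

In this sketch the shape code stands in for whatever an image encoder would predict from the input image. Because every object's points pass through the same canonical space, points from two different objects that land near each other there could be treated as candidate correspondences.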