Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all such supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled images depicting the same object category. Our main contributions are two ways of leveraging cross-instance consistency: (i) progressive conditioning, a training strategy that gradually specializes the model from category to instances in a curriculum learning fashion; and (ii) neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture. Also critical to the success of our method are: our structured autoencoding architecture, which decomposes an image into explicit shape, texture, pose, and background; an adapted formulation of differentiable rendering; and a new optimization scheme alternating between 3D and pose learning. We compare our approach, UNICORN, both on the diverse synthetic ShapeNet dataset (the classical benchmark for methods requiring multiple views as supervision) and on standard real-image benchmarks (Pascal3D+ Car, CUB), for which most methods require known templates and silhouette annotations. We also showcase applicability to more challenging real-world collections (CompCars, LSUN), where silhouettes are not available and images are not cropped around the object.
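To make the idea of progressive conditioning more concrete, the following is a minimal sketch of one way a category-to-instance curriculum could be realized: the number of active dimensions of a latent code is increased in stages, so that early in training every instance shares the same (fully masked) code and later stages allow instance-specific variation. The function names, the milestone steps, and the latent sizes are all hypothetical and chosen for illustration; the abstract does not specify these details.

```python
def active_dims(step, milestones, sizes):
    """Return how many latent dimensions are unmasked at a training step.

    milestones[i] is the (hypothetical) step from which sizes[i]
    dimensions are active; milestones must be sorted in increasing order.
    """
    n = sizes[0]
    for m, s in zip(milestones, sizes):
        if step >= m:
            n = s
    return n


def mask_code(code, n_active):
    """Keep the first n_active entries of a latent code and zero the rest."""
    return [v if i < n_active else 0.0 for i, v in enumerate(code)]


# Illustrative schedule (assumed values, not taken from the paper):
# 0 active dims -> one shared prototype for the whole category,
# then 2 dims, then the full 8-dim code for instance-level detail.
MILESTONES = [0, 100, 200]
SIZES = [0, 2, 8]

if __name__ == "__main__":
    code = [1.0] * 8
    for step in (0, 150, 250):
        n = active_dims(step, MILESTONES, SIZES)
        print(step, n, mask_code(code, n))
```

With zero active dimensions, a decoder fed with the masked code can only produce a single shape or texture for all images, which corresponds to the category-level starting point; unmasking more dimensions lets it specialize to individual instances.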