Understanding the 3D world without supervision is currently a major challenge in computer vision, as the annotations required to supervise deep networks for tasks in this domain are expensive to obtain on a large scale. In this paper, we address the problem of unsupervised viewpoint estimation. We formulate this as a self-supervised learning task, where image reconstruction provides the supervision needed to predict the camera viewpoint. Specifically, we make use of pairs of images of the same object at training time, taken from unknown viewpoints, to self-supervise training by combining the viewpoint information from one image with the appearance information from the other. We demonstrate that using a perspective spatial transformer allows efficient viewpoint learning, outperforming existing unsupervised approaches on synthetic data and obtaining competitive results on the challenging PASCAL3D+ dataset.
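The following is a minimal sketch (in PyTorch) of how the cross-image reconstruction objective described above could be wired together, under the assumption of separate viewpoint and appearance encoders plus a viewpoint-conditioned renderer. The module names (`viewpoint_net`, `appearance_net`, `renderer`) are hypothetical placeholders, not the authors' architecture; only the pairing scheme, in which the viewpoint comes from one image and the appearance from the other, follows the abstract.

```python
# Hedged sketch of pair-based self-supervised viewpoint learning.
# All submodules are assumed placeholders supplied by the user.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PairReconstruction(nn.Module):
    def __init__(self, viewpoint_net: nn.Module, appearance_net: nn.Module,
                 renderer: nn.Module):
        super().__init__()
        self.viewpoint_net = viewpoint_net    # image -> predicted camera viewpoint
        self.appearance_net = appearance_net  # image -> appearance code
        self.renderer = renderer              # (appearance, viewpoint) -> image, e.g. via a
                                              # perspective spatial transformer over 3D features

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
        # Viewpoint predicted from image A, appearance taken from image B
        # of the same object seen from a different, unknown viewpoint.
        view_a = self.viewpoint_net(img_a)
        app_b = self.appearance_net(img_b)
        # Reconstructing image A forces view_a to carry the pose information,
        # since the appearance code comes from the other image; the pixel-wise
        # reconstruction loss is the only supervision signal.
        recon_a = self.renderer(app_b, view_a)
        return F.mse_loss(recon_a, img_a)
```

In this formulation, no viewpoint labels are needed at training time: the loss is computed purely from pixels, and the viewpoint branch is driven to explain the geometric difference between the two views.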