We propose a weakly supervised, multi-view approach for learning category-specific surface mappings without dense annotations. We learn the underlying surface geometry of common categories, such as human faces, cars, and airplanes, given instances from those categories. While traditional approaches solve this problem using extensive supervision in the form of pixel-level annotations, we take advantage of the fact that pixel-level UV and mesh predictions can be combined with 3D reprojections to form consistency cycles. Exploiting these cycles lets us establish a dense correspondence between image pixels and the mesh; this correspondence acts as a self-supervisory signal, which in turn improves our overall estimates. Our approach leverages information from multiple views of the object to establish additional consistency cycles, thus improving the learned surface mapping without the need for explicit annotations. We also propose using deformation fields to predict an instance-specific mesh. Given the lack of datasets providing multiple images of similar object instances from different viewpoints, we generate and release a multi-view ShapeNet Cars and Airplanes dataset, created by rendering ShapeNet meshes along a 360-degree camera trajectory around each mesh. For the human faces category, we process and adapt an existing dataset to a multi-view setup. Through experimental evaluations, we show that, at test time, our method can generate accurate variations away from the mean shape, is multi-view consistent, and performs comparably to fully supervised approaches.
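To make the idea of a consistency cycle concrete, the following is a minimal sketch and not the method's actual implementation. It assumes a unit sphere as a stand-in for the category mesh, a simple pinhole camera, and hypothetical helper names (`uv_to_sphere`, `project`, `cycle_loss`). Each pixel's predicted UV coordinate is lifted to a 3D surface point, reprojected into the image, and compared with the pixel it came from.

```python
import numpy as np

def uv_to_sphere(uv):
    """Map UV coordinates in [0, 1]^2 to 3D points on a unit sphere.

    Assumption: the sphere is only a placeholder for a category mesh surface.
    """
    theta = uv[:, 0] * 2.0 * np.pi   # azimuth
    phi = uv[:, 1] * np.pi           # polar angle
    return np.stack(
        [np.sin(phi) * np.cos(theta),
         np.sin(phi) * np.sin(theta),
         np.cos(phi)],
        axis=1,
    )

def project(points, K, R, t):
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coordinates."""
    cam = points @ R.T + t           # world frame -> camera frame
    pix = cam @ K.T                  # apply intrinsics
    return pix[:, :2] / pix[:, 2:3]  # perspective divide

def cycle_loss(pixels, uv_pred, K, R, t):
    """Mean squared distance between each pixel and its reprojection
    through the cycle pixel -> UV -> 3D surface point -> pixel."""
    reproj = project(uv_to_sphere(uv_pred), K, R, t)
    return float(np.mean(np.sum((reproj - pixels) ** 2, axis=1)))

if __name__ == "__main__":
    # Hypothetical camera: focal length 100 px, principal point (64, 64),
    # placed 3 units in front of the sphere.
    K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 3.0])
    uv_true = np.array([[0.1, 0.3], [0.5, 0.5], [0.8, 0.7]])
    pixels = project(uv_to_sphere(uv_true), K, R, t)
    print(cycle_loss(pixels, uv_true, K, R, t))         # consistent UV prediction
    print(cycle_loss(pixels, uv_true + 0.05, K, R, t))  # perturbed UV prediction
```

Under these assumptions, the loss vanishes when the UV prediction agrees with the camera and surface, and grows as the prediction drifts. In a multi-view setting, the same term could be evaluated once per view with that view's `R` and `t`, which is one way additional cycles might be formed.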