Learning geometry, motion, and appearance priors of object classes is important for the solution of a large variety of computer vision problems. While the majority of approaches have focused on static objects, dynamic objects, especially those with controllable articulation, are less explored. We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input. In a self-supervised manner, our novel representation learns shape, appearance, and articulation codes that enable independent control of these semantic dimensions. Our model is trained end-to-end without requiring any articulation annotations. Experiments show that our approach performs well for different joint types, such as revolute and prismatic joints, as well as combinations of these joints. Compared to the state of the art, which uses direct 3D supervision and does not output appearance, we recover more faithful geometry and appearance from 2D observations only. In addition, our representation enables a large variety of applications, such as few-shot reconstruction, the generation of novel articulations, and novel-view synthesis.