Rendering articulated objects while controlling their poses is critical to applications such as virtual reality and movie animation. Manipulating the pose of an object, however, requires understanding its underlying structure, that is, its joints and how they interact with each other. Unfortunately, assuming the structure to be known, as existing methods do, precludes working on new object categories. We propose to learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views, with no additional supervision such as joint annotations or information about the structure. Our insight is that adjacent parts that move relative to each other must be connected by a joint. To leverage this observation, we model the object parts in 3D as ellipsoids, which allows us to identify joints. We combine this explicit representation with an implicit one that compensates for the approximation introduced. We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
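The insight that adjacent parts moving relative to each other must share a joint can be illustrated with a small numerical sketch. This is not the authors' method: it is a generic least-squares test, assuming each part's per-frame pose (a rotation and a center, as an ellipsoid would provide) is already known. It looks for one point that stays fixed in both parts' local frames; a near-zero residual suggests a joint. The names `fit_joint`, `rot_z`, and `rot_x`, and all of the synthetic motion, are hypothetical and chosen only for illustration.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(theta):
    """Rotation matrix about the x axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def fit_joint(R_a, c_a, R_b, c_b):
    """Fit a point that is rigidly attached to both parts (hypothetical helper).

    R_a, R_b: (T, 3, 3) per-frame rotations; c_a, c_b: (T, 3) per-frame centers.
    Solves R_a[t] @ p_a + c_a[t] = R_b[t] @ p_b + c_b[t] for all t in the
    least-squares sense and returns (p_a, p_b, rms_residual).
    """
    T = R_a.shape[0]
    A = np.concatenate([R_a, -R_b], axis=2).reshape(3 * T, 6)
    b = (c_b - c_a).reshape(3 * T)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    rms = np.sqrt(np.mean((A @ x - b) ** 2))
    return x[:3], x[3:], rms


# Synthetic motion (an assumption for the demo): part B hangs off part A.
T = 20
ts = np.linspace(0.0, 2.0, T)
p_a_true = np.array([1.0, 0.0, 0.0])   # joint in A's local frame
p_b_true = np.array([-0.5, 0.0, 0.0])  # joint in B's local frame

R_a = np.stack([rot_z(0.3 * t) for t in ts])
c_a = np.stack([np.array([0.1 * t, 0.0, 0.0]) for t in ts])
# B rotates relative to A about two axes, so the joint point is unique.
R_b = np.stack([R_a[i] @ rot_z(0.8 * t) @ rot_x(0.5 * t**2)
                for i, t in enumerate(ts)])
joint_world = np.einsum("tij,j->ti", R_a, p_a_true) + c_a
c_b = joint_world - np.einsum("tij,j->ti", R_b, p_b_true)

p_a_est, p_b_est, res_joint = fit_joint(R_a, c_a, R_b, c_b)

# Control: a part whose center moves independently of A (no shared joint).
rng = np.random.default_rng(0)
c_free = rng.normal(size=(T, 3))
_, _, res_free = fit_joint(R_a, c_a, R_b, c_free)

print("connected pair residual:", res_joint)
print("unconnected pair residual:", res_free)
```

In this toy setting the connected pair yields a residual near machine precision and recovers the joint location in both local frames, while the unconnected pair leaves a large residual. If the relative motion were a pure hinge, any point on the hinge axis would fit equally well, so the recovered point would not be unique.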