Capsule networks are designed to parse an image into a hierarchy of objects, parts, and relations. While promising, they remain limited by an inability to learn effective low-level part descriptions. To address this issue, we propose a novel self-supervised method for learning part descriptors of an image. During training, we exploit motion as a powerful perceptual cue for part definition, using an expressive decoder for part generation and layered image formation with occlusion. Experiments demonstrate robust part discovery in the presence of multiple objects, cluttered backgrounds, and significant occlusion. The resulting part descriptors, a.k.a. part capsules, are decoded from single images into shape masks that fill in occluded pixels, along with relative depth. We also report unsupervised object classification using our capsule parts in a stacked capsule autoencoder.
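To make "layered image formation with occlusion" concrete, the following is a minimal sketch of one common way to compose an image from per-part shape masks and relative depths. It is not taken from the paper: the function names `composite_layers` and `visible_masks`, the single grayscale value per part, and the convention that a larger depth score means nearer to the camera are all assumptions made for illustration.

```python
def composite_layers(masks, colors, depths, background=0.0):
    """Paint parts back to front with the standard "over" operator.

    masks:  list of H x W nested lists with values in [0, 1]
            (full part shapes, including pixels that end up occluded)
    colors: one grayscale value per part (hypothetical simplification)
    depths: one score per part; larger is ASSUMED to mean nearer the camera
    """
    height, width = len(masks[0]), len(masks[0][0])
    image = [[background] * width for _ in range(height)]
    # Far-to-near order, so nearer parts overwrite farther ones.
    order = sorted(range(len(masks)), key=lambda k: depths[k])
    for k in order:
        for y in range(height):
            for x in range(width):
                a = masks[k][y][x]
                image[y][x] = a * colors[k] + (1.0 - a) * image[y][x]
    return image


def visible_masks(masks, depths):
    """Visible portion of each part: its mask times (1 - mask) of every nearer part."""
    height, width = len(masks[0]), len(masks[0][0])
    visible = []
    for k in range(len(masks)):
        vis = [row[:] for row in masks[k]]
        for j in range(len(masks)):
            if depths[j] > depths[k]:  # part j is nearer, so it occludes part k
                for y in range(height):
                    for x in range(width):
                        vis[y][x] *= 1.0 - masks[j][y][x]
        visible.append(vis)
    return visible


# Two overlapping parts on a 1 x 3 image; part 1 is nearer than part 0.
masks = [[[1.0, 1.0, 0.0]], [[0.0, 1.0, 1.0]]]
image = composite_layers(masks, colors=[0.2, 0.9], depths=[0.0, 1.0])
visible = visible_masks(masks, depths=[0.0, 1.0])
```

In this toy case the middle pixel belongs to both full masks, but only the nearer part is visible there. Keeping the full mask separate from the visible mask is what lets a decoder of this kind represent occluded pixels rather than discard them.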