Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge of, and reason about, the relationship between an object and its parts. In this paper we specify a generative model for such data and derive a variational algorithm for inferring the transformation of each model object in a scene and the assignment of observed parts to objects. We also derive a learning algorithm for the object models, based on variational expectation maximization (Jordan et al., 1999), and study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to (i) data generated from multiple geometric objects such as squares and triangles ("constellations"), and (ii) data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) tackled this problem with amortized inference via stacked capsule autoencoders (SCAEs); our results show that we significantly outperform SCAEs wherever a comparison is possible (on the constellations data).
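The RANSAC-style alternative can be illustrated with a minimal sketch for a single object. It assumes that parts are 2D points, that an object's transformation is a 2D similarity (scale, rotation, translation), and that an observed part counts as explained when it lies within a fixed distance `tol` of a transformed template part. The function names `fit_similarity` and `ransac_pose`, the tolerance, and the iteration count are hypothetical choices for illustration; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of RANSAC pose + part assignment for ONE object model;
# not the paper's implementation.


def fit_similarity(m, x):
    """Closed-form 2D similarity z -> a*z + b from two point correspondences.

    Points are complex numbers, so `a` encodes scale and rotation and `b` the translation.
    """
    a = (x[1] - x[0]) / (m[1] - m[0])
    b = x[0] - a * m[0]
    return a, b


def ransac_pose(template, observed, n_iters=500, tol=0.05, seed=None):
    """template: (K, 2) part locations of the object model.
    observed: (N, 2) observed parts, possibly from several objects plus clutter.
    Returns (n_inliers, (a, b), assign), where assign[n] is the template part
    matched to observed part n, or -1 if this pose does not explain the part.
    """
    rng = np.random.default_rng(seed)
    t = template[:, 0] + 1j * template[:, 1]
    o = observed[:, 0] + 1j * observed[:, 1]
    best = (-1, None, None)
    for _ in range(n_iters):
        # Minimal sample: two template parts hypothesised to match two observed parts.
        i = rng.choice(len(t), size=2, replace=False)
        j = rng.choice(len(o), size=2, replace=False)
        a, b = fit_similarity(t[i], o[j])
        pred = a * t + b                            # template parts under the hypothesised pose
        dist = np.abs(o[:, None] - pred[None, :])   # (N, K) observed-to-predicted distances
        inlier = dist.min(axis=1) < tol
        score = int(inlier.sum())
        if score > best[0]:                         # keep the pose with the largest consensus set
            assign = np.where(inlier, dist.argmin(axis=1), -1)
            best = (score, (a, b), assign)
    return best


# Usage: a right triangle placed by a known similarity, plus three clutter points.
template = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
true_a, true_b = 2.0 * np.exp(0.5j), 3.0 + 1.0j
zt = true_a * (template[:, 0] + 1j * template[:, 1]) + true_b
clutter = np.array([[10.0, 10.0], [13.0, 11.0], [9.0, 14.0]])
observed = np.vstack([np.column_stack([zt.real, zt.imag]), clutter])
score, (a, b), assign = ransac_pose(template, observed, seed=0)
```

Encoding points as complex numbers makes the two-correspondence fit closed form, which is what keeps each RANSAC hypothesis cheap. Handling several objects (for example, removing the inliers of the best pose and rerunning) is left out here and would be a further assumption.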