Capsule networks (CapsNets) aim to parse images into a hierarchy of components consisting of objects, parts, and their relations. Despite their potential, CapsNets are computationally expensive, a major drawback that limits their efficient use on more complex datasets. Current CapsNet models compare their performance only against capsule baselines and fall short of deep CNN-based models on complicated tasks. This paper proposes an efficient way to learn capsules that detect atomic parts of an input image through a group of SubCapsules, onto which an input vector is projected. We then present the Wasserstein Embedding Module, which first measures the dissimilarity between the input and the components modeled by the SubCapsules, and then determines their degree of alignment based on the learned optimal transport. This strategy offers a new way of defining alignment between the input and SubCapsules via the similarity between their respective component distributions. Our proposed model (i) is lightweight, making capsules applicable to more complex vision tasks, and (ii) performs better than or on par with CNN-based models on these challenging tasks. Our experimental results indicate that Wasserstein Embedding Capsules (WECapsules) are more robust to affine transformations, scale effectively to larger datasets, and outperform CNN and CapsNet models on several vision tasks.
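To make the alignment idea concrete, the sketch below computes an entropic-regularized optimal transport plan (Sinkhorn iterations) between an input's feature distribution and a SubCapsule's component distribution, and scores alignment by the resulting transport cost. This is an illustrative toy, not the paper's implementation: the function name `sinkhorn_alignment`, the uniform marginals, and the squared-Euclidean cost are all assumptions.

```python
import numpy as np

def sinkhorn_alignment(cost, a, b, reg=0.5, n_iters=200):
    """Entropic-regularized OT via Sinkhorn iterations (illustrative sketch).

    cost : (n, m) pairwise cost matrix between input and component vectors
    a, b : marginal weights (each summing to 1) of the two distributions
    Returns the transport plan P and the transport cost <P, cost>;
    a lower cost indicates a better input/SubCapsule alignment.
    """
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):         # alternate scaling to match both marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)  # transport plan with marginals ~a, ~b
    return P, float(np.sum(P * cost))

# Hypothetical usage: 4 input feature vectors vs. 5 SubCapsule components.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))                       # input features
Y = rng.normal(size=(5, 2))                       # component prototypes
cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
a, b = np.full(4, 1 / 4), np.full(5, 1 / 5)       # uniform marginals
P, w = sinkhorn_alignment(cost, a, b)             # w approximates the Wasserstein cost
```

The entropic regularizer `reg` trades off the sharpness of the plan against numerical stability; frameworks typically differentiate through these iterations so the alignment can be learned end to end.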