Convolutional neural networks use pooling and other downsampling operations to achieve translational invariance in feature detection, but their architecture does not explicitly maintain a representation of the locations of features relative to each other. As a result, unlike humans, they do not represent two instances of the same object in different orientations the same way, so training them often requires extensive data augmentation and exceedingly deep networks. A team at Google Brain recently made news with an attempt to fix this problem: Capsule Networks (CapsNets). While a standard CNN works with scalar outputs representing feature presence, a CapsNet works with vector outputs representing entity presence. We want to stress-test CapsNets in various incremental ways to better understand their performance and expressiveness. In broad terms, the goals of our investigation are: (1) to test CapsNets on datasets that are like MNIST but harder in a specific way, and (2) to explore the internal embedding space and sources of error of CapsNets.
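The scalar-versus-vector distinction can be made concrete with the "squash" nonlinearity from the original CapsNet paper (Sabour et al., 2017), which rescales a capsule's output vector so that its length lies in [0, 1) and can be read as a presence probability, while its direction encodes the entity's pose. A minimal NumPy sketch (the function and variable names here are our own, for illustration):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Capsule 'squash' nonlinearity: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
    Shrinks the vector's length into [0, 1) while preserving its direction."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    scale = norm_sq / (1.0 + norm_sq)
    return scale * s / np.sqrt(norm_sq + eps)

# A CNN feature map entry is a scalar (feature presence only); a capsule
# output is a vector whose length signals entity presence and whose
# orientation encodes instantiation parameters such as pose.
s = np.array([3.0, 4.0])          # raw capsule output, length 5
v = squash(s)
print(np.linalg.norm(v))          # squashed length: 25/26 ~= 0.9615
```

A long raw vector maps to a length near 1 (entity almost certainly present), while a short one maps toward 0, which is what lets the routing-by-agreement step treat capsule lengths as probabilities.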