We present a convolutional neural network design with additional branches after certain convolutions so that we can extract features with differing effective receptive fields and levels of abstraction. From each branch, we transform each of the final filters into a pair of homogeneous vector capsules. As the capsules are formed from entire filters, we refer to them as filter capsules. We then compare three methods of merging the branches: merging with equal weight, and merging with learned weights under two different weight initialization strategies. This design, in combination with a domain-specific set of randomly applied augmentation techniques, establishes a new state of the art for the MNIST dataset, achieving 99.84% accuracy with an ensemble of these models and 99.79% accuracy with a single model, also a new state of the art. These accuracies were achieved with a 75% reduction in both the number of parameters and the number of epochs of training relative to the previously best performing capsule network on MNIST. All training was performed using the Adam optimizer, and no overfitting was observed.
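The branch-merging step can be illustrated with a minimal sketch, assuming a TensorFlow/Keras implementation; the layer name `WeightedBranchMerge`, the `init` options, and the use of scalar per-branch weights are illustrative assumptions rather than the authors' actual code. Equal-weight merging corresponds to fixed, non-trainable unit weights, while the two learned variants differ only in how the branch weights are initialized.

```python
import tensorflow as tf


class WeightedBranchMerge(tf.keras.layers.Layer):
    """Combines per-branch class logits into a single prediction.

    Hypothetical layer: equal-weight merging is recovered by setting
    learn_weights=False with init="ones"; the two learned variants
    differ only in the initializer chosen for the branch weights.
    """

    def __init__(self, num_branches, init="ones", learn_weights=True, **kwargs):
        super().__init__(**kwargs)
        initializer = (
            tf.keras.initializers.Ones()
            if init == "ones"
            else tf.keras.initializers.RandomUniform(minval=0.0, maxval=1.0)
        )
        self.branch_weights = self.add_weight(
            name="branch_weights",
            shape=(num_branches,),
            initializer=initializer,
            trainable=learn_weights,
        )

    def call(self, branch_logits):
        # branch_logits: list of [batch, num_classes] tensors, one per branch.
        stacked = tf.stack(branch_logits, axis=-1)  # [batch, num_classes, num_branches]
        # Weighted sum across branches using the (possibly learned) scalar weights.
        return tf.reduce_sum(stacked * self.branch_weights, axis=-1)


# Usage: merge three branch outputs with learned, uniformly initialized weights.
merge = WeightedBranchMerge(num_branches=3, init="uniform", learn_weights=True)
```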