Convolution blocks serve as local feature extractors and are key to the success of convolutional neural networks. To make local semantic feature embedding explicit, we reformulate convolution blocks as feature selection according to the best-matching kernel. Under this formulation, we show that typical ResNet blocks perform local feature embedding via template matching once batch normalization (BN) followed by a rectified linear unit (ReLU) is interpreted as an arg-max optimizer. Following this perspective, we tailor a residual block that explicitly enforces semantically meaningful local feature embedding by using label information. Specifically, we assign a feature vector to each local region according to the classes that the region matches. We evaluate our method on three popular benchmark datasets with several image classification architectures and show that it consistently and substantially improves the performance of the baseline architectures.
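The template-matching reading of a convolution can be illustrated with a small NumPy sketch. It is not the formulation used in the paper: the helper names (`extract_patches`, `best_matching_kernel`, `relu_as_gate`), the random data, and the omission of batch normalization are all assumptions made for illustration. The sketch scores every local patch against each kernel, picks the best-matching kernel per patch, and uses the standard identity ReLU(s) = max over z in {0, 1} of z * s to show how a ReLU can be read as an arg-max over a binary gate.

```python
import numpy as np

def extract_patches(x, k):
    """Return all k x k patches of a 2-D array as rows of shape (num_patches, k*k)."""
    h, w = x.shape
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append(x[i:i + k, j:j + k].ravel())
    return np.stack(rows)

def best_matching_kernel(x, kernels):
    """Score each patch against each kernel and return the index of the best match."""
    k = kernels.shape[-1]
    patches = extract_patches(x, k)                  # (P, k*k)
    templates = kernels.reshape(len(kernels), -1)    # (K, k*k)
    scores = patches @ templates.T                   # (P, K) correlation responses
    return scores, scores.argmax(axis=1)

def relu_as_gate(scores):
    """ReLU(s) = max over z in {0, 1} of z * s; the maximizing z is 1 when s > 0."""
    gate = (scores > 0).astype(scores.dtype)
    return gate * scores, gate

# Hypothetical toy data: one 6x6 single-channel input and four 3x3 kernels.
rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))
kernels = rng.standard_normal((4, 3, 3))

scores, winners = best_matching_kernel(image, kernels)
activated, gate = relu_as_gate(scores)
```

Here `winners` holds, for each of the 16 patch positions, the index of the kernel with the highest response, and `activated` equals the elementwise ReLU of `scores`. A label-aware variant, as the abstract describes, would tie kernels to classes, but how that is done is not specified here and is left out of the sketch.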