Convolutional neural networks (CNNs) are remarkably successful in many computer vision tasks. However, their high inference cost is problematic for embedded and real-time systems, so there have been many studies on compressing the networks. On the other hand, recent advances in self-attention models have shown that convolution filters are preferable to self-attention in the earlier layers, which indicates that stronger inductive biases are better in the earlier layers. As observed with convolutional filters, strong biases can steer training toward specific filters and drive unnecessary filters to zero. This is analogous to classical image processing, where choosing suitable filters yields a compact dictionary for representing features. We follow this idea and incorporate Gabor filters into the earlier layers of CNNs for compression. The parameters of the Gabor filters are learned through backpropagation, so the learned features are restricted to the Gabor family. We show that the first layer of VGG-16 for CIFAR-10 has 192 kernels/features, whereas learning Gabor filters requires only 29.4 kernels on average. Moreover, using Gabor filters, an average of 83% and 94% of the kernels in the first and second layers, respectively, can be removed from a modified ResNet-20 for CIFAR-10, in which the first five layers are replaced with two layers of larger kernels.
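For reference, a learnable Gabor parameterization of a convolution kernel can follow the standard 2-D Gabor function; the symbols below are the textbook convention and the exact parameterization used in the layers may differ:

g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\!\left(2\pi\frac{x'}{\lambda} + \psi\right), \qquad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta,

where \lambda is the wavelength, \theta the orientation, \psi the phase offset, \sigma the scale of the Gaussian envelope, and \gamma the spatial aspect ratio. Under this assumed form, each kernel is generated from a handful of scalar parameters learned by backpropagation instead of free per-pixel convolution weights.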