Depthwise separable convolutions and frequency-domain convolutions are two recent ideas for building efficient convolutional neural networks. They are seemingly incompatible: the vast majority of operations in depthwise separable CNNs are in pointwise convolutional layers, but pointwise layers use 1x1 kernels, which do not benefit from frequency transformation. This paper unifies these two ideas by transforming the activations rather than the kernels. Our key insights are that 1) pointwise convolutions commute with frequency transformation and thus can be computed in the frequency domain without modification, 2) each channel within a given layer has a different level of sensitivity to frequency-domain pruning, and 3) each channel's sensitivity to frequency pruning is approximately monotonic with respect to frequency. We leverage this knowledge by proposing a new technique that wraps each pointwise layer in a discrete cosine transform (DCT) that is truncated to prune coefficients above a per-channel threshold. To learn which frequencies should be pruned from which channels, we introduce a learned parameter that specifies each channel's pruning threshold. We add a regularization term that incentivizes the model to reduce the number of retained frequencies while maintaining task accuracy. Unlike weight-pruning techniques that rely on sparse operators, our contiguous frequency-band pruning results in fully dense computation. We apply our technique to MobileNetV2, reducing computation time by 22% with less than 1% accuracy degradation.
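To make the mechanism concrete, the following is a minimal NumPy/SciPy sketch of a pointwise convolution computed in the DCT domain with per-channel frequency truncation; it is not the authors' implementation. The function name pointwise_conv_dct, the keep_frac parameter, the (u + v) frequency ordering, and the fixed hard cutoff are illustrative assumptions: in the proposed technique the per-channel threshold is learned and regularized, and the retained band would be cropped to a smaller dense block rather than zeroed, which is what keeps the computation fully dense.

```python
import numpy as np
from scipy.fft import dctn, idctn

def pointwise_conv_dct(x, w, keep_frac):
    """x: activations (C_in, H, W); w: pointwise weights (C_out, C_in);
    keep_frac: fraction of the low-frequency band kept per input channel (C_in,)."""
    c_in, h, w_dim = x.shape

    # 2-D type-II DCT of each input channel (orthonormal scaling).
    coeffs = dctn(x, type=2, axes=(1, 2), norm="ortho")

    # Per-channel truncation: zero every coefficient whose (u + v) index lies
    # beyond that channel's cutoff, keeping a contiguous low-frequency band.
    u = np.arange(h)[:, None]
    v = np.arange(w_dim)[None, :]
    band = u + v                                   # simple frequency ordering (assumption)
    for c in range(c_in):
        cutoff = keep_frac[c] * (h + w_dim - 2)
        coeffs[c][band > cutoff] = 0.0

    # A pointwise (1x1) convolution is a per-pixel channel mix, so it commutes
    # with the spatial DCT: apply the (C_out, C_in) matrix to the coefficients.
    mixed = np.einsum("oi,ihw->ohw", w, coeffs)

    # Return to the spatial domain.
    return idctn(mixed, type=2, axes=(1, 2), norm="ortho")

# Example: 8 input channels, 16 output channels, 32x32 activations.
x = np.random.randn(8, 32, 32)
w = np.random.randn(16, 8)
keep = np.full(8, 0.5)                             # keep the lower half of the band
y = pointwise_conv_dct(x, w, keep)
print(y.shape)                                     # (16, 32, 32)
```

Because the channel mix is position-independent, applying it before or after the spatial DCT gives the same result up to truncation, which is the commutation property the method relies on.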