Network quantization is a powerful technique for compressing convolutional neural networks. The quantization granularity determines how scaling factors are shared among the weights, which affects the accuracy of the quantized network. Most existing approaches quantize convolutional layers by sharing scaling factors either layerwise or channelwise, and both granularities have been widely used in practice. Other quantization granularities, however, are rarely explored. In this paper, we explore sub-layerwise granularity, which shares a scaling factor across a group of multiple input and output channels, and propose an efficient post-training quantization method at sub-layerwise granularity (PTQ-SL). We systematically experiment with various granularities and observe that the prediction accuracy of the quantized network is strongly correlated with the granularity. Moreover, we find that adjusting the positions of the channels can improve the performance of sub-layerwise quantization, so we propose a method to reorder the channels for sub-layerwise quantization. The experiments demonstrate that sub-layerwise quantization with appropriate channel reordering can outperform channelwise quantization.
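To make the notion of sub-layerwise granularity concrete, the following is a minimal sketch of block-wise weight quantization under simple assumptions: a convolutional weight tensor of shape (C_out, C_in, K, K), symmetric uniform quantization, and illustrative group sizes g_out and g_in that are not taken from the paper. Layerwise quantization corresponds to one scale for the whole tensor, channelwise to one scale per output channel, and sub-layerwise to one scale per block of output and input channels.

```python
# A minimal sketch of sub-layerwise (block-wise) weight quantization, assuming a
# conv weight of shape (C_out, C_in, K, K) and symmetric uniform n-bit quantization.
# The group sizes g_out and g_in are hypothetical parameters for illustration.
import numpy as np

def quantize_sublayerwise(weight, g_out, g_in, n_bits=8):
    """Quantize `weight` with one scaling factor per (g_out x g_in) channel block."""
    c_out, c_in, kh, kw = weight.shape
    assert c_out % g_out == 0 and c_in % g_in == 0
    q_max = 2 ** (n_bits - 1) - 1
    dequant = np.empty_like(weight)
    scales = np.empty((c_out // g_out, c_in // g_in), dtype=weight.dtype)
    for i in range(0, c_out, g_out):
        for j in range(0, c_in, g_in):
            block = weight[i:i + g_out, j:j + g_in]
            scale = np.abs(block).max() / q_max            # one scale per block
            q = np.clip(np.round(block / scale), -q_max - 1, q_max)
            dequant[i:i + g_out, j:j + g_in] = q * scale   # simulated quantization
            scales[i // g_out, j // g_in] = scale
    return dequant, scales

# Example: 64 output channels, 32 input channels, 3x3 kernels.
# g_out=c_out, g_in=c_in would recover layerwise quantization;
# g_out=1, g_in=c_in would recover channelwise quantization.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_q, s = quantize_sublayerwise(w, g_out=8, g_in=8)
print(s.shape)  # (8, 4): one scaling factor per 8x8 block of channels
```

Channel reordering, as described in the abstract, would permute the output and input channels before forming these blocks so that channels with similar weight ranges share a scale; the sketch above does not implement that step.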