Many neural network quantization techniques have been developed to decrease the computational and memory footprint of deep learning. However, these methods are evaluated subject to confounding tradeoffs that may sacrifice inference speed or resource complexity in exchange for higher accuracy. In this work, we articulate a variety of tradeoffs whose effects are often overlooked and empirically analyze their impact on uniform and mixed-precision post-training quantization, finding that these confounding tradeoffs may have a larger impact on quantized network accuracy than the quantization methods themselves. Because these tradeoffs constrain the attainable hardware acceleration for different use-cases, we encourage researchers to explicitly report these design choices through the structure of "quantization cards." We expect quantization cards to help researchers compare methods more effectively and engineers determine the applicability of quantization techniques for their hardware.
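For context, the sketch below illustrates the uniform affine post-training quantization scheme the abstract refers to: weights are mapped to low-bit integers via a scale and zero-point derived from the tensor's min/max range, then dequantized for comparison. The function names, the min/max calibration choice, and the bit widths shown are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniform affine quantization of a float tensor to [0, 2**num_bits - 1].

    Scale and zero-point are calibrated from the tensor's min/max range
    (one of the overlooked design choices the paper discusses).
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin) or 1.0  # guard constant tensors
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float tensor from its quantized form."""
    return scale * (q.astype(np.float32) - zero_point)

# Example: mean quantization error at two bit widths for a random tensor.
w = np.random.randn(256, 256).astype(np.float32)
for bits in (8, 4):
    q, s, zp = quantize_uniform(w, num_bits=bits)
    err = np.abs(w - dequantize(q, s, zp)).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")
```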