Quantization is a technique for reducing the training and inference times of deep neural networks (DNNs), which is crucial for training in resource-constrained environments and for time-critical inference applications. State-of-the-art (SOTA) quantization approaches focus on post-training quantization, i.e., quantizing pre-trained DNNs to speed up inference. Very little work exists on quantized training, and none of it allows dynamic intra-epoch precision switching or employs an information-theoretic switching heuristic. Moreover, existing approaches usually require a full-precision refinement phase afterwards and enforce a single global word length across the whole DNN, which leads to suboptimal quantization mappings and resource usage. Recognizing these limits, we introduce MARViN, a new quantized training strategy with information-theory-based intra-epoch precision switching that decides, on a per-layer basis, which precision to use in order to minimize quantization-induced information loss. Note that any quantization must retain enough precision that future learning steps do not suffer from vanishing gradients. We achieve an average speedup of 1.86x over a float32 baseline while limiting the mean accuracy degradation on AlexNet/ResNet to only -0.075%.
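To make the per-layer switching idea concrete, the sketch below illustrates one plausible way such a decision could look: it uses the KL divergence between histograms of full-precision and quantized weights as a proxy for quantization-induced information loss and picks the smallest candidate bit-width that keeps that loss under a threshold. This is a minimal illustration under assumed choices (uniform symmetric quantization, the candidate bit-widths, the `max_loss` threshold, and all function names are hypothetical), not MARViN's actual switching heuristic.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of a weight tensor to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

def kl_divergence(p_hist, q_hist, eps=1e-12):
    """KL divergence between two normalized histograms (proxy for information loss)."""
    p = p_hist / p_hist.sum() + eps
    q = q_hist / q_hist.sum() + eps
    return float(np.sum(p * np.log(p / q)))

def select_layer_precision(weights, candidate_bits=(4, 8, 16), max_loss=0.05, n_bins=256):
    """Pick the smallest bit-width whose quantization-induced information loss
    stays below max_loss; fall back to the largest candidate otherwise."""
    lo, hi = float(weights.min()), float(weights.max())
    ref_hist, edges = np.histogram(weights, bins=n_bins, range=(lo, hi))
    for bits in sorted(candidate_bits):
        q_hist, _ = np.histogram(quantize(weights, bits), bins=edges)
        if kl_divergence(ref_hist.astype(float), q_hist.astype(float)) <= max_loss:
            return bits
    return max(candidate_bits)

# Example: choose a precision per layer, e.g. at the start of an epoch.
rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(0, 0.1, 4096), "fc1": rng.normal(0, 0.02, 8192)}
print({name: select_layer_precision(w) for name, w in layers.items()})
```

In this toy setup, layers whose weight distributions survive coarse quantization with little divergence get a low bit-width, while more sensitive layers keep higher precision, mirroring the per-layer, loss-driven precision assignment described above.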