ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations. These growing memory and computational demands make deep learning prohibitive for resource-constrained hardware platforms such as mobile devices. Recent efforts aim to reduce these overheads while preserving model performance as much as possible, and include parameter reduction techniques, parameter quantization, and lossless compression techniques. In this chapter, we develop and describe a novel quantization paradigm for DNNs that leverages concepts from explainable AI (XAI) and information theory. Instead of assigning weight values based only on their distances to the quantization clusters, the assignment function additionally considers weight relevances obtained from Layer-wise Relevance Propagation (LRP) and the information content of the clusters (entropy optimization). The ultimate goal is to preserve the most relevant weights in the quantization clusters of highest information content. Experimental results show that this novel Entropy-Constrained and XAI-adjusted Quantization (ECQ$^{\text{x}}$) method generates ultra-low-precision (2-5 bit) and simultaneously sparse neural networks while maintaining or even improving model performance. Due to the reduced parameter precision and the high number of zero elements, the resulting networks are highly compressible in terms of file size, by up to $103\times$ compared to the full-precision unquantized DNN model. Our approach was evaluated on different types of models and datasets (including Google Speech Commands and CIFAR-10) and compared with previous work.
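
To make the assignment idea concrete, the following is a minimal sketch of a cluster assignment that combines distance, cluster information content, and weight relevance. The abstract does not give the actual cost function, so everything here is an assumption made for illustration: the function name `assign_clusters`, the trade-off parameters `lam` and `rho`, the additive form of the cost, and the choice to apply the relevance penalty only to the zero cluster. The relevance scores are taken as given inputs; computing them with LRP is not shown.

```python
import numpy as np


def assign_clusters(weights, relevances, centroids, lam=0.1, rho=1.0):
    """Hypothetical entropy-constrained, relevance-adjusted assignment.

    This is an illustrative cost, not the formula from the chapter.
    """
    weights = np.asarray(weights, dtype=float)
    relevances = np.asarray(relevances, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    # Squared distance of every weight to every centroid: shape (n_weights, n_clusters).
    dist = (weights[:, None] - centroids[None, :]) ** 2

    # Estimate cluster probabilities from a plain nearest-centroid assignment.
    nearest = dist.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(centroids))
    probs = np.maximum(counts / counts.sum(), 1e-12)

    # Information content of each cluster in bits; rare clusters cost more to encode.
    info = -np.log2(probs)

    # Entropy-constrained cost: distance plus weighted information content.
    cost = dist + lam * info[None, :]

    # Assumed relevance adjustment: make it expensive to send a relevant
    # weight to the zero cluster, so relevant weights are not pruned.
    zero_idx = np.abs(centroids).argmin()
    cost[:, zero_idx] += rho * relevances

    return cost.argmin(axis=1)


weights = [0.04, 0.04, 0.5, -0.5, 0.0, 0.0]
centroids = [-0.5, 0.0, 0.5]

# Without relevance information, both small weights fall into the zero cluster.
plain = assign_clusters(weights, np.zeros(6), centroids)

# Marking the first weight as relevant moves it out of the zero cluster.
adjusted = assign_clusters(weights, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], centroids)
```

In this toy setting the entropy term pulls small weights toward the well-populated zero cluster, which promotes sparsity, while the relevance term keeps a weight flagged as relevant at a non-zero value.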
