While joint pruning--quantization is theoretically superior to sequential application, current joint methods rely on auxiliary procedures outside the training loop to find compression parameters. This reliance adds engineering complexity and hyperparameter-tuning burden, and lacks a direct data-driven gradient signal, which can result in sub-optimal compression. In this paper, we introduce CoDeQ, a simple, fully differentiable method for joint pruning--quantization. Our approach builds on a key observation: the dead-zone of a scalar quantizer is equivalent to magnitude pruning and can be used to induce sparsity directly within the quantization operator. Concretely, we parameterize the dead-zone width and learn it via backpropagation, alongside the quantization parameters. This design provides explicit control of sparsity, regularized by a single global hyperparameter, while decoupling sparsity selection from bit-width selection. The result is a method for Compression with Dead-zone Quantizer (CoDeQ) that supports both fixed-precision and mixed-precision quantization (controlled by an optional second hyperparameter) and simultaneously determines the sparsity pattern and quantization parameters in a single end-to-end optimization. Consequently, CoDeQ requires no auxiliary procedures, making it architecture-agnostic and straightforward to implement. On ImageNet with ResNet-18, CoDeQ reduces bit operations to ~5% while maintaining accuracy close to full precision in both fixed- and mixed-precision regimes.
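To make the key observation concrete, here is a minimal, hypothetical sketch of a dead-zone scalar quantizer in plain Python. It is not the authors' implementation: the function name `deadzone_quantize`, the step size `s`, the level count `n_levels`, and the example values are all illustrative assumptions. It shows only the forward mapping; in the actual method the dead-zone width `d` and the quantization parameters would be learned via backpropagation (e.g. with a straight-through gradient estimator).

```python
def deadzone_quantize(w, d, s, n_levels=7):
    """Illustrative dead-zone scalar quantizer (forward pass only).

    Weights with |w| <= d are mapped to exactly 0, which is magnitude
    pruning; the remaining magnitudes are rounded to a uniform grid of
    step s. d, s, and n_levels are hypothetical parameters for this sketch.
    """
    if abs(w) <= d:
        return 0.0  # dead-zone: the weight is pruned
    sign = 1.0 if w > 0 else -1.0
    # shift the magnitude past the dead-zone, then round to the grid
    q = round((abs(w) - d) / s)
    q = max(1, min(q, n_levels))  # clamp to the representable levels
    return sign * (d + q * s)

weights = [0.02, -0.05, 0.3, -0.7, 1.2]
quantized = [deadzone_quantize(w, d=0.1, s=0.25) for w in weights]
# small-magnitude weights collapse to 0.0 (sparsity); larger ones snap
# to the quantization grid
```

Widening `d` increases sparsity while the same operator handles quantization of the surviving weights, which is why a single learned parameter can control the sparsity pattern independently of the bit-width.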