The exponential growth in parameter size and computational complexity of deep models poses significant challenges for efficient deployment. A core difficulty for existing compression methods is that different layers of a model differ markedly in their tolerance to compression. For instance, the first layer of a model can typically sustain a higher compression level than the last layer without compromising performance. The key challenge therefore lies in allocating compression levels across layers so as to minimize performance loss while maximizing parameter reduction. To address this challenge, we propose a Compression Error Theory (CET) framework that determines the optimal compression level for each layer. Taking quantization as an example, CET leverages differential expansion and algebraic geometry to reconstruct the quadratic form of the quantization error as ellipsoids and hyperbolic paraboloids, and uses their geometric structure to define an error subspace. To identify the error subspace with minimal performance loss, CET performs an orthogonal decomposition of the geometric space, transforming the optimization of the error subspace into a complementary problem. Theoretical analysis shows that constructing the quantization subspace along the major axis results in minimal performance degradation. Experiments verify the theory: CET largely preserves performance under compression. Specifically, on the ResNet-34 model, CET achieves nearly 11$\times$ parameter compression while even surpassing the performance of the original model.
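The major-axis claim can be illustrated with a small numerical sketch (our own illustration, not the paper's code): when the loss increase is locally a quadratic form $e(d) = d^\top H d$ in the weight perturbation $d$, the level set $e(d)=c$ is an ellipsoid for positive-definite $H$, and its major (longest) axis lies along the eigenvector with the smallest eigenvalue. Placing a fixed-magnitude quantization error along that axis therefore yields the smallest loss increase. Here `H` is a random stand-in for a layer's curvature matrix, an assumption for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4.0 * np.eye(4)        # stand-in positive-definite curvature matrix

eigvals, eigvecs = np.linalg.eigh(H) # eigenvalues in ascending order
major_axis = eigvecs[:, 0]           # ellipsoid's major axis: smallest eigenvalue
minor_axis = eigvecs[:, -1]          # ellipsoid's minor axis: largest eigenvalue

# Compare the loss increase for equal-norm quantization errors
# placed along the major vs. the minor axis of the error ellipsoid.
eps = 1e-2
loss_major = (eps * major_axis) @ H @ (eps * major_axis)
loss_minor = (eps * minor_axis) @ H @ (eps * minor_axis)

print(loss_major < loss_minor)       # error along the major axis hurts least
```

The same eigendecomposition view explains the per-layer allocation: layers whose curvature spectrum is flat tolerate larger errors in more directions than layers with a few sharply curved directions.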

