We formulate the entropy of a quantized artificial neural network as a differentiable function that can be plugged as a regularization term into the cost function minimized by gradient descent. Our formulation scales efficiently beyond the first order and is agnostic of the quantization scheme. The network can then be trained to minimize the entropy of the quantized parameters, so that they can be optimally compressed via entropy coding. We experiment with our entropy formulation on quantizing and compressing well-known network architectures over multiple datasets. Our approach compares favorably against similar methods, enjoying the benefits of a higher-order entropy estimate: flexibility towards non-uniform quantization (we use Lloyd-Max quantization), scalability to any entropy order to be minimized, and efficiency in terms of compression. We show that HEMP is able to work in synergy with other approaches aiming at pruning or quantizing the model itself, delivering significant benefits in terms of storage size without harming the model's performance.
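For illustration, below is a minimal PyTorch-style sketch of the core idea: a differentiable (first-order) estimate of the entropy of the quantized parameters added as a regularization term to the training loss. The soft-assignment surrogate, the function names, and the hyper-parameters are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_entropy(weights, centers, temperature=0.1):
    """Differentiable surrogate of the first-order entropy of quantized weights.

    Each weight is softly assigned to the quantization bins (e.g. Lloyd-Max
    centroids) via a softmax over negative squared distances, so the bin
    probabilities, and hence the entropy, remain differentiable.
    """
    d = (weights.view(-1, 1) - centers.view(1, -1)) ** 2   # (N, K) distances
    soft_assign = F.softmax(-d / temperature, dim=1)        # soft bin membership
    p = soft_assign.mean(dim=0)                             # estimated bin probabilities
    return -(p * torch.log2(p + 1e-12)).sum()               # entropy in bits per symbol

# Hypothetical usage inside a training loop (`model`, `task_loss`, `centers`,
# and the regularization weight `lam` are assumed to be defined elsewhere):
# params = torch.cat([p.view(-1) for p in model.parameters()])
# loss = task_loss + lam * soft_entropy(params, centers)
# loss.backward()
```

The temperature controls how closely the soft assignment approximates the hard quantizer; extending the estimate beyond the first order would involve probabilities over tuples of symbols rather than single bins.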