The increasing size of recently proposed Neural Networks makes it hard to deploy them on embedded devices, where memory, battery, and computational power are non-trivial bottlenecks. For this reason, the network compression literature has been thriving in recent years, and a large number of solutions have been published to reduce both the number of operations and the number of parameters involved in these models. Unfortunately, most of these reduction techniques are in fact heuristics and usually require at least one re-training step to recover accuracy. The need for model reduction procedures is also well known in the fields of Verification and Performance Evaluation, where large efforts have been devoted to the definition of quotients that preserve the observable underlying behaviour. In this paper we try to bridge the gap between the most popular and highly effective network reduction strategies and formal notions, such as lumpability, introduced for the verification and evaluation of Markov Chains. Elaborating on lumpability, we propose a pruning approach that reduces the number of neurons in a network without using any data or fine-tuning, while exactly preserving the network's behaviour. By relaxing the constraints on the exact definition of the quotienting method, we can give a formal explanation of some of the most common reduction techniques.
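To make the "exact behaviour preservation" claim concrete, the following is a minimal sketch (not the paper's actual algorithm) of the simplest lumpable case: in a feed-forward layer, two hidden neurons with identical incoming weights and bias always produce the same activation, so one can be removed and its outgoing weights folded into the other's, leaving the network function unchanged with no data and no fine-tuning. All array names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP: 4 inputs -> 3 hidden units (ReLU) -> 2 outputs.
W1 = rng.standard_normal((3, 4))
b1 = rng.standard_normal(3)
W1[2] = W1[1]          # make hidden neuron 2 an exact duplicate of neuron 1
b1[2] = b1[1]
W2 = rng.standard_normal((2, 3))
b2 = rng.standard_normal(2)

def forward(x, W1, b1, W2, b2):
    """Forward pass of the two-layer ReLU network."""
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

# Prune: drop the duplicate neuron and sum its outgoing weights into
# the neuron it duplicates. Since both neurons always fire identically,
# the output of the pruned network is identical for every input.
W1p, b1p = W1[:2], b1[:2]
W2p = W2[:, :2].copy()
W2p[:, 1] += W2[:, 2]

x = rng.standard_normal(4)
y_full = forward(x, W1, b1, W2, b2)
y_pruned = W2p @ np.maximum(W1p @ x + b1p, 0) + b2
assert np.allclose(y_full, y_pruned)
```

The paper's lumpability-based method generalises this idea; the sketch only shows why merging behaviourally equivalent neurons is lossless by construction rather than approximate.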