Deep neural networks exploiting millions of parameters are nowadays the norm in deep learning applications. This is a potential issue because of the large amount of computational resources required for training and the possible loss of generalization performance of overparametrized networks. In this paper we propose a method for learning sparse neural topologies via a regularization technique which identifies non-relevant weights and selectively shrinks their norm, while performing a classic update for relevant ones. This technique, which is an improvement of classical weight decay, is based on the definition of a regularization term that can be added to any loss functional regardless of its form, resulting in a unified general framework exploitable in many different contexts. The actual elimination of parameters identified as irrelevant is handled by an iterative pruning algorithm. We tested the proposed technique on different image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and metrics, while achieving strong model compression.
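To make the high-level description concrete, the following PyTorch sketch illustrates the general idea only: a weight-decay-style shrinkage applied selectively to weights deemed non-relevant, followed by iterative magnitude pruning of the shrunk weights. The relevance criterion (`magnitude_relevance`), the shrinkage coefficient, the threshold, and the toy model are hypothetical placeholders; the paper's actual regularization term and relevance measure are not specified in this abstract.

```python
import torch
import torch.nn as nn

def magnitude_relevance(param, threshold=1e-3):
    # Hypothetical criterion: a weight is "relevant" if |w| > threshold.
    # The paper's actual relevance measure is not given in the abstract.
    return param.abs() > threshold

def selective_shrink(model, lambda_reg, threshold=1e-3):
    # Weight-decay-style shrinkage applied only to non-relevant weights;
    # relevant weights keep their classic (unregularized) update.
    with torch.no_grad():
        for p in model.parameters():
            mask = ~magnitude_relevance(p, threshold)
            p[mask] -= lambda_reg * p[mask]

def iterative_prune(model, threshold=1e-3):
    # Hard-prune (zero out) weights whose magnitude fell below the threshold.
    with torch.no_grad():
        for p in model.parameters():
            p[p.abs() <= threshold] = 0.0

# Toy training loop on random data (illustration only).
model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))

for step in range(100):
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()                       # classic update for every weight
    selective_shrink(model, lambda_reg=1e-2)
    if step % 20 == 19:
        iterative_prune(model)             # periodically remove shrunk weights

sparsity = (model.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2%}")
```

In this sketch the shrinkage is applied outside the optimizer step, so the regularization acts independently of the loss functional, mirroring the claim that the term can be attached to any loss regardless of its form.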