Deep neural networks have relieved human experts of much of the burden of feature engineering. However, comparable effort is now required to determine effective architectures. In addition, as networks have grown very large, considerable resources are also invested in reducing their size. Sparsifying an over-complete model addresses both problems by removing redundant components and connections. In this study, we propose a fully differentiable sparsification method for deep neural networks that allows parameters to become exactly zero during training with stochastic gradient descent. The proposed method can therefore learn both the sparsified structure and the weights of a network in an end-to-end manner. The method is directly applicable to a wide range of modern deep neural networks and requires minimal modification to existing models. To the best of our knowledge, this is the first fully [sub-]differentiable sparsification method that zeroes out parameters. It provides a foundation for future structure learning and model compression methods.
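To make the idea of parameters reaching exactly zero under plain gradient descent more concrete, the following is a minimal sketch and not the formulation proposed in this work. It assumes a generic soft-threshold reparameterization: each effective weight is computed from a free parameter `v` as `sign(v) * max(|v| - t, 0)`, which is sub-differentiable and outputs an exact zero whenever `|v| <= t`. The function names, the toy quadratic loss, and the values of `t`, `lam`, and `lr` are all hypothetical choices made for illustration.

```python
def soft_threshold(v, t):
    """Effective weight: shrink |v| by t and clamp at zero (sub-differentiable)."""
    mag = abs(v) - t
    if mag <= 0.0:
        return 0.0  # exact zero, not merely a small value
    return mag if v > 0 else -mag


def sign(v):
    """Sign of v as -1, 0, or 1."""
    return (v > 0) - (v < 0)


def train(targets, t=0.1, lam=0.1, lr=0.1, steps=500, init=0.5):
    """Plain (sub)gradient descent on the free parameters v.

    Toy loss per weight (an assumption for this sketch):
        0.5 * (w - target)^2 + lam * |v|,  with w = soft_threshold(v, t)
    """
    v = [init for _ in targets]
    for _ in range(steps):
        for i, c in enumerate(targets):
            w = soft_threshold(v[i], t)
            # Subgradient of w with respect to v: 1 outside the threshold, 0 inside.
            dw_dv = 1.0 if abs(v[i]) > t else 0.0
            grad = (w - c) * dw_dv + lam * sign(v[i])
            v[i] -= lr * grad
    # Return the effective (possibly sparse) weights.
    return [soft_threshold(x, t) for x in v]


# One target is clearly useful (1.0), the other is nearly redundant (0.05).
weights = train([1.0, 0.05])
print(weights)
```

In this toy setting the weight fitted to the larger target stays nonzero, while the weight fitted to the nearly redundant target is driven inside the threshold and is reported as an exact zero, without any separate pruning step after training.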