Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as an element-wise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning: the proxy gradients of the STE should be positive, ensuring that mask variables converge to their minima. After finding that the Leaky ReLU, Softplus, and Identity STEs satisfy this principle, we propose to adopt the Identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are highly unbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask variable training. In order to train sparse masks automatically, we include the total number of network connections as a regularization term in our objective function. As SCL does not require pruning criteria or hyper-parameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform state-of-the-art (SOTA) human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
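The re-parameterization described above can be illustrated with a minimal PyTorch sketch: a weight is multiplied element-wise by a binary mask produced by a unit step function, and the Identity STE passes gradients straight through to the real-valued mask variable. The class and variable names here are illustrative, not from the paper, and the sparsity regularizer is shown in its simplest form.

```python
import torch

class MaskedWeight(torch.nn.Module):
    """Weight re-parameterized as weight * binary_mask (sketch, not the official SCL code)."""

    def __init__(self, shape):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(shape))
        # Real-valued mask variable; the unit step of this gives the binary mask.
        self.mask_var = torch.nn.Parameter(torch.zeros(shape))

    def binary_mask(self):
        # Unit step in the forward pass; Identity STE in the backward pass:
        # hard + (mask_var - mask_var.detach()) equals hard in value, but its
        # gradient w.r.t. mask_var is the identity (proxy gradient = 1 > 0).
        hard = (self.mask_var >= 0).float()
        return hard + self.mask_var - self.mask_var.detach()

    def forward(self):
        return self.weight * self.binary_mask()

    def sparsity_penalty(self):
        # Total number of active connections, used as a regularization term.
        return self.binary_mask().sum()
```

During training, `loss = task_loss + lam * m.sparsity_penalty()` drives mask variables negative for unimportant connections, so their unit-step masks (and hence the weights) become exactly zero.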