Binary neural networks (BNNs) have received increasing attention due to their substantial reductions in computation and memory cost. Most existing works focus on either reducing the quantization error by minimizing the gap between the full-precision weights and their binarization, or designing a gradient approximation to mitigate the gradient mismatch, while leaving the "dead weights" untouched. This leads to slow convergence when training BNNs. In this paper, for the first time, we explore the influence of "dead weights", which refer to a group of weights that are barely updated during the training of BNNs, and introduce the rectified clamp unit (ReCU) to revive the "dead weights" for updating. We prove that reviving the "dead weights" by ReCU results in a smaller quantization error. In addition, we take into account the information entropy of the weights and mathematically analyze why weight standardization benefits BNNs. We demonstrate the inherent contradiction between minimizing the quantization error and maximizing the information entropy, and propose an adaptive exponential scheduler to identify the range of the "dead weights". By considering the "dead weights", our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet compared with recent methods. Code is available at https://github.com/z-hXu/ReCU.
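
To make the idea concrete, below is a minimal, hypothetical sketch of how reviving "dead weights" by clamping could be applied before binarization. The function name `recu_clamp`, the threshold `tau`, and the quantile-based clamp bounds are illustrative assumptions on our part; the exact ReCU formulation and the adaptive exponential scheduler for the threshold are given in the paper.

```python
import torch

def recu_clamp(w: torch.Tensor, tau: float = 0.85) -> torch.Tensor:
    """Illustrative sketch: clamp weights into a quantile-defined range.

    Weights lying in the tails of the distribution barely change sign
    under the sign-based binarizer ("dead weights"); clamping pulls them
    back into a range where gradient updates can still flip them.
    """
    hi = torch.quantile(w, tau).item()        # upper clamp bound (assumed quantile form)
    lo = torch.quantile(w, 1.0 - tau).item()  # lower clamp bound (assumed quantile form)
    return torch.clamp(w, min=lo, max=hi)

# Usage: revive the weights, then binarize with the sign function.
w = torch.randn(256, 128)
w_revived = recu_clamp(w, tau=0.85)
w_binary = torch.sign(w_revived)
```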