Binary neural networks (BNNs) have received increasing attention due to the substantial reductions in computation and memory they offer. Most existing works focus either on lessening the quantization error by minimizing the gap between the full-precision weights and their binarized counterparts, or on designing a gradient approximation to mitigate the gradient mismatch, while leaving the "dead weights" untouched. This leads to slow convergence when training BNNs. In this paper, for the first time, we explore the influence of "dead weights", which refer to a group of weights that are barely updated during the training of BNNs, and introduce the rectified clamp unit (ReCU) to revive the "dead weights" for updating. We prove that reviving the "dead weights" with ReCU results in a smaller quantization error. In addition, we take into account the information entropy of the weights and mathematically analyze why weight standardization benefits BNNs. We demonstrate the inherent contradiction between minimizing the quantization error and maximizing the information entropy, and propose an adaptive exponential scheduler to identify the range of the "dead weights". By considering the "dead weights", our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet compared with recent methods. Code is available at https://github.com/z-hXu/ReCU.
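To make the idea concrete, below is a minimal PyTorch sketch of a clamp-based revival step followed by sign binarization. It assumes ReCU clamps standardized weights into a symmetric quantile range controlled by a threshold tau (the knob that an adaptive scheduler could adjust); the function names recu and binarize, the quantile-based bounds, and the hyperparameter value are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def recu(w: torch.Tensor, tau: float = 0.99) -> torch.Tensor:
    """Illustrative clamp-based revival step (an assumption, not the paper's exact formula).

    Weights whose magnitude falls outside the central quantile range are treated
    as "dead": their sign almost never flips under binarization, so they receive
    little effective update. Clamping them back into the range lets gradients
    move them again, at the cost of a controlled quantization error.
    """
    # Standardize first (zero mean, unit variance); the paper relates weight
    # standardization to the information entropy of the weights.
    w = (w - w.mean()) / (w.std() + 1e-12)
    lo = torch.quantile(w, 1.0 - tau).item()
    hi = torch.quantile(w, tau).item()
    return torch.clamp(w, min=lo, max=hi)

def binarize(w: torch.Tensor) -> torch.Tensor:
    """Sign binarization with a per-tensor scaling factor (common BNN practice)."""
    scale = w.abs().mean()
    return scale * torch.sign(w)

# Usage: revive the outlying weights, then binarize for the forward pass.
w = torch.randn(256, 128)
w_b = binarize(recu(w, tau=0.99))
```

A smaller tau clamps more aggressively, shrinking the quantization error but flattening the weight distribution (lower information entropy); a tau near 1 does the opposite, which is the trade-off the adaptive exponential scheduler in the abstract is meant to balance.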