Shift neural networks reduce computational complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values, making them fast and energy-efficient compared to conventional neural networks. However, existing shift networks are sensitive to weight initialization and also suffer degraded performance caused by vanishing gradients and the weight sign freezing problem. To address these issues, we propose S³ re-parameterization, a novel technique for training low-bit shift networks. Our method decomposes a discrete parameter in a sign-sparse-shift 3-fold manner. In this way, it efficiently learns a low-bit network with weight dynamics similar to those of full-precision networks and with insensitivity to weight initialization. Our proposed training method pushes the boundaries of shift neural networks and shows that 3-bit shift networks outperform their full-precision counterparts in terms of top-1 accuracy on ImageNet.
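To make the 3-fold decomposition concrete, below is a minimal PyTorch sketch of one way to re-parameterize a quantized weight as sign × sparse × power-of-two, with straight-through binarization of continuous latent parameters. The module name S3Weight, the latent parameter names, the number of shift bits, and the exponent sign convention are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch


def binarize_ste(x):
    # Heaviside step in the forward pass, identity (straight-through) gradient in the backward pass.
    hard = (x > 0).float()
    return x + (hard - x).detach()


class S3Weight(torch.nn.Module):
    """Sketch of a sign-sparse-shift 3-fold weight decomposition:
    w_q = sign * sparse * 2^(-shift), where sign is in {-1, +1}, sparse is in {0, 1},
    and shift is a non-negative integer exponent built from binary bits.
    All names and the exponent range are assumptions for illustration."""

    def __init__(self, shape, num_shift_bits=2):
        super().__init__()
        # Continuous latent parameters; their signs determine the discrete factors.
        self.w_sign = torch.nn.Parameter(torch.randn(shape))
        self.w_sparse = torch.nn.Parameter(torch.randn(shape))
        # One latent tensor per shift bit; summing the binarized bits gives the exponent.
        self.w_shift = torch.nn.Parameter(torch.randn(num_shift_bits, *shape))

    def forward(self):
        sign = 2 * binarize_ste(self.w_sign) - 1        # values in {-1, +1}
        sparse = binarize_ste(self.w_sparse)            # values in {0, 1}, prunes weights to zero
        shift = binarize_ste(self.w_shift).sum(dim=0)   # integer exponent in [0, num_shift_bits]
        return sign * sparse * torch.pow(2.0, -shift)   # power-of-two (shift) weight


# Usage sketch: with 2 shift bits, the quantized weights take values in {0, ±1, ±1/2, ±1/4}.
w = S3Weight((8, 16), num_shift_bits=2)
quantized = w()
```

Because every discrete factor is driven by a continuous latent parameter with a straight-through gradient, the composed weight can change sign and magnitude during training much like a full-precision weight, which is the behavior the abstract refers to as full-precision-like weight dynamics.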