The optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations. Current weight-update techniques use the same approaches as traditional Neural Networks (NNs), with the additional requirement of approximating the derivative of the sign function (the Dirac delta function) for back-propagation; thus, efforts have focused on adapting full-precision techniques to work on BNNs. In the literature, only one previous effort has tackled the problem of directly training BNNs with bit-flips, by using the first raw moment estimate of the gradients and comparing it against a threshold to decide when to flip a weight (Bop). In this paper, we take an approach parallel to Adam, which also uses the second raw moment estimate to normalize the first raw moment before the threshold comparison; we call this method Bop2ndOrder. We present two versions of the proposed optimizer, a biased one and a bias-corrected one, each with its own applications. We also present a complete ablation study of the hyperparameter space, as well as the effect of using schedulers on each hyperparameter. For these studies, we tested the optimizer on CIFAR-10 using the BinaryNet architecture. We also evaluated accuracy on ImageNet 2012 with the XnorNet and BiRealNet architectures. On both datasets, our approach converged faster, was robust to changes in the hyperparameters, and achieved better accuracy values.
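To make the update rule concrete, the following is a minimal NumPy sketch of a single Bop2ndOrder-style step as described above: an exponential moving average of the gradients (first raw moment) is normalized by the square root of a second-raw-moment estimate, Adam-style, and the result is compared against a flipping threshold. The function name, hyperparameter names (gamma, sigma, tau, eps), and their default values are illustrative assumptions, not the paper's exact notation; the biased variant is shown, without bias correction.

```python
import numpy as np

def bop2ndorder_step(w, grad, m, v, gamma=1e-4, sigma=1e-4,
                     tau=1e-6, eps=1e-10):
    """Illustrative update on binary weights w in {-1, +1}.

    m and v are running first and second raw moment estimates of the
    gradient. All names and defaults are placeholders for exposition.
    """
    # Update the raw moment estimates (biased version; a bias-corrected
    # variant would additionally rescale m and v as in Adam).
    m = (1.0 - gamma) * m + gamma * grad
    v = (1.0 - sigma) * v + sigma * grad ** 2

    # Normalize the first moment by the square root of the second moment
    # before the threshold comparison, analogously to Adam.
    m_hat = m / (np.sqrt(v) + eps)

    # Flip a weight when the normalized signal exceeds the threshold tau
    # and points in the same direction as the current weight.
    flip = (np.abs(m_hat) > tau) & (np.sign(m_hat) == np.sign(w))
    w = np.where(flip, -w, w)
    return w, m, v
```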