Quantization-based model compression is a high-performing and fast approach to inference that yields highly compressed models compared to their full-precision floating point counterparts. The most extreme quantization is a 1-bit representation of parameters, such that each weight takes only one of two possible values, typically -1 (or 0) and +1. Models that constrain the weights to binary values enable an efficient implementation of the ubiquitous dot product using additions only, without requiring floating point multiplications, which is beneficial for resource-constrained inference. The main contribution of this work is a method that smooths the combinatorial problem of determining a binary vector of weights minimizing the expected loss of a given objective, by means of empirical risk minimization with backpropagation. This is achieved by approximating a multivariate binary state over the weights with a deterministic and differentiable transformation of real-valued, continuous parameters. The proposed method adds little overhead in training, can be readily applied without substantial modifications to the original architecture, does not introduce additional saturating non-linearities or auxiliary losses, and does not preclude applying other methods for binarizing the activations. It is demonstrated that, contrary to common assertions in the literature, binary-weighted networks train well with the same standard optimization techniques and similar hyperparameter settings as their full-precision counterparts, namely momentum SGD with large learning rates and $L_2$ regularization. The source code is publicly available at https://bitbucket.org/YanivShu/binary_weighted_networks_public
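To make the multiplication-free dot product concrete, the following is a minimal sketch of the idea stated above: with weights restricted to {-1, +1}, a dot product reduces to summing the inputs whose weight is +1 and subtracting those whose weight is -1. The function name `binary_dot` and the NumPy setting are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def binary_dot(x, w_bin):
    """Dot product with weights in {-1, +1}: add the inputs where the
    weight is +1 and subtract them where it is -1, so no floating
    point multiplications are required (illustrative sketch)."""
    pos = w_bin > 0
    return x[pos].sum() - x[~pos].sum()

# Usage: the result matches the ordinary multiply-accumulate dot product.
x = np.array([0.5, -1.25, 2.0, 0.75])
w = np.array([1, -1, 1, 1])
assert np.isclose(binary_dot(x, w), np.dot(x, w))
```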
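The sketch below illustrates the general shape of training with a deterministic, differentiable relaxation of binary weights, as described in the abstract. It is an assumption-laden stand-in, not the paper's method: the `SoftBinaryLinear` class and `temperature` parameter are hypothetical, and the `tanh` surrogate is a generic smooth relaxation of `sign()`; the paper explicitly avoids introducing additional saturating non-linearities, so its actual transformation differs.

```python
import torch

class SoftBinaryLinear(torch.nn.Module):
    """Illustrative sketch only: real-valued latent parameters are mapped
    through a deterministic, differentiable function whose outputs tend
    toward {-1, +1}, so the combinatorial weight assignment can be trained
    with standard backpropagation (e.g., momentum SGD with L2 regularization).
    """
    def __init__(self, in_features, out_features, temperature=1.0):
        super().__init__()
        self.latent = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.temperature = temperature  # hypothetical sharpness knob

    def forward(self, x):
        # tanh is a generic smooth surrogate for sign(); treat it purely
        # as a placeholder for "differentiable relaxation", not as the
        # paper's transformation.
        w_soft = torch.tanh(self.latent / self.temperature)
        return torch.nn.functional.linear(x, w_soft)

    def binarized_weights(self):
        # At inference time, commit to hard binary weights.
        return torch.sign(self.latent)
```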