Binary networks are extremely efficient as they use only two symbols to define the network: $\{+1,-1\}$. One can make the prior distribution of these symbols a design choice. The recent IR-Net of Qin et al. argues that imposing a Bernoulli distribution with equal priors (equal bit ratios) over the binary weights leads to maximum entropy and thus minimizes information loss. However, prior work cannot precisely control the binary weight distribution during training, and therefore cannot guarantee maximum entropy. Here, we show that quantizing using optimal transport can guarantee any bit ratio, including equal ratios. We verify experimentally that equal bit ratios are indeed preferable and show that our method leads to optimization benefits. We show that our quantization method is effective when compared to state-of-the-art binarization methods, even when using binary weight pruning.
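To make the core idea concrete: in one dimension, the optimal transport map from a set of real-valued weights to a two-point target distribution $\{+1,-1\}$ with prescribed masses is monotone, so an exact bit ratio can be enforced simply by sorting and thresholding at the corresponding quantile. The sketch below illustrates this principle only; it is not the paper's implementation, and the function name `binarize_ot` is hypothetical.

```python
import numpy as np

def binarize_ot(weights, ratio_pos=0.5):
    """Binarize weights to {+1, -1} with an exact fraction of +1 symbols.

    In 1D, the optimal transport plan to a two-point target with masses
    (ratio_pos, 1 - ratio_pos) is monotone: the largest `ratio_pos`
    fraction of the weights is sent to +1, the rest to -1.
    """
    w = np.asarray(weights).ravel()
    n_pos = int(round(ratio_pos * w.size))
    out = -np.ones_like(w)
    if n_pos > 0:
        # indices of the n_pos largest weights receive +1
        idx = np.argpartition(w, w.size - n_pos)[w.size - n_pos:]
        out[idx] = 1.0
    return out.reshape(np.shape(weights))

w = np.random.randn(64, 64)
b = binarize_ot(w, ratio_pos=0.5)
print((b == 1).mean())  # exactly 0.5 by construction
```

Because the assignment is a hard quantile threshold, the achieved bit ratio is exact for any requested value, which is what distinguishes this scheme from sign-based binarization, where the ratio merely follows the empirical weight distribution.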