Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of the binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks, where unbiased estimators are impractical. Several techniques were proposed to improve over ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, and FouST. We conduct a theoretical analysis of the bias and variance of these methods in order to understand the tradeoffs involved and to verify the originally claimed properties. The presented theoretical results are mainly negative, showing limitations of these methods and in some cases revealing serious issues.
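For concreteness, below is a minimal sketch of the straight-through estimator discussed above, written in PyTorch. This is an illustrative implementation under common conventions, not code from the paper or from any of the cited methods; the function name st_bernoulli is hypothetical.

```python
import torch

def st_bernoulli(p: torch.Tensor) -> torch.Tensor:
    """Sample b ~ Bernoulli(p) with a straight-through backward pass.

    Forward returns the hard binary sample b. Backward treats the
    sampling step as the identity map, so the gradient w.r.t. p is
    taken to be dL/db -- a biased but cheap single-sample estimator.
    """
    b = torch.bernoulli(p)          # hard sample, non-differentiable
    return b + p - p.detach()       # value equals b; gradient flows to p

# Usage: upstream parameters producing the probabilities receive gradients.
logits = torch.randn(4, requires_grad=True)
p = torch.sigmoid(logits)
loss = (st_bernoulli(p) - 0.5).pow(2).sum()
loss.backward()
print(logits.grad)
```

The `b + p - p.detach()` trick keeps the forward value equal to the hard sample while routing the backward pass through p as if the sampling were the identity; this is precisely the source of the bias that the analysis in the paper examines.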