Discrete and especially binary random variables occur in many machine learning models, notably in variational autoencoders with binary latent states and in stochastic binary networks. When learning such models, a key tool is an estimator of the gradient of the expected loss with respect to the probabilities of binary variables. The straight-through (ST) estimator gained popularity due to its simplicity and efficiency, in particular in deep networks where unbiased estimators are impractical. Several techniques were proposed to improve over ST while keeping the same low computational complexity: Gumbel-Softmax, ST-Gumbel-Softmax, BayesBiNN, FouST. We conduct a theoretical analysis of bias and variance of these methods in order to understand tradeoffs and verify the originally claimed properties. The presented theoretical results allow for better understanding of these methods and in some cases reveal serious issues.
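To make the object of study concrete, here is a minimal, hypothetical sketch of the straight-through (ST) estimator for a single Bernoulli variable (not the paper's exact formulation): the forward pass samples b ~ Bernoulli(p), and the backward pass treats the sampling as the identity, so the gradient of the loss with respect to b is passed straight through to p. The quadratic loss and the Monte Carlo setup below are illustrative assumptions, chosen so the bias is visible by comparison with the exact gradient d E[f(b)]/dp = f(1) - f(0).

```python
import random

def st_gradient_estimate(p, f_prime, n_samples=10000, seed=0):
    """Illustrative straight-through (ST) gradient estimate for one Bernoulli variable.

    Forward: sample b ~ Bernoulli(p).
    Backward: pretend b = p (identity), i.e. d f(b)/dp ~= f'(b) at the sample.
    Returns the Monte Carlo average of f'(b) over n_samples draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        b = 1.0 if rng.random() < p else 0.0
        total += f_prime(b)  # straight-through: reuse the gradient w.r.t. b as the gradient w.r.t. p
    return total / n_samples

# Hypothetical example loss f(b) = (b - 0.3)**2, so f'(b) = 2 * (b - 0.3).
# Exact gradient of E[f(b)] w.r.t. p is f(1) - f(0) = 0.49 - 0.09 = 0.40.
# The ST estimator instead converges to E[f'(b)] = 1.4*p - 0.6*(1 - p),
# which equals 0.60 at p = 0.6 -- a clear bias, as the analysis in the paper quantifies.
```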