$k$-subset sampling is ubiquitous in machine learning, enabling regularization and interpretability through sparsity. The challenge lies in rendering $k$-subset sampling amenable to end-to-end learning. This has typically involved relaxing the reparameterized samples to allow for backpropagation, at the risk of introducing high bias and high variance. In this work, we fall back to discrete $k$-subset sampling on the forward pass. This is coupled with using the gradient with respect to the exact marginals, computed efficiently, as a proxy for the true gradient. We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators, including the straight-through Gumbel estimator when $k = 1$. Empirical results show improved performance on learning to explain and on sparse linear regression. We also provide an algorithm for computing the exact ELBO for the $k$-subset distribution, obtaining significantly lower loss than the state of the art.
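To make the idea above concrete, the following is a minimal PyTorch sketch of a straight-through-style estimator for $k$-subset sampling: the forward pass produces a discrete $k$-hot vector, while the backward pass routes gradients through exact inclusion marginals. This is an illustration only, not the paper's implementation: the helper names (`k_subset_marginals`, `simple_style_sample`) are ours, the marginal computation uses a naive leave-one-out dynamic program rather than the paper's efficient algorithm, and the Gumbel-top-$k$ draw samples from a related but not identical distribution.

```python
import torch

def poisson_binomial_count_probs(p, k):
    # dp[j] = P(exactly j successes) for independent Bernoullis with probs p, j = 0..k.
    dp = torch.zeros(k + 1, dtype=p.dtype)
    dp[0] = 1.0
    for pi in p:
        dp[1:] = dp[1:] * (1 - pi) + dp[:-1] * pi
        dp[0] = dp[0] * (1 - pi)
    return dp

def k_subset_marginals(logits, k):
    # Exact inclusion marginals P(x_i = 1 | sum_j x_j = k) for independent Bernoulli
    # logits, via leave-one-out count distributions (naive O(n^2 k), but exact).
    p = torch.sigmoid(logits)
    n = p.shape[0]
    z = poisson_binomial_count_probs(p, k)[k]  # P(sum = k)
    marginals = torch.zeros(n, dtype=p.dtype)
    for i in range(n):
        rest = torch.cat([p[:i], p[i + 1:]])
        marginals[i] = p[i] * poisson_binomial_count_probs(rest, k - 1)[k - 1] / z
    return marginals

def simple_style_sample(logits, k):
    # Forward: a discrete k-hot sample. Gumbel-top-k is used here purely to keep the
    # sketch short; it does not sample the conditioned Bernoulli k-subset distribution.
    gumbels = -torch.log(-torch.log(torch.rand_like(logits)))
    hard = torch.zeros_like(logits).scatter_(0, torch.topk(logits + gumbels, k).indices, 1.0)
    # Backward: gradients flow through the exact marginals only (straight-through trick).
    marg = k_subset_marginals(logits, k)
    return hard.detach() + marg - marg.detach()

# Usage: the sample is k-hot on the forward pass, yet loss.backward() yields a
# gradient for the logits via the marginals.
logits = torch.randn(10, requires_grad=True)
sample = simple_style_sample(logits, k=3)
loss = (sample * torch.arange(10.0)).sum()
loss.backward()
```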