Partitioning a set of elements into subsets of a priori unknown sizes is essential in many applications. These subset sizes are rarely explicitly learned, whether they are the cluster sizes in clustering applications or the number of shared versus independent generative latent factors in weakly-supervised learning. Probability distributions over correct combinations of subset sizes are non-differentiable due to hard constraints, which prohibits gradient-based optimization. In this work, we propose the differentiable hypergeometric distribution. The hypergeometric distribution models the probability of different group sizes based on their relative importance. We introduce reparameterizable gradients to learn the importance of each group and highlight the advantage of explicitly learning the sizes of subsets in two typical applications: weakly-supervised learning and clustering. In both applications, we outperform previous approaches, which rely on suboptimal heuristics to model the unknown sizes of groups.
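The weighting of group sizes by relative importance corresponds to Fisher's noncentral hypergeometric distribution. As a minimal sketch of that underlying distribution (not the paper's differentiable reparameterization, and with illustrative parameter values chosen here), the pmf over the number of items drawn from one group can be computed directly:

```python
from math import comb

def fisher_nc_hypergeom_pmf(m1, m2, n, w):
    """Fisher's noncentral hypergeometric pmf: probability that k of the
    n drawn items come from group 1 (size m1), where each group-1 item
    carries relative importance weight w over group 2 (size m2)."""
    lo, hi = max(0, n - m2), min(n, m1)  # feasible range for k
    unnorm = {k: comb(m1, k) * comb(m2, n - k) * w**k for k in range(lo, hi + 1)}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

# Illustrative values: 7 items in group 1, 3 in group 2, draw 5,
# group 1 twice as important as group 2.
pmf = fisher_nc_hypergeom_pmf(m1=7, m2=3, n=5, w=2.0)
```

With w = 1 this reduces to the central hypergeometric distribution; learning w via reparameterizable gradients is what makes the group sizes trainable in the proposed approach.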