Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-Softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the Hard Concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids, which we call "mixed random variables." Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce new entropy and Kullback-Leibler divergence functions that subsume the discrete and differential cases and have interpretations in terms of code optimality. Our framework suggests two strategies for representing and sampling mixed random variables, an extrinsic ("sample-and-project") and an intrinsic one (based on face stratification). We experiment with both approaches on an emergent communication benchmark and on modeling MNIST and Fashion-MNIST data with variational auto-encoders with mixed latent variables.
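To give a concrete feel for the extrinsic "sample-and-project" strategy mentioned above, the following is a minimal NumPy sketch, not the paper's implementation: it perturbs a score vector with Gaussian noise ("sample") and then projects onto the probability simplex with sparsemax ("project"), so that samples can fall on lower-dimensional faces of the simplex, i.e., have some coordinates exactly zero. The score vector and noise scale are made-up values for illustration.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability simplex
    (Martins & Astudillo, 2016). Outputs can have exact zeros."""
    z_sorted = np.sort(z)[::-1]                 # sort scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum         # coordinates kept in the support
    k_z = k[support][-1]                        # size of the support
    tau = (cumsum[support][-1] - 1) / k_z       # threshold
    return np.maximum(z - tau, 0.0)

rng = np.random.default_rng(0)
scores = np.array([1.2, 0.9, -0.3, 0.1])        # hypothetical scores (logits)
for _ in range(3):
    noisy = scores + 0.5 * rng.normal(size=scores.shape)  # "sample": perturb scores
    p = sparsemax(noisy)                                   # "project" onto the simplex
    print(p, "support size:", int((p > 0).sum()))
```

Because the projection places probability mass on faces of the simplex, each sample carries both a discrete component (which face, i.e., which coordinates are nonzero) and a continuous component (the position within that face), which is the sense in which such variables are "mixed."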