Neural networks and other machine learning models compute continuous representations, while humans communicate with discrete symbols. Reconciling these two forms of communication is desirable to generate human-readable interpretations or to learn discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-softmax transformation) build continuous relaxations that are discrete approximations in the zero-temperature limit, while others (such as sparsemax transformations and the hard concrete distribution) produce discrete/continuous hybrids. In this paper, we build rigorous theoretical foundations for these hybrids. Our starting point is a new "direct sum" base measure defined on the face lattice of the probability simplex. From this measure, we introduce a new entropy function that includes the discrete and differential entropies as particular cases, and has an interpretation in terms of code optimality, as well as two other information-theoretic counterparts that generalize the mutual information and Kullback-Leibler divergences. Finally, we introduce "mixed languages" as strings of hybrid symbols and a new mixed weighted finite state automaton that recognizes a class of regular mixed languages, generalizing closure properties of regular languages.
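To make the contrast between the two families of approaches concrete, here is a minimal sketch (not from the paper, NumPy only; the logits, temperature, and function names are illustrative assumptions). A Gumbel-softmax sample is dense, with every coordinate strictly positive, and only approaches a one-hot vector in the zero-temperature limit; sparsemax, the Euclidean projection onto the probability simplex, can assign exact zeros, placing its output on a proper face of the simplex and yielding the kind of discrete/continuous hybrid discussed above.

```python
# Minimal illustrative sketch (assumptions noted above), not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature=0.5):
    """Continuous relaxation: dense probabilities that become one-hot
    only in the zero-temperature limit."""
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape)))
    scores = (logits + gumbel_noise) / temperature
    scores -= scores.max()                    # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def sparsemax(logits):
    """Discrete/continuous hybrid: Euclidean projection onto the simplex,
    which can produce exact zeros, i.e. land on a face of the simplex."""
    z = np.sort(logits)[::-1]                 # sort in decreasing order
    cumsum = np.cumsum(z)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z > cumsum              # coordinates kept in the support
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max     # threshold for the projection
    return np.maximum(logits - tau, 0.0)

logits = np.array([1.2, 0.8, -1.0])
print(gumbel_softmax(logits))  # all entries strictly positive (interior of the simplex)
print(sparsemax(logits))       # e.g. [0.7, 0.3, 0.0]: a point on a face of the simplex
```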