Mutual Information (MI) has been widely used as a loss regularizer for training neural networks. This has been particularly effective when learning disentangled or compressed representations of high-dimensional data. However, differential entropy (DE), another fundamental measure of information, has not found widespread use in neural network training. Although DE offers a potentially wider range of applications than MI, off-the-shelf DE estimators are either non-differentiable, computationally intractable, or fail to adapt to changes in the underlying distribution. These drawbacks prevent them from being used as regularizers in neural network training. To address the shortcomings of previously proposed DE estimators, we introduce KNIFE, a fully parameterized, differentiable kernel-based estimator of DE. The flexibility of our approach also allows us to construct KNIFE-based estimators for conditional DE (conditioned on either discrete or continuous variables), as well as for MI. We empirically validate our method on high-dimensional synthetic data and further apply it to guide the training of neural networks on real-world tasks. Our experiments on a large variety of tasks, including visual domain adaptation, textual fair classification, and textual fine-tuning, demonstrate the effectiveness of KNIFE-based estimation. Code can be found at https://github.com/g-pichler/knife.
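To make the idea of a fully parameterized, differentiable kernel-based DE estimator concrete, the sketch below shows one plausible construction under our own assumptions (it is not the authors' implementation; see the repository linked above for that). A learnable Gaussian-kernel mixture, with trainable locations, bandwidths, and weights, models the density, and the DE estimate is the average negative log-density of the samples. All names (KernelDensityDE, n_kernels, d) are illustrative.

```python
# Minimal sketch of a differentiable kernel-based DE estimator (illustrative only).
import math

import torch
import torch.nn as nn


class KernelDensityDE(nn.Module):
    def __init__(self, d: int, n_kernels: int = 16):
        super().__init__()
        # Learnable kernel locations, log-bandwidths, and mixture logits.
        self.means = nn.Parameter(torch.randn(n_kernels, d))
        self.log_bw = nn.Parameter(torch.zeros(n_kernels, d))
        self.logits = nn.Parameter(torch.zeros(n_kernels))

    def log_density(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d) -> per-sample log-density under the Gaussian-kernel mixture.
        diff = (x.unsqueeze(1) - self.means) / self.log_bw.exp()        # (batch, K, d)
        log_k = (
            -0.5 * diff.pow(2).sum(-1)                                  # (batch, K)
            - self.log_bw.sum(-1)
            - 0.5 * x.shape[-1] * math.log(2 * math.pi)
        )
        log_w = torch.log_softmax(self.logits, dim=0)                   # (K,)
        return torch.logsumexp(log_w + log_k, dim=1)                    # (batch,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Differentiable estimate of the differential entropy H(X) = -E[log p(X)].
        return -self.log_density(x).mean()


# Usage: minimizing the estimate w.r.t. the kernel parameters fits the density;
# the resulting scalar remains differentiable w.r.t. the input samples, so it can
# be plugged in as a regularizer when training a neural network.
x = torch.randn(512, 4)                      # synthetic 4-dimensional samples
est = KernelDensityDE(d=4)
opt = torch.optim.Adam(est.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = est(x)                            # cross-entropy upper bound on H(X)
    loss.backward()
    opt.step()
print(float(est(x)))                         # DE estimate in nats
```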