Notable progress has been made in numerous fields of machine learning based on neural-network-driven mutual information (MI) bounds. However, using the conventional MI-based losses is often challenging due to their practical and mathematical limitations. In this work, we first identify the symptoms behind their instability: (1) the neural network failing to converge even after the loss appears to have converged, and (2) saturating neural network outputs causing the loss to diverge. We mitigate both issues by adding a novel regularization term to the existing losses. We demonstrate both theoretically and experimentally that the added regularization stabilizes training. Finally, we present a novel benchmark that evaluates MI-based losses on both their MI estimation power and their capability on downstream tasks, closely following pre-existing supervised and contrastive learning settings. We evaluate six different MI-based losses and their regularized counterparts on multiple benchmarks to show that our approach is simple yet effective.
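To make the setting concrete, below is a minimal PyTorch-style sketch of a MINE-style Donsker-Varadhan MI lower bound with an added quadratic penalty on the log-partition term, which discourages the critic outputs from drifting or saturating. The critic architecture, the penalty form, and the weight `lam` are illustrative assumptions for this sketch, not necessarily the exact regularizer proposed in the paper.

```python
# Sketch: DV-bound MI estimation with an illustrative stabilizing regularizer.
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Small MLP critic T(x, y) used by neural MI estimators (assumed architecture)."""
    def __init__(self, x_dim, y_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def regularized_dv_loss(critic, x, y, lam=0.1):
    """Negative DV lower bound plus a penalty keeping the log-partition term near 0.
    The penalty is an assumed example of regularizing the existing MI-based loss."""
    t_joint = critic(x, y)                      # T on samples from the joint p(x, y)
    y_shuffled = y[torch.randperm(y.size(0))]   # shuffle y to approximate p(x)p(y)
    t_marginal = critic(x, y_shuffled)
    # log E_{p(x)p(y)}[exp T], estimated via log-sum-exp over the batch
    log_partition = torch.logsumexp(t_marginal, dim=0) - math.log(t_marginal.size(0))
    dv_bound = t_joint.mean() - log_partition   # lower bound on I(X; Y)
    return -dv_bound + lam * log_partition.pow(2)
```

In this sketch, minimizing the unregularized term alone can let the critic's outputs grow without bound even though the bound itself has plateaued; the added penalty anchors the log-partition term and keeps the critic in a well-behaved range.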