Successful applications of InfoNCE and its variants have popularized the use of contrastive variational mutual information (MI) estimators in machine learning. While these estimators offer superior stability, they crucially depend on costly large-batch training and sacrifice bound tightness for variance reduction. To overcome these limitations, we revisit the mathematics of popular variational MI bounds through the lens of unnormalized statistical modeling and convex optimization. Our investigation not only yields a new unified theoretical framework encompassing popular variational MI bounds but also leads to a novel, simple, and powerful contrastive MI estimator named FLO. Theoretically, we show that the FLO estimator is tight and provably converges under stochastic gradient descent. Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently. The utility of FLO is verified on an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
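For context, a standard form of the InfoNCE lower bound mentioned above is sketched below (the critic symbol $g$ and batch size $K$ are our notation here, not the paper's); its $\log K$ ceiling is what ties estimation quality to large batches.

\[
  I_{\mathrm{NCE}}(X;Y)
  \;=\;
  \mathbb{E}_{\{(x_i,y_i)\}_{i=1}^{K}\sim p(x,y)}
  \left[
    \frac{1}{K}\sum_{i=1}^{K}
    \log\frac{e^{g(x_i,y_i)}}{\tfrac{1}{K}\sum_{j=1}^{K} e^{g(x_i,y_j)}}
  \right]
  \;\le\; I(X;Y),
  \qquad
  I_{\mathrm{NCE}} \le \log K .
\]

Because the estimate can never exceed $\log K$, tightly estimating large MI values requires a large batch size $K$, which motivates the limitations discussed in the abstract.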