Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) they can be estimated from data as statistical averages, and 2) they can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and thereby accelerating statistical learning and estimation from data, has been lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of the objective functionals reveals local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
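As a concrete illustration of such representations (a standard example, not one of the tighter formulas developed here), the Donsker-Varadhan variational representation of the Kullback-Leibler divergence and its sample-based estimator take the form

\[
D_{\mathrm{KL}}(P\,\|\,Q)
= \sup_{g}\Big\{\mathbb{E}_{P}[g] - \log \mathbb{E}_{Q}\big[e^{g}\big]\Big\}
\;\approx\;
\max_{\theta}\Big\{\frac{1}{n}\sum_{i=1}^{n} g_{\theta}(x_i) - \log\frac{1}{m}\sum_{j=1}^{m} e^{g_{\theta}(y_j)}\Big\},
\]

where the supremum is over bounded measurable functions $g$, the samples satisfy $x_i \sim P$ and $y_j \sim Q$, and $g_{\theta}$ is restricted to a neural-network family parameterized by $\theta$; the expectations are thereby replaced by statistical averages computed directly from data.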