Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, because it is difficult to compare probability distributions accurately from finite samples. In this work, we propose DRE-\infty, a divide-and-conquer approach that reduces DRE to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We then estimate the instantaneous rate of change of the bridge distributions with respect to their time index (the "time score"), a quantity defined analogously to data (Stein) scores, using a novel time score matching objective. Crucially, the learned time scores can then be integrated over time to compute the desired density ratio. In addition, we show that traditional (Stein) scores can be used to obtain integration paths that connect regions of high density in both distributions, which improves performance in practice. Empirically, we demonstrate that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
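The step "integrate the time scores to obtain the density ratio" rests on the identity $\log p_1(x) - \log p_0(x) = \int_0^1 \partial_t \log p_t(x)\,dt$, where the bridge $p_t$ runs from $p_0$ to $p_1$. The sketch below illustrates only this integration step, under loudly stated assumptions. It uses a hypothetical one-dimensional Gaussian bridge whose mean and standard deviation are linear in t, so the time score has a closed form. In DRE-\infty the time score would instead come from a model trained with time score matching, which is not reproduced here. The ratio convention p_1/p_0 and the names `bridge_params`, `time_score`, and `log_ratio_by_integration` are illustrative choices, not taken from the authors' code.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def bridge_params(t, mu0, sigma0, mu1, sigma1):
    """Hypothetical bridge p_t = N(mu_t, sigma_t^2): mean and std are linear in t,
    so p_0 = N(mu0, sigma0^2) and p_1 = N(mu1, sigma1^2)."""
    mu_t = (1.0 - t) * mu0 + t * mu1
    sigma_t = (1.0 - t) * sigma0 + t * sigma1
    return mu_t, sigma_t


def time_score(x, t, mu0, sigma0, mu1, sigma1):
    """Closed-form time score d/dt log p_t(x) for the hypothetical bridge above.

    Stand-in for a learned model: DRE-infty would estimate this quantity
    with time score matching instead of computing it analytically."""
    mu_t, sigma_t = bridge_params(t, mu0, sigma0, mu1, sigma1)
    dmu, dsigma = mu1 - mu0, sigma1 - sigma0  # d(mu_t)/dt and d(sigma_t)/dt
    z = x - mu_t
    return -dsigma / sigma_t + z * dmu / sigma_t ** 2 + z ** 2 * dsigma / sigma_t ** 3


def log_ratio_by_integration(x, mu0, sigma0, mu1, sigma1, n_steps=2000):
    """Approximate log p_1(x) - log p_0(x) = integral_0^1 d/dt log p_t(x) dt
    with the trapezoidal rule on a uniform grid over t in [0, 1]."""
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    scores = np.stack([time_score(x, t, mu0, sigma0, mu1, sigma1) for t in ts])
    dt = ts[1] - ts[0]
    return dt * (0.5 * scores[0] + scores[1:-1].sum(axis=0) + 0.5 * scores[-1])


# Usage: compare the integrated time score with the exact log density ratio.
x = np.array([-1.0, 0.0, 2.5])
mu0, sigma0, mu1, sigma1 = 0.0, 1.0, 2.0, 0.5
approx = log_ratio_by_integration(x, mu0, sigma0, mu1, sigma1)
exact = gaussian_logpdf(x, mu1, sigma1) - gaussian_logpdf(x, mu0, sigma0)
print(np.max(np.abs(approx - exact)))  # close to zero; shrinks as n_steps grows
```

Because the bridge in this sketch is analytic, the only error comes from the quadrature. With a learned time score, estimation error in the score would also accumulate along the integration path, which is one reason the choice of path between the two distributions matters.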