We propose a simple, scalable algorithm that uses stochastic interpolants to sample from unnormalized densities and to fine-tune generative models. The approach, Tilt Matching, arises from a dynamical equation relating the flow matching velocity to one that targets the same distribution tilted by a reward, implicitly solving a stochastic optimal control problem. The new velocity inherits the regularity of stochastic interpolant transports while also being the minimizer of an objective with strictly lower variance than flow matching itself. The update to the velocity field can be interpreted as the sum of all joint cumulants of the stochastic interpolant and copies of the reward, and to first order it is their covariance. The algorithm requires no access to gradients of the reward and no backpropagation through trajectories of the flow or diffusion. We empirically verify that the approach is efficient and highly scalable, achieving state-of-the-art results on sampling under Lennard-Jones potentials and competitive performance on fine-tuning Stable Diffusion, without requiring reward multipliers. It also applies straightforwardly to tilting few-step flow map models.
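To make the first-order claim concrete, here is one hedged reading in our own notation, which need not match the paper's exact formulation: write the base velocity as the conditional expectation $b_t(x) = \mathbb{E}[\dot I_t \mid I_t = x]$ for a stochastic interpolant $I_t$ with data endpoint $X_1$, and tilt the target density by a reward $r$, so that $\rho_1^r(x) \propto e^{r(x)}\rho_1(x)$. Reweighting the coupling by $e^{r(X_1)}$ and applying Bayes' rule then gives
\[
b_t^r(x) \;=\; \frac{\mathbb{E}\big[\dot I_t\, e^{r(X_1)} \mid I_t = x\big]}{\mathbb{E}\big[e^{r(X_1)} \mid I_t = x\big]}
\;=\; b_t(x) \;+\; \operatorname{Cov}\big(\dot I_t,\, r(X_1) \mid I_t = x\big) \;+\; \cdots,
\]
where the omitted terms are higher-order joint cumulants of $\dot I_t$ with copies of $r(X_1)$. Only evaluations of $r$ appear, never $\nabla r$, consistent with the gradient-free claim above.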