Diffusion-based samplers learn to sample from complex, high-dimensional distributions using energies or log densities alone, without training data. Yet they remain impractical for molecular sampling because they are often slower than molecular dynamics and miss thermodynamically relevant modes. Inspired by enhanced sampling, we encourage exploration by introducing a sequential bias along bespoke, information-rich, low-dimensional projections of atomic coordinates known as collective variables (CVs). Specifically, we introduce a repulsive potential centered on the CVs of recent samples, which pushes future samples towards novel CV regions and effectively increases the temperature in the projected space. The resulting method improves efficiency and mode discovery, enables the estimation of free energy differences, and retains independent sampling from the approximate Boltzmann distribution via reweighting by the bias. On standard peptide conformational sampling benchmarks, the method recovers diverse conformational states and accurate free energy profiles. We are the first to demonstrate reactive sampling using a diffusion-based sampler, capturing bond breaking and formation with universal interatomic potentials at near-first-principles accuracy. The approach resolves reactive energy landscapes at a fraction of the wall-clock time of standard sampling methods, advancing diffusion-based sampling towards practical use in molecular sciences.
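To make the idea of a repulsive CV bias and bias reweighting concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the functional form of the bias, so the Gaussian hills (in the style of metadynamics), the one-dimensional CV, the static bias, and the names `repulsive_bias`, `reweight`, `height`, `width`, and `beta` are all assumptions made for illustration.

```python
import math


def repulsive_bias(cv, centers, height=1.0, width=0.2):
    """Assumed bias: a sum of Gaussian hills centered on the CVs of recent samples.

    The bias is largest where recent samples already sit, so adding it to the
    energy penalizes revisiting those CV regions.
    """
    return sum(
        height * math.exp(-0.5 * ((cv - c) / width) ** 2) for c in centers
    )


def reweight(bias_values, beta=1.0):
    """Normalized importance weights proportional to exp(+beta * V_bias).

    Samples drawn from exp(-beta * (U + V_bias)) are mapped back to the
    unbiased target exp(-beta * U). This assumes a static bias.
    """
    log_w = [beta * v for v in bias_values]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical usage: two recent samples near CV = 0, two new candidates.
recent_cvs = [0.0, 0.1]
candidates = [0.0, 2.0]
biases = [repulsive_bias(cv, recent_cvs) for cv in candidates]
weights = reweight(biases)
```

In this sketch the candidate at CV = 0.0 receives a larger bias than the one at CV = 2.0, so a sampler trained on the biased energy would be steered away from the already visited region, while the weights compensate for that bias when estimating unbiased averages.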