Coarse-grained (CG) molecular simulations have become a standard tool for studying molecular processes on time and length scales inaccessible to all-atom simulations. Parameterizing CG force fields to match all-atom simulations has mainly relied on force matching or relative entropy minimization, which require many samples from costly simulations at all-atom or CG resolution, respectively. Here we present flow-matching, a new training method for CG force fields that combines the advantages of both methods by leveraging normalizing flows, a generative deep learning method. Flow-matching first trains a normalizing flow to represent the CG probability density, which is equivalent to minimizing the relative entropy without requiring iterative CG simulations. Subsequently, the flow generates samples and forces according to the learned distribution in order to train the desired CG energy model via force matching. Even without requiring forces from the all-atom simulations, flow-matching outperforms classical force matching by an order of magnitude in terms of data efficiency and produces CG models that can capture the folding and unfolding transitions of small proteins.
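
The two-stage procedure described above can be illustrated with a deliberately tiny toy. This is only a sketch under strong assumptions and not the authors' implementation: the "normalizing flow" is a one-dimensional affine map x = mu + sigma * z, the CG energy model is a single harmonic well, kT is set to 1, and the CG samples are synthetic Gaussian draws standing in for mapped all-atom configurations. All function names are hypothetical. The forces used in stage 2 come from the learned density via F(x) = kT * d/dx log p(x), so no all-atom forces are needed.

```python
import math
import random


def fit_affine_flow(samples):
    """Stage 1: maximum-likelihood fit of the flow x = mu + sigma * z, z ~ N(0, 1).

    For this affine flow the maximum-likelihood solution is the sample mean
    and the (biased) sample standard deviation.
    """
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)


def flow_samples_and_forces(mu, sigma, n, rng, kT=1.0):
    """Draw samples from the flow and evaluate forces kT * d/dx log p(x)."""
    xs = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
    forces = [-kT * (x - mu) / sigma ** 2 for x in xs]
    return xs, forces


def force_match_harmonic(xs, forces):
    """Stage 2: least-squares fit of F(x) = -k * (x - x0) to (x, force) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_f = sum(forces) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxf = sum((x - mean_x) * (f - mean_f) for x, f in zip(xs, forces))
    slope = sxf / sxx                  # slope of F(x) equals -k
    intercept = mean_f - slope * mean_x  # intercept of F(x) equals k * x0
    k = -slope
    x0 = intercept / k
    return k, x0


rng = random.Random(0)
# Synthetic stand-in for CG coordinates mapped from an all-atom trajectory.
cg_samples = [rng.gauss(1.5, 0.5) for _ in range(2000)]

mu, sigma = fit_affine_flow(cg_samples)                    # stage 1
xs, forces = flow_samples_and_forces(mu, sigma, 500, rng)  # flow-generated data
k, x0 = force_match_harmonic(xs, forces)                   # stage 2

print(f"flow: mu={mu:.3f}, sigma={sigma:.3f}")
print(f"CG model: k={k:.3f}, x0={x0:.3f}")
```

In this toy the fitted spring constant matches 1 / sigma**2 and the well minimum matches mu, because a harmonic energy can represent a Gaussian density exactly. A realistic setup would replace the affine map with an expressive flow and the harmonic well with a neural network energy model.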