Coarse-grained (CG) molecular simulations have become a standard tool for studying molecular processes on time- and length-scales inaccessible to all-atom simulations. Parameterizing CG force fields to match all-atom simulations has mainly relied on force-matching or relative entropy minimization, which require many samples from costly simulations at all-atom or CG resolution, respectively. Here we present flow-matching, a new training method for CG force fields that combines the advantages of both methods by leveraging normalizing flows, a generative deep learning method. Flow-matching first trains a normalizing flow to represent the CG probability density, which is equivalent to minimizing the relative entropy without requiring iterative CG simulations. Subsequently, the flow generates samples and forces according to the learned distribution in order to train the desired CG free energy model via force-matching. Even without requiring forces from the all-atom simulations, flow-matching outperforms classical force-matching by an order of magnitude in terms of data efficiency, and produces CG models that can capture the folding and unfolding transitions of small proteins.
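The two-stage procedure described above can be illustrated with a deliberately small one-dimensional sketch. It is not the implementation used in this work: a single affine map stands in for a deep normalizing flow, the "CG samples" are synthetic Gaussian draws, and the names `flow_sample`, `flow_force`, `kT`, and the quadratic energy model are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
kT = 1.0  # assumed thermal energy in reduced units

# Synthetic stand-in for CG coordinates mapped from an all-atom trajectory.
cg_samples = rng.normal(loc=1.5, scale=0.5, size=5000)

# Stage 1: fit the flow x = mu + sigma * z, z ~ N(0, 1), by maximum likelihood.
# For an affine flow the maximum-likelihood parameters have a closed form.
mu = cg_samples.mean()
sigma = cg_samples.std()


def flow_sample(n):
    """Draw n samples by pushing base noise through the fitted flow."""
    return mu + sigma * rng.standard_normal(n)


def flow_force(x):
    """Force implied by the flow density: F(x) = kT * d/dx log p(x)."""
    return -kT * (x - mu) / sigma**2


# Stage 2: generate samples and forces from the flow, then force-match
# a CG energy model U(x) = a*x + b*x**2, whose force is -(a + 2*b*x).
x = flow_sample(2000)
f = flow_force(x)
design = np.column_stack([-np.ones_like(x), -2.0 * x])
(a, b), *_ = np.linalg.lstsq(design, f, rcond=None)

print(f"fitted flow: mu={mu:.3f}, sigma={sigma:.3f}")
print(f"force-matched energy: U(x) = {a:.3f}*x + {b:.3f}*x^2")
```

No forces from the reference data enter the fit: the only forces used in stage 2 are those derived from the flow's own density. In this toy case the force-matched energy agrees with `-kT * log p(x)` up to an additive constant, because the quadratic model can represent the Gaussian flow exactly.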