We present a method to probe rare molecular dynamics trajectories directly using reinforcement learning. We consider trajectories conditioned to transition between regions of configuration space in finite time, such as those relevant to the study of reactive events, as well as trajectories exhibiting rare fluctuations of time-integrated quantities in the long-time limit, such as those relevant to the calculation of large deviation functions. In both cases, reinforcement learning techniques are used to optimize an added force that minimizes the Kullback-Leibler divergence between the conditioned trajectory ensemble and a driven one. Under the optimized added force, the system generates the rare fluctuation as a typical one, affording a variational estimate of its likelihood in the original trajectory ensemble. Low-variance gradient estimators employing value functions are proposed to accelerate convergence to the optimal force. The resulting method yields efficient and accurate estimates of both the optimal force and the likelihood of the rare event for a variety of model systems.
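As a brief sketch of why minimizing the Kullback-Leibler divergence affords a variational estimate, the standard argument can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the text above: $X$ is a trajectory, $P_0[X]$ its probability in the original ensemble, $P_\lambda[X]$ its probability under the added force $\lambda$, and $h[X] \ge 0$ the conditioning weight (for example, an indicator that the transition occurred, or an exponential tilt of a time-integrated quantity). The direction of the divergence, driven relative to conditioned, is likewise assumed, chosen so that all averages are taken over driven trajectories.

```latex
% Conditioned ensemble and its normalization (the rare-event likelihood Z)
P_c[X] = \frac{h[X]\, P_0[X]}{Z}, \qquad Z = \langle h \rangle_0 .

% KL divergence of the driven ensemble relative to the conditioned one
D_{\mathrm{KL}}(P_\lambda \,\|\, P_c)
  = \left\langle \ln \frac{P_\lambda[X]}{P_c[X]} \right\rangle_\lambda
  = \left\langle \ln \frac{P_\lambda[X]}{P_0[X]} - \ln h[X] \right\rangle_\lambda + \ln Z
  \;\ge\; 0 .

% Rearranging gives a lower bound on the log-likelihood,
% saturated if and only if P_\lambda = P_c
\ln Z \;\ge\; \left\langle \ln h[X] - \ln \frac{P_\lambda[X]}{P_0[X]} \right\rangle_\lambda .
```

Under these assumptions, maximizing the right-hand side over $\lambda$ is equivalent to minimizing the divergence, and the bound becomes an equality when the driven ensemble reproduces the conditioned one. For an indicator $h$, the bound is finite only if driven trajectories satisfy the condition with probability one.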