A reinforcement learning approach to rare trajectory sampling

Very often when studying non-equilibrium systems, one is interested in analysing dynamical behaviour that occurs with very low probability, so-called rare events. In practice, since rare events are by definition atypical, they are often difficult to access in a statistically significant way. What is required are strategies to "make rare events typical" so that they can be generated on demand. Here we present such a general approach to adaptively construct a dynamics that efficiently samples atypical events. We do so by exploiting the methods of reinforcement learning (RL), which refers to the set of machine learning techniques aimed at finding the optimal behaviour to maximise a reward associated with the dynamics. We consider the general perspective of dynamical trajectory ensembles, whereby rare events are described in terms of ensemble reweighting. By minimising the distance between a reweighted ensemble and that of a suitably parametrised controlled dynamics, we arrive at a set of methods similar to those of RL to numerically approximate the optimal dynamics that realises the rare behaviour of interest. As simple illustrations we consider in detail the problem of excursions of a random walker, for the case of rare events with a finite time horizon; and the problem of studying current statistics of a particle hopping in a ring geometry, for the case of an infinite time horizon. We discuss natural extensions of the ideas presented here, including to continuous-time Markov systems, first passage time problems and non-Markovian dynamics.
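
To make the idea of "a dynamics that realises the rare behaviour" concrete, the following is a minimal sketch for the random walker excursion example. It is not the RL scheme itself. For a small enough problem the optimal dynamics can be computed exactly by a backward recursion, and this is the kind of target a learned, parametrised dynamics would be trying to approximate. The sketch assumes a symmetric walker taking steps of +1 or -1, a hard constraint (stay non-negative and return to the origin at the final step T), and illustrative function names (excursion_dynamics, prob_up, sample_excursion). None of these choices is taken from the text above.

```python
import random


def excursion_dynamics(T):
    """Backward recursion for a symmetric +/-1 walker (assumed model).

    g[t][x] is the probability that an unbiased walker at position x at
    time t stays >= 0 from then on and sits at 0 at time T.
    """
    if T % 2 != 0:
        raise ValueError("T must be even for the walker to return to 0")
    # Positions 0..T are reachable; one extra slot so x + 1 never overflows.
    g = [[0.0] * (T + 2) for _ in range(T + 1)]
    g[T][0] = 1.0  # only paths ending at the origin count
    for t in range(T - 1, -1, -1):
        for x in range(T + 1):
            up = g[t + 1][x + 1]
            down = g[t + 1][x - 1] if x > 0 else 0.0  # x = -1 is forbidden
            g[t][x] = 0.5 * up + 0.5 * down
    return g


def prob_up(g, t, x):
    """Step-up probability of the conditioned (time-dependent) dynamics."""
    return 0.5 * g[t + 1][x + 1] / g[t][x]


def sample_excursion(g, T, rng):
    """Draw one trajectory; every sample satisfies the constraint."""
    x, path = 0, [0]
    for t in range(T):
        x += 1 if rng.random() < prob_up(g, t, x) else -1
        path.append(x)
    return path


if __name__ == "__main__":
    T = 20
    g = excursion_dynamics(T)
    rng = random.Random(0)
    print("probability of an excursion under the original walk:", g[0][0])
    print("one sampled excursion:", sample_excursion(g, T, rng))
```

Under the original walk the excursion probability g[0][0] shrinks as T grows, so naive sampling wastes almost all trajectories, while the transformed dynamics produces a valid excursion every time. The exact recursion becomes impractical for large state spaces, which is where an adaptive, learned approximation becomes useful.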
