Enhanced sampling methods are indispensable in computational physics and chemistry, where the sampling problem prevents atomistic simulations from exhaustively exploring the high-dimensional configuration space of dynamical systems. One class of enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on physical and chemical intuition. Manifold learning is routinely used to circumvent this issue by estimating CVs directly from standard simulations. However, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations, because the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework for manifold learning, based on anisotropic diffusion maps, that accounts for the learning data set being sampled from a biased probability distribution. We consider manifold learning methods that construct a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect, yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from data generated by enhanced sampling simulations. We call our framework reweighted manifold learning and show that it can be used with many manifold learning techniques on data from both standard and enhanced sampling simulations.
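To make the idea of a reweighted Markov chain over samples concrete, the following is a minimal sketch and not necessarily the construction used in this work. It assumes a Gaussian kernel, statistical weights of the form exp(βV) for a static bias potential V, and a target-measure style correction in which each kernel column is scaled by sqrt(w/q), where q is a kernel estimate of the biased density. The function names, the bandwidth `epsilon`, and the toy data are all hypothetical.

```python
import numpy as np


def reweighted_transition_matrix(X, weights, epsilon=1.0):
    """Row-stochastic Markov matrix built from biased samples (illustrative).

    X       : (n, d) array of high-dimensional samples from a biased run
    weights : (n,) positive statistical weights, assumed proportional to
              exp(beta * V(x)) for a static bias potential V
    epsilon : Gaussian kernel bandwidth (hypothetical default)
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Pairwise squared Euclidean distances and Gaussian kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * epsilon))

    # Kernel estimate of the biased sampling density (up to a constant).
    q = K.sum(axis=1)

    # Assumed correction: scale column j by sqrt(w_j / q_j), so that sums
    # over biased samples approximate integrals against the square root
    # of the unbiased density estimate w * q.
    K_rw = K * np.sqrt(w / q)[None, :]

    # Row-normalize to obtain transition probabilities between samples.
    return K_rw / K_rw.sum(axis=1, keepdims=True)


def diffusion_coordinates(M, n_cvs=2):
    """Leading nontrivial eigenpairs of M, used here as candidate CVs."""
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)
    eigvals = eigvals.real[order]
    eigvecs = eigvecs.real[:, order]
    # Skip the trivial pair (eigenvalue 1, constant eigenvector).
    return eigvals[1:n_cvs + 1], eigvecs[:, 1:n_cvs + 1]


# Toy data: 2D samples with a made-up bias potential in units of kT.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
bias = -0.5 * X[:, 0] ** 2            # hypothetical V(x), beta = 1
w = np.exp(bias - bias.max())         # shift for numerical stability

M = reweighted_transition_matrix(X, w, epsilon=0.5)
evals, cvs = diffusion_coordinates(M, n_cvs=2)
```

Setting all weights equal leaves only the kernel density correction, which gives an unweighted diffusion-map-style chain and a convenient baseline for seeing how much the reweighting changes the resulting coordinates.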