Computing accurate reaction rates is a central challenge in computational chemistry and biology because of the high cost of free energy estimation with unbiased molecular dynamics. In this work, a data-driven machine learning algorithm is devised to learn collective variables with a multitask neural network, where a common upstream part reduces the high dimensionality of atomic configurations to a low-dimensional latent space, and separate downstream parts map the latent space to predictions of basin class labels and potential energies. The resulting latent space is shown to be an effective low-dimensional representation that captures the reaction progress and guides umbrella sampling to obtain accurate free energy landscapes. The approach is successfully applied to model systems including a 5D Müller-Brown model, a 5D three-well model, and alanine dipeptide in vacuum. It enables automated dimensionality reduction for energy-controlled reactions in complex systems, offers a unified framework that can be trained with limited data, and outperforms single-task learning approaches, including autoencoders.
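To make the multitask architecture concrete, the following is a minimal NumPy sketch of the idea described above: a shared encoder maps a configuration to a latent collective variable, and two separate heads predict basin class logits and the potential energy from that latent value. The layer sizes, the tanh activation, the equal loss weighting, and all names (`encode`, `forward`, `multitask_loss`) are illustrative assumptions, not details taken from the work itself, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Return (weights, bias) for one dense layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def dense(x, layer):
    """Apply an affine map x @ W + b."""
    w, b = layer
    return x @ w + b

# Assumed sizes: 5D input (matching the 5D model systems), a 1D latent
# collective variable, and 2 basin classes. The hidden width is arbitrary.
N_INPUT, N_HIDDEN, N_LATENT, N_CLASSES = 5, 16, 1, 2

params = {
    "enc1": init_layer(N_INPUT, N_HIDDEN),   # shared upstream part
    "enc2": init_layer(N_HIDDEN, N_LATENT),
    "cls": init_layer(N_LATENT, N_CLASSES),  # downstream head: basin labels
    "ene1": init_layer(N_LATENT, N_HIDDEN),  # downstream head: potential energy
    "ene2": init_layer(N_HIDDEN, 1),
}

def encode(params, x):
    """Shared encoder: configurations -> latent collective variable."""
    hidden = np.tanh(dense(x, params["enc1"]))
    return dense(hidden, params["enc2"])

def forward(params, x):
    """Return the latent value, basin class logits, and predicted energies."""
    z = encode(params, x)
    logits = dense(z, params["cls"])
    energy = dense(np.tanh(dense(z, params["ene1"])), params["ene2"])[:, 0]
    return z, logits, energy

def multitask_loss(params, x, labels, energies, weight=1.0):
    """Cross-entropy on basin labels plus weighted MSE on energies (assumed form)."""
    _, logits, pred_energy = forward(params, x)
    shifted = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cross_entropy = -log_probs[np.arange(len(labels)), labels].mean()
    mse = np.mean((pred_energy - energies) ** 2)
    return cross_entropy + weight * mse

# Synthetic stand-in data, only to exercise the shapes.
x = rng.normal(size=(8, N_INPUT))
labels = rng.integers(0, N_CLASSES, size=8)
energies = rng.normal(size=8)

z, logits, energy = forward(params, x)
loss = multitask_loss(params, x, labels, energies)
```

Because both heads read only from `z`, minimizing the combined loss pushes the latent space to encode information relevant to both basin identity and energy, which is the property that makes it a candidate coordinate for umbrella sampling.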