We develop a novel data-driven approach to the inverse problem of classical statistical mechanics: given experimental data on the collective motion of a classical many-body system, how does one characterise the free energy landscape of that system? By combining non-parametric Bayesian inference with physically motivated constraints, we construct an efficient learning algorithm which automates the construction of approximate free energy functionals. In contrast to optimisation-based machine learning approaches, which seek to minimise a cost function, the central idea of the proposed Bayesian inference is to propagate a set of prior assumptions, derived from physical principles, through the model. The experimental data are then used to weigh the possible model predictions probabilistically. This naturally leads to human-interpretable algorithms with full uncertainty quantification of predictions. In our case, the output of the learning algorithm is a probability distribution over a family of free energy functionals, consistent with the observed particle data. We find that surprisingly small data samples contain sufficient information for inferring highly accurate analytic expressions of the underlying free energy functionals, making our algorithm highly data efficient. We consider excluded volume particle interactions, which are ubiquitous in nature, whilst being highly challenging to model in terms of free energy. To validate our approach we consider the paradigmatic case of a one-dimensional fluid and develop inference algorithms for the canonical and grand-canonical statistical-mechanical ensembles. Extensions to higher-dimensional systems are conceptually straightforward, whilst standard coarse-graining techniques allow one to easily incorporate attractive interactions.
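To make the idea of weighing model predictions against particle data concrete, the following is a minimal toy sketch and not the non-parametric algorithm described above. It assumes a one-dimensional fluid of hard rods confined between hard walls in the canonical ensemble, reduces the unknown to a single parameter (the rod length), and uses a flat prior on a grid. The function names `sample_hard_rods` and `log_posterior_rod_length`, and all numerical values, are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_hard_rods(n_rods, box_length, rod_length, n_samples, rng):
    """Draw exact canonical configurations of hard rods between hard walls.

    Returns an array of shape (n_samples, n_rods) holding sorted rod centres.
    """
    free_length = box_length - n_rods * rod_length
    # Point particles uniform on the free length, sorted left to right.
    y = np.sort(rng.uniform(0.0, free_length, size=(n_samples, n_rods)), axis=1)
    # Re-insert the excluded volume: rod i (0-based) shifts by (i + 1/2) * rod_length.
    offsets = (np.arange(n_rods) + 0.5) * rod_length
    return y + offsets


def log_posterior_rod_length(configs, box_length, grid):
    """Unnormalised log posterior over the rod length on a grid (flat prior)."""
    n_samples, n_rods = configs.shape
    # Largest rod length compatible with every observed gap and with both walls.
    min_gap = np.min(np.diff(configs, axis=1))
    wall_gap = 2.0 * min(configs[:, 0].min(), box_length - configs[:, -1].max())
    sigma_max = min(min_gap, wall_gap)
    free_length = box_length - n_rods * grid
    log_post = np.full(grid.shape, -np.inf)
    allowed = (grid <= sigma_max) & (free_length > 0.0)
    # Each ordered configuration has density N! / (L - N sigma)^N; N! is a constant.
    log_post[allowed] = -n_samples * n_rods * np.log(free_length[allowed])
    return log_post


# Synthetic "experiment": a handful of snapshots of ten rods (assumed values).
rng = np.random.default_rng(0)
true_sigma = 1.0
box_length = 20.0
configs = sample_hard_rods(10, box_length, true_sigma, n_samples=5, rng=rng)

# Posterior over the rod length, normalised on the grid.
grid = np.linspace(0.01, 1.99, 2000)
log_post = log_posterior_rod_length(configs, box_length, grid)
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])

sigma_map = grid[np.argmax(post)]
sigma_mean = np.sum(grid * post) * (grid[1] - grid[0])
print(f"MAP rod length: {sigma_map:.3f}, posterior mean: {sigma_mean:.3f}")
```

In this toy setting the data act only through the likelihood: rod lengths that would make any observed snapshot overlap receive zero posterior weight, and the remaining candidates are weighted by the configurational partition function. The result is a full distribution over the parameter rather than a single fitted value, which illustrates the uncertainty quantification mentioned above in a much simpler setting than inference over free energy functionals.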