Scientists have long aimed to discover meaningful formulas that accurately describe experimental data. One common approach is to construct mathematical models of natural phenomena by hand using domain knowledge, then fit these models to data. In contrast, machine-learning algorithms automate the construction of accurate data-driven models but consume large amounts of data. Ensuring that such models are consistent with existing knowledge is an open problem. We develop a method for combining logical reasoning with symbolic regression, enabling principled derivations of models of natural phenomena. We demonstrate these concepts on Kepler's third law of planetary motion, Einstein's relativistic time-dilation law, and Langmuir's theory of adsorption, automatically connecting experimental data with background theory in each case. We show that laws can be discovered from few data points when formal logical reasoning is used to distinguish the correct formula from a set of plausible formulas that have similar error on the data. The combination of reasoning with machine learning provides generalizable insights into key aspects of natural phenomena. We envision that this combination will enable derivable discovery of fundamental laws of science, and we believe it is a crucial first step toward supplying the missing links in automating the scientific method.
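As a rough illustration of distinguishing the correct formula from plausible formulas with similar error, the sketch below uses Kepler's third law. It is not the method of this work: the two candidate formulas, the helper names, and the 5% error threshold are hypothetical, and the "reasoning" step is only a numeric stand-in for formal logical reasoning. The background-theory constraint assumed here is the scaling property p(s·d) = s^(3/2)·p(d), which follows for circular orbits from balancing inverse-square gravity against centripetal acceleration (p² = 4π²d³/(GM)).

```python
import math

# (semi-major axis in AU, orbital period in years) for Venus, Earth, Mars.
DATA = [(0.723, 0.615), (1.000, 1.000), (1.524, 1.881)]

# Hypothetical candidates, standing in for the output of a symbolic regressor.
CANDIDATES = {
    "p = d^(3/2)": lambda d: d ** 1.5,
    "p = (d^2 + d) / 2": lambda d: (d * d + d) / 2,
}


def max_relative_error(f, data):
    """Largest relative deviation of candidate f from the observed periods."""
    return max(abs(f(d) - p) / p for d, p in data)


def consistent_with_theory(f, rel_tol=1e-9):
    """Numeric stand-in for a formal proof (an assumption of this sketch).

    Checks the scaling property p(s*d) == s**1.5 * p(d) on a few sample
    points, including points well outside the range of the data.
    """
    for d in (0.5, 1.0, 3.0):
        for s in (2.0, 10.0):
            if not math.isclose(f(s * d), s ** 1.5 * f(d), rel_tol=rel_tol):
                return False
    return True


def select(candidates, data, max_err=0.05):
    """Keep candidates that fit the data, then keep those the theory allows."""
    plausible = {
        name: f
        for name, f in candidates.items()
        if max_relative_error(f, data) <= max_err
    }
    return [name for name, f in plausible.items() if consistent_with_theory(f)]


if __name__ == "__main__":
    for name, f in CANDIDATES.items():
        print(name, "max relative error:", round(max_relative_error(f, DATA), 4))
    print("consistent with background theory:", select(CANDIDATES, DATA))
```

Both candidates fit the three data points to within a few percent, so the data alone cannot separate them; only the check against background theory does. A real system would replace the sampled scaling check with a derivation from stated axioms.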