Scientists have long aimed to discover meaningful formulas that accurately describe experimental data. A common approach is to manually create mathematical models of natural phenomena using domain knowledge, and then fit these models to data. In contrast, machine-learning algorithms automate the construction of accurate data-driven models while consuming large amounts of data. The problem of enforcing logic constraints on the functional form of a learned model (e.g., nonnegativity) has been explored in the literature; however, finding models that are consistent with general background knowledge is an open problem. We develop a method for combining logical reasoning with symbolic regression, enabling principled derivations of models of natural phenomena. We demonstrate these concepts for Kepler's third law of planetary motion, Einstein's relativistic time-dilation law, and Langmuir's theory of adsorption, automatically connecting experimental data with background theory in each case. We show that laws can be discovered from few data points when using formal logical reasoning to distinguish the correct formula from a set of plausible formulas that have similar error on the data. The combination of reasoning with machine learning provides generalizable insights into key aspects of natural phenomena. We envision that this combination will enable derivable discovery of fundamental laws of science and believe that our work is a crucial first step towards automating the scientific method.
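The idea of using background knowledge to choose among formulas with similar data error can be illustrated with a minimal Python sketch for Kepler's third law. This is not the authors' pipeline: the candidate formulas are hypothetical stand-ins for symbolic-regression output, the planetary values are approximate and included only for illustration, and a numeric spot check of the scaling property p(k·d) = k^1.5 · p(d) (which Kepler's third law implies for a fixed central mass) stands in for formal logical reasoning. The names `CANDIDATES`, `satisfies_scaling`, and `select`, and the tolerance `tol`, are assumptions of this sketch.

```python
import math

# Approximate semi-major axis d (AU) and orbital period p (years) for
# Mercury, Venus, Earth, and Mars. Illustrative values, not the paper's data.
DATA = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000), (1.524, 1.881)]

# Hypothetical candidates standing in for symbolic-regression output.
CANDIDATES = {
    "d**1.5": lambda d: d ** 1.5,
    "0.41*d**2 + 0.59*d": lambda d: 0.41 * d ** 2 + 0.59 * d,
    "d**2": lambda d: d ** 2,
}


def rmse(f, data):
    """Root-mean-square error of formula f on (d, p) pairs."""
    return math.sqrt(sum((f(d) - p) ** 2 for d, p in data) / len(data))


def satisfies_scaling(f, scales=(2.0, 4.0), points=(0.5, 1.0, 3.0)):
    """Spot-check p(k*d) == k**1.5 * p(d) at a few points.

    Assumption: this numeric check is only a stand-in for a formal proof
    of consistency with background theory; it proves nothing by itself.
    """
    return all(
        math.isclose(f(k * d), k ** 1.5 * f(d), rel_tol=1e-9)
        for k in scales
        for d in points
    )


def select(candidates, data, tol=0.05):
    """Return (names plausible on the data, names also passing the check)."""
    plausible = {n: f for n, f in candidates.items() if rmse(f, data) <= tol}
    consistent = [n for n, f in plausible.items() if satisfies_scaling(f)]
    return sorted(plausible), consistent


if __name__ == "__main__":
    plausible, consistent = select(CANDIDATES, DATA)
    print("plausible on data:", plausible)
    print("consistent with background check:", consistent)
```

With only four data points, two of the three candidates fall under the error tolerance, so the data alone does not single out one formula; the background check is what separates them in this toy setting.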