Equation discovery, also known as symbolic regression, is a type of automated modeling that discovers scientific laws, expressed in the form of equations, from observed data and expert knowledge. Deterministic grammars, such as context-free grammars, have been used to limit the search space in equation discovery by providing hard constraints that specify which equations to consider and which to exclude. In this paper, we propose the use of probabilistic context-free grammars for equation discovery. Such grammars encode soft constraints by specifying a prior probability distribution over the space of possible equations. We show that probabilistic grammars can be used to elegantly and flexibly formulate the parsimony principle, which favors simpler equations, through the probabilities attached to the grammar rules. We demonstrate that using probabilistic, rather than deterministic, grammars in a Monte Carlo algorithm for grammar-based equation discovery leads to more efficient equation discovery. Finally, by specifying prior probability distributions over equation spaces, we lay the foundations for Bayesian approaches to equation discovery.
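To make the idea of a grammar-defined prior concrete, the following is a minimal sketch of sampling expressions from a probabilistic context-free grammar. The toy grammar, its rule probabilities, and the function name `sample` are illustrative assumptions and are not taken from the paper. The probability of a sampled expression is the product of the probabilities of the rules used to derive it, so longer derivations receive lower prior probability, which is one way to read the parsimony principle.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal maps to a list of (probability, right-hand side) pairs;
# the probabilities for each nonterminal sum to 1.
GRAMMAR = {
    "E": [(0.4, ["E", "+", "T"]), (0.6, ["T"])],
    "T": [(0.3, ["T", "*", "V"]), (0.7, ["V"])],
    "V": [(0.5, ["x"]), (0.5, ["c"])],
}


def sample(symbol, rng):
    """Sample a derivation starting from `symbol`.

    Returns (tokens, probability), where probability is the product of
    the probabilities of all rules applied in the derivation.
    """
    if symbol not in GRAMMAR:
        # Terminal symbol: emitted as-is, contributes a factor of 1.
        return [symbol], 1.0
    rules = GRAMMAR[symbol]
    prob, rhs = rng.choices(rules, weights=[p for p, _ in rules])[0]
    tokens = []
    for child in rhs:
        child_tokens, child_prob = sample(child, rng)
        tokens += child_tokens
        prob *= child_prob
    return tokens, prob


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        tokens, prob = sample("E", rng)
        print(" ".join(tokens), prob)
```

Because the recursive rules in this toy grammar have probability below 0.5, sampling terminates with probability 1 and deep recursion is very unlikely. In a Monte Carlo setting of the kind the abstract describes, samples like these would serve as candidate equation structures to be fitted against data; that fitting step is not shown here.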