Symbolic Regression (SR) algorithms learn analytic expressions that both fit data accurately and, unlike traditional machine-learning approaches, are highly interpretable. Conventional SR suffers from two fundamental issues, which we address in this work. First, since the number of possible equations grows exponentially with complexity, typical SR methods search the space stochastically and hence do not necessarily find the best function. In many cases, however, the target problems of SR are sufficiently simple that a brute-force approach is not only feasible but desirable. Second, the criteria used to select the equation that optimally balances accuracy with simplicity have been variable and poorly motivated. To address these issues we introduce a new method for SR -- Exhaustive Symbolic Regression (ESR) -- which systematically and efficiently considers all possible equations and is therefore guaranteed to find not only the true optimum but also a complete function ranking. Utilising the minimum description length principle, we introduce a principled method for combining the preferences for accuracy and simplicity into a single objective statistic. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding $\sim$40 functions (out of 5.2 million considered) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not necessarily prefer a $\Lambda$CDM expansion history, and traditional SR algorithms that return only the Pareto front, even if they found it successfully, would not locate $\Lambda$CDM. We make our code and full equation sets publicly available.