Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, most symbolic regression algorithms have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead use approaches such as enumeration algorithms, mixed-integer linear programming, neural networks, and Bayesian optimization. To assess how well these new approaches handle a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets that were hidden from entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms, and highlight possible improvements for future competitions.