Symbolic regression searches for analytic expressions that accurately describe the phenomena under study. The main attraction of this approach is that it returns an interpretable model that can give users insight into the problem. Historically, most symbolic regression algorithms have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead use approaches such as enumeration algorithms, mixed-integer linear programming, neural networks, and Bayesian optimization. To assess how well these new approaches handle a set of common challenges often found in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference (GECCO) consisting of several synthetic and real-world datasets that were hidden from entrants. For the real-world track, we assessed interpretability in a realistic way by asking a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms, and highlight possible improvements for future competitions.
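To make "searching for analytic expressions" concrete, the sketch below illustrates the simplest form of the enumeration idea mentioned above. It assumes a tiny, hypothetical grammar (the variable `x`, the constants 1 and 2, and the operators `+`, `-`, `*`) and uses mean squared error as the fitness measure. The function names `enumerate_expressions` and `symbolic_regression` are illustrative only. This is not the method of any competition entrant, just a minimal illustration of exhaustive search over expression trees.

```python
import itertools

# Hypothetical grammar: binary operators plus leaves (the variable x and two constants).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
CONSTANTS = (1, 2)


def enumerate_expressions(xs, max_depth):
    """Return (formula, values-on-xs) pairs for every tree up to max_depth."""
    exprs = [("x", list(xs))] + [(str(c), [c] * len(xs)) for c in CONSTANTS]
    for _ in range(max_depth):
        new = []
        # Combine every ordered pair of existing subtrees with every operator.
        for (sa, va), (sb, vb) in itertools.product(exprs, repeat=2):
            for op, fn in OPS.items():
                new.append((f"({sa} {op} {sb})", [fn(a, b) for a, b in zip(va, vb)]))
        exprs = exprs + new
    return exprs


def mse(pred, ys):
    """Mean squared error between predicted and target values."""
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)


def symbolic_regression(xs, ys, max_depth=2):
    """Score every candidate; prefer low error, then shorter (simpler) formulas."""
    candidates = enumerate_expressions(xs, max_depth)
    return min(candidates, key=lambda e: (mse(e[1], ys), len(e[0])))[0]


xs = [0, 1, 2, 3, 4]
ys = [x * x + x for x in xs]
print(symbolic_regression(xs, ys))
```

On these samples the search returns a formula that reproduces the data exactly and is equivalent to x² + x. The candidate set already grows from 3 leaves to 30 trees at depth 1 and 2,730 at depth 2. This rapid growth is why practical enumeration-based systems need much more aggressive search strategies than this brute-force loop.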