Interpretable Symbolic Regression for Data Science: Analysis of the 2022 Competition

F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer, G. Espada, L. Ingelse, A. Fonseca, M. Landajuela, B. Petersen, R. Glatt, N. Mundhenk, C. S. Lee, J. D. Hochhalter, D. L. Randall, P. Kamienny, H. Zhang, G. Dick, A. Simon, B. Burlacu, Jaan Kasak, Meera Machado, Casper Wilstrup, W. G. La Cava

From arXiv; 13 pages, 13 figures; submitted to IEEE Transactions on Evolutionary Computation

Symbolic regression searches for analytic expressions that accurately describe studied phenomena. The main attraction of this approach is that it returns an interpretable model that can be insightful to users. Historically, the majority of algorithms for symbolic regression have been based on evolutionary algorithms. However, there has been a recent surge of new proposals that instead utilize approaches such as enumeration algorithms, mixed-integer linear programming, neural networks, and Bayesian optimization. To assess how well these new approaches behave on a set of common challenges often faced in real-world data, we hosted a competition at the 2022 Genetic and Evolutionary Computation Conference consisting of different synthetic and real-world datasets that were blind to entrants. For the real-world track, we assessed interpretability in a realistic way by using a domain expert to judge the trustworthiness of candidate models. We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms, and highlight possible improvements for future competitions.
