Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This begs the question: Is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.
翻译:符号回归(SR)是学习数学表达形式的数据模型的任务,根据其性质,SR模型具有准确性和人与人同时解释的潜力。不幸的是,找到这样的模型,即执行SR,似乎是一项计算密集的任务。历史上,SR是用贪婪或基因算法等重力处理的,虽然有些作品暗示SR可能很强硬,但还没有证据证明SR事实上是坚硬的NP。这引出了一个问题:是否有精确的多米时算法来计算SR模型?我们提供证据表明,如果显示SR是坚硬的,答案可能是否定的。