Symbolic regression (SR) is the problem of learning a symbolic expression from numerical data. Recently, deep neural models trained on procedurally generated synthetic datasets have shown performance competitive with classical Genetic Programming (GP) algorithms. Unlike their GP counterparts, these neural approaches are trained to generate expressions from datasets given as context, which allows them to produce accurate expressions in a single forward pass at test time. However, they usually lack search abilities, which results in lower performance than GP on out-of-distribution datasets. In this paper, we propose a novel method that offers the best of both worlds: a Monte-Carlo Tree Search procedure guided by a context-aware neural mutation model, which is initially pre-trained to learn promising mutations and is further refined online from successful experiences. The approach demonstrates state-of-the-art performance on the well-known \texttt{SRBench} benchmark.
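To make the idea concrete, the sketch below illustrates a minimal MCTS loop over expression trees in Python. It is an illustration under simplified assumptions, not the paper's implementation: the \texttt{mutation\_policy} function is a hypothetical placeholder for the pre-trained, context-aware neural mutation model, and the expression encoding, operator set, and reward are chosen only to keep the example self-contained and runnable.

\begin{verbatim}
import math
import random
import numpy as np

# Toy expression encoding: nested tuples, e.g. ("add", ("x",), ("sin", ("x",))).
UNARY = ["sin", "cos", "exp"]
BINARY = ["add", "mul"]

def evaluate(expr, x):
    """Evaluate an expression tree on a numpy array of inputs."""
    op = expr[0]
    if op == "x":
        return x
    if op == "const":
        return np.full_like(x, expr[1])
    if op in UNARY:
        a = evaluate(expr[1], x)
        fn = {"sin": np.sin, "cos": np.cos,
              "exp": lambda v: np.exp(np.clip(v, -10, 10))}[op]
        return fn(a)
    a, b = evaluate(expr[1], x), evaluate(expr[2], x)
    return a + b if op == "add" else a * b

def mutation_policy(expr, X, y):
    """Hypothetical stand-in for the context-aware neural mutation model:
    proposes candidate mutations of `expr` given the dataset (X, y).
    Here it samples random local edits; the learned model would instead
    rank mutations and be refined online from successful experiences."""
    candidates = []
    for _ in range(4):
        r = random.random()
        if r < 0.4:
            candidates.append((random.choice(UNARY), expr))
        elif r < 0.8:
            candidates.append((random.choice(BINARY), expr,
                               ("const", random.uniform(-2, 2))))
        else:
            candidates.append(("x",))
    return candidates

class Node:
    def __init__(self, expr, parent=None):
        self.expr, self.parent = expr, parent
        self.children, self.visits, self.value = [], 0, 0.0

def reward(expr, X, y):
    """Bounded fitness: 1 / (1 + MSE) of the candidate expression."""
    mse = float(np.mean((evaluate(expr, X) - y) ** 2))
    return 1.0 / (1.0 + mse)

def mcts(X, y, iterations=200, c=1.4):
    root = Node(("x",))
    best, best_r = root.expr, reward(root.expr, X, y)
    for _ in range(iterations):
        # Selection: descend via UCT until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children,
                       key=lambda n: n.value / (n.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1) / (n.visits + 1e-9)))
        # Expansion: query the (placeholder) mutation model for children.
        for m in mutation_policy(node.expr, X, y):
            node.children.append(Node(m, parent=node))
        child = random.choice(node.children)
        r = reward(child.expr, X, y)
        if r > best_r:
            best, best_r = child.expr, r
        # Backpropagation: update visit counts and values up to the root.
        while child is not None:
            child.visits += 1
            child.value += r
            child = child.parent
    return best, best_r

if __name__ == "__main__":
    X = np.linspace(-2, 2, 100)
    y = np.sin(X) + X
    expr, r = mcts(X, y)
    print("best expression:", expr, "reward:", round(r, 3))
\end{verbatim}

In this sketch the search machinery and the mutation proposal are deliberately decoupled: swapping the random \texttt{mutation\_policy} for a trained, dataset-conditioned model is what would turn the generic MCTS loop into the context-aware search described above.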