Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks. Source code is provided at www.github.com/brendenpetersen/deep-symbolic-optimization.
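The hybrid loop described above can be sketched in miniature: a neural sampler proposes token sequences that seed the starting population of each genetic-programming restart, and many GP generations then run without interacting with the sampler. Everything below is an illustrative assumption, not the authors' implementation: the toy token vocabulary, the stand-in `neural_sample` function (a real system would use a trained RNN), and the trivial fitness function are all hypothetical.

```python
import random

TOKENS = ["x", "1", "+", "*", "sin"]  # toy token vocabulary (assumption)

def neural_sample(weights, length=5):
    """Stand-in for a neural sampler: draws a token sequence according
    to per-token weights (uniform here, i.e. an untrained sampler)."""
    return tuple(random.choices(TOKENS, weights=weights, k=length))

def fitness(expr):
    """Toy fitness: reward sequences containing the variable token 'x'."""
    return expr.count("x")

def mutate(expr):
    """Point mutation: replace one token with a random one."""
    i = random.randrange(len(expr))
    return expr[:i] + (random.choice(TOKENS),) + expr[i + 1:]

def gp_generation(population):
    """One toy GP generation: keep the fitter half, refill by mutation."""
    survivors = sorted(population, key=fitness, reverse=True)[:len(population) // 2]
    return survivors + [mutate(p) for p in survivors]

random.seed(0)
weights = [1.0] * len(TOKENS)
best = None
for restart in range(3):                                      # random-restart outer loop
    population = [neural_sample(weights) for _ in range(20)]  # neural-guided seeding
    for _ in range(10):            # GP generations run decoupled from the sampler
        population = gp_generation(population)
    top = max(population, key=fitness)
    if best is None or fitness(top) > fitness(best):
        best = top
    # A real system would update the sampler here, shifting it toward
    # tokens found in fit expressions so later restarts begin from
    # gradually better starting populations.

print(best, fitness(best))
```

The key structural point the sketch mirrors is the loose coupling: the inner GP loop runs to completion on its own, and the sampler would only be updated between restarts.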