Currently, the genetic programming version of the gene-pool optimal mixing evolutionary algorithm (GP-GOMEA) is among the top-performing algorithms for symbolic regression (SR). A key strength of GP-GOMEA is its way of performing variation, which dynamically adapts to the emergence of patterns in the population. However, GP-GOMEA lacks a mechanism to optimize coefficients. In this paper, we study how fairly simple approaches for optimizing coefficients can be integrated into GP-GOMEA. In particular, we consider two variants of Gaussian coefficient mutation. We performed experiments using different settings on 23 benchmark problems, and used machine learning to estimate which aspects of coefficient mutation matter most. We find that the most important aspect is that the number of coefficient mutation attempts needs to be commensurate with the number of mixing operations that GP-GOMEA performs. We then applied GP-GOMEA with the best-performing coefficient mutation approach to those data sets of SRBench, a large SR benchmark, for which a ground-truth underlying equation is known. We find that coefficient mutation substantially helps in rediscovering the underlying equation, but only when no noise is added to the target variable. In the presence of noise, GP-GOMEA with coefficient mutation discovers alternative but similarly accurate equations.
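To make the idea of Gaussian coefficient mutation concrete, the following is a minimal sketch of one generic form of it, not the specific variants studied in the paper, which the abstract does not detail. The function names, the relative noise scale, the mutation rate, and the accept-if-not-worse rule are all illustrative assumptions.

```python
import random


def mutate_coefficients(coeffs, rate=1.0, scale=0.1, rng=random):
    """Return a mutated copy of `coeffs` (hypothetical helper).

    Each coefficient is perturbed with probability `rate` by zero-mean
    Gaussian noise. The standard deviation is assumed to be relative to
    the coefficient's magnitude (scale * |c|), falling back to `scale`
    when the coefficient is exactly zero.
    """
    mutated = []
    for c in coeffs:
        if rng.random() < rate:
            sigma = scale * abs(c) if c != 0 else scale
            c = c + rng.gauss(0.0, sigma)
        mutated.append(c)
    return mutated


def hill_climb_coefficients(coeffs, loss, attempts=10, rng=random):
    """Try `attempts` mutations, keeping a candidate only if its loss is
    not worse than the current best (an assumed acceptance rule)."""
    best = list(coeffs)
    best_loss = loss(best)
    for _ in range(attempts):
        candidate = mutate_coefficients(best, rng=rng)
        candidate_loss = loss(candidate)
        if candidate_loss <= best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss


if __name__ == "__main__":
    # Toy example: fit the single coefficient c in f(x) = c * x to y = 2x.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2.0 * x for x in xs]

    def mse(cs):
        return sum((cs[0] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    best, best_loss = hill_climb_coefficients([1.0], mse, attempts=200)
    print(best, best_loss)
```

In this sketch the `attempts` parameter is the budget of mutation tries; the abstract's main finding concerns how such a budget should relate to the number of mixing operations, a relationship this toy example does not model.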