Improving Model-based Genetic Programming for Symbolic Regression of Small Expressions

The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based Evolutionary Algorithm (EA) framework that has been shown to perform well in several domains, including Genetic Programming (GP). Unlike traditional EAs, where variation acts blindly, GOMEA learns a model of the interdependencies within the genotype, i.e., the linkage, to estimate which patterns to propagate. In this article, we study the role of the Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniform distribution of genotypes in GP populations negatively biases LL, and we propose a method to correct for this bias. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt to SR a scheme of interleaving runs that alleviates the burden of tuning the population size, a crucial parameter for LL. We run experiments on 10 real-world datasets, enforcing a strict limit on solution size to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR.
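To make the idea of learning linkage from a population concrete, the following is a minimal sketch, not the method of this article. It assumes that each individual is encoded as a fixed-length tuple of symbols and that dependency between two positions is measured by empirical mutual information, a measure commonly used for linkage learning in GOMEA variants. The function names `mutual_information` and `mi_matrix` and the toy population are hypothetical, and the bias correction proposed in the article is not implemented here.

```python
from collections import Counter
from math import log


def mutual_information(population, i, j):
    """Empirical mutual information (in nats) between genotype positions i and j.

    Assumption: every individual is a fixed-length sequence of hashable symbols.
    """
    n = len(population)
    count_i = Counter(ind[i] for ind in population)
    count_j = Counter(ind[j] for ind in population)
    count_ij = Counter((ind[i], ind[j]) for ind in population)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log(c * n / (count_i[a] * count_j[b]))
    return mi


def mi_matrix(population):
    """Pairwise mutual information between all genotype positions."""
    length = len(population[0])
    return [
        [mutual_information(population, i, j) for j in range(length)]
        for i in range(length)
    ]


# Hypothetical toy population: position 0 fully determines position 1,
# while position 2 depends on position 0 only weakly.
population = [
    ("+", "x", "x"),
    ("+", "x", "x"),
    ("*", "y", "x"),
    ("*", "y", "y"),
]

matrix = mi_matrix(population)
```

In this toy population the entry for positions 0 and 1 is larger than the entry for positions 0 and 2, so a linkage model built from the matrix would tend to group positions 0 and 1 and vary them together. If some symbols are much more frequent than others at certain positions before any selection takes place, such estimates can already be skewed in the initial population, which is the kind of bias the abstract refers to.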
