Analyzing the Interaction Between Down-Sampling and Selection

Genetic programming (GP) systems often use large training sets to evaluate the quality of candidate solutions for selection. However, evaluating populations on large training sets can be computationally expensive. Down-sampling training sets has long been used to decrease the computational cost of evaluation in a wide range of application domains. Indeed, recent studies have shown that both random and informed down-sampling can substantially improve problem-solving success for GP systems that use the lexicase parent selection algorithm. We use the PushGP framework to experimentally test whether these down-sampling techniques can also improve problem-solving success in the context of two other commonly used selection methods, fitness-proportionate and tournament selection, across eight GP problems (four program synthesis and four symbolic regression). We verified that down-sampling can benefit the problem-solving success of both fitness-proportionate and tournament selection. However, the number of problems on which down-sampling improved problem-solving success varied by selection scheme, suggesting that the impact of down-sampling depends on both the problem and the choice of selection scheme. Surprisingly, we found that down-sampling was more consistently beneficial when combined with lexicase selection than when combined with tournament or fitness-proportionate selection. Overall, our results suggest that down-sampling should be considered more often when solving test-based GP problems.
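To make the interaction between down-sampling and selection concrete, the following is a minimal Python sketch of random down-sampling feeding two of the selection schemes named above (tournament and lexicase). It is not the PushGP implementation used in the study: the function names, the representation of programs as plain Python callables, the absolute-error measure, and the 10% sampling rate are all assumptions made for illustration, and informed down-sampling and fitness-proportionate selection are omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Case = Tuple[float, float]          # (input, expected output); hypothetical format
Program = Callable[[float], float]  # stand-in for an evolved GP program


def random_down_sample(cases: Sequence[Case], rate: float,
                       rng: random.Random) -> List[Case]:
    """Draw a random subset of the training cases (at least one case)."""
    k = max(1, round(len(cases) * rate))
    return rng.sample(list(cases), k)


def error_vector(program: Program, cases: Sequence[Case]) -> List[float]:
    """Absolute error of the program on each case in the (down-sampled) set."""
    return [abs(program(x) - y) for x, y in cases]


def tournament_select(population: Sequence[Program],
                      errors: Sequence[Sequence[float]],
                      size: int, rng: random.Random) -> Program:
    """Pick `size` random individuals and return the one with lowest total error."""
    contenders = rng.sample(range(len(population)), size)
    best = min(contenders, key=lambda i: sum(errors[i]))
    return population[best]


def lexicase_select(population: Sequence[Program],
                    errors: Sequence[Sequence[float]],
                    rng: random.Random) -> Program:
    """Filter candidates case by case, in random order, keeping only the best."""
    candidates = list(range(len(population)))
    order = list(range(len(errors[0])))
    rng.shuffle(order)
    for c in order:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return population[rng.choice(candidates)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy regression target f(x) = 2x with 100 training cases.
    cases = [(float(x), 2.0 * x) for x in range(100)]
    population = [lambda x: 2 * x, lambda x: x, lambda x: 0.0]

    # Each generation evaluates only on a fresh 10% sample of the cases.
    sample = random_down_sample(cases, rate=0.1, rng=rng)
    errors = [error_vector(p, sample) for p in population]

    parent_a = tournament_select(population, errors, size=2, rng=rng)
    parent_b = lexicase_select(population, errors, rng=rng)
    print(len(sample), parent_a(3.0), parent_b(3.0))
```

In this sketch each individual is evaluated on only a tenth of the cases per generation, so the per-generation evaluation cost drops by roughly the sampling rate. Tournament selection aggregates the sampled errors into one total, whereas lexicase selection considers the sampled cases one at a time, which is one place where the two schemes could respond differently to the same sample.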