Down-sampling training data has long been shown to improve the generalization performance of a wide range of machine learning systems. Recently, down-sampling has proved effective in genetic programming (GP) runs that use the lexicase parent selection technique. Although this down-sampling procedure has been shown to significantly improve performance across a variety of problems, it does not appear to do so by encouraging adaptability through environmental change. We hypothesize that the random sampling performed every generation causes discontinuities that leave the population unable to adapt to the shifting environment. We investigate modifications to down-sampled lexicase selection intended to promote incremental environmental change, scaffolding evolution by reducing the number of jarring discontinuities between the environments of successive generations. In our empirical studies, we find that forcing incremental environmental change is not significantly better for evolving solutions to program synthesis problems than simple random down-sampling. In response, we attempt to exacerbate the hypothesized prevalence of discontinuities by using only disjoint down-samples, to see whether doing so hinders performance. We find that this, too, does not differ significantly from the performance of regular random down-sampling. These negative results raise new questions about how the composition of down-samples, which may include synonymous cases, can be expected to influence the performance of machine learning systems that use down-sampling.
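To make the procedures concrete, below is a minimal Python sketch of down-sampled lexicase selection, together with hypothetical incremental and disjoint sampling variants of the kind studied above. The function names, the churn parameter, and the integer-indexed error table are illustrative assumptions for exposition, not the implementation used in the study.

    import random

    def random_down_sample(cases, rate):
        # Standard down-sampling: draw a fresh random subset of the
        # training cases at the start of each generation.
        k = max(1, int(len(cases) * rate))
        return random.sample(cases, k)

    def incremental_down_sample(cases, previous_sample, churn):
        # Hypothetical incremental variant: keep most of the previous
        # generation's sample and swap out only a `churn` fraction, so
        # that successive environments overlap heavily.
        k = len(previous_sample)
        n_new = max(1, int(k * churn))
        kept = random.sample(previous_sample, k - n_new)
        pool = [c for c in cases if c not in kept]
        return kept + random.sample(pool, n_new)

    def disjoint_down_samples(cases, rate):
        # Disjoint variant: partition the shuffled cases into
        # non-overlapping samples, one per generation, maximizing the
        # discontinuity between successive environments.
        k = max(1, int(len(cases) * rate))
        shuffled = random.sample(cases, len(cases))
        return [shuffled[i:i + k] for i in range(0, len(shuffled), k)]

    def lexicase_select(pop_indices, sample, errors):
        # Lexicase selection restricted to the down-sampled cases.
        # `errors[i][c]` is individual i's error on case c (lower is
        # better); cases are considered in a random order, and only
        # individuals tied for best on each case survive.
        candidates = list(pop_indices)
        for case in random.sample(sample, len(sample)):
            best = min(errors[i][case] for i in candidates)
            candidates = [i for i in candidates if errors[i][case] == best]
            if len(candidates) == 1:
                break
        return random.choice(candidates)

    # Example usage: a 10% down-sample of 200 cases, refreshed each
    # generation; the incremental variant then replaces 20% of it.
    cases = list(range(200))
    sample = random_down_sample(cases, rate=0.1)
    sample = incremental_down_sample(cases, sample, churn=0.2)

Under this sketch, the incremental variant bounds the per-generation change in the selection environment, while the disjoint variant guarantees zero overlap between consecutive samples; the study above finds that neither extreme significantly changes problem-solving performance relative to fresh random sampling.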