In genetic programming, an evolutionary method for producing computer programs that solve specified computational problems, parent selection is ordinarily based on aggregate measures of performance across an entire training set. Lexicase selection, by contrast, selects on the basis of performance on random sequences of training cases; this has been shown to enhance problem-solving power in many circumstances. Lexicase selection can also be seen as better reflecting biological evolution, by modeling sequences of challenges that organisms face over their lifetimes. Recent work has demonstrated that the advantages of lexicase selection can be amplified by down-sampling, meaning that only a random subsample of the training cases is used each generation. This can be seen as modeling the fact that individual organisms encounter only subsets of the possible environments, and that environments change over time. Here we provide the most extensive benchmarking of down-sampled lexicase selection to date, showing that its benefits hold up to increased scrutiny. The reasons that down-sampling helps, however, are not yet fully understood. Hypotheses include that down-sampling allows for more generations to be processed with the same budget of program evaluations; that the variation of training data across generations acts as a changing environment, encouraging adaptation; or that it reduces overfitting, leading to more general solutions. We systematically evaluate these hypotheses, finding evidence against all three, and instead draw the conclusion that down-sampled lexicase selection's main benefit stems from the fact that it allows the evolutionary process to examine more individuals within the same computational budget, even though each individual is examined less completely.
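As a rough illustration of the mechanism described above, the following minimal Python sketch shows lexicase selection applied to a per-generation random subsample of the training cases. It is a sketch under stated assumptions, not the authors' implementation: the function names, the precomputed `errors` matrix (lower is better), and the `rate` parameter are all hypothetical and chosen only for illustration. In a real system each individual would be evaluated only on the sampled cases, which is where the savings in program evaluations come from.

```python
import random


def lexicase_select(errors, case_ids, rng):
    """Return the index of one parent chosen by lexicase selection.

    errors[i][c] is the error of individual i on training case c
    (assumed layout; lower is better).
    """
    candidates = list(range(len(errors)))
    order = list(case_ids)
    rng.shuffle(order)  # a fresh random sequence of cases for every selection event
    for c in order:
        best = min(errors[i][c] for i in candidates)
        # Keep only the candidates that are best on the current case.
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    # Any candidates still tied after all cases are chosen among at random.
    return rng.choice(candidates)


def down_sample(num_cases, rate, rng):
    """Draw the random subsample of case indices used for one generation."""
    k = max(1, round(num_cases * rate))
    return rng.sample(range(num_cases), k)


def select_parents(errors, rate, num_parents, rng):
    """Select parents for one generation with down-sampled lexicase selection."""
    cases = down_sample(len(errors[0]), rate, rng)  # one subsample per generation
    return [lexicase_select(errors, cases, rng) for _ in range(num_parents)]


def generations_for_budget(budget, pop_size, num_cases, rate):
    """Number of generations affordable under a fixed budget of program evaluations,
    assuming each individual is run once on every sampled case."""
    k = max(1, round(num_cases * rate))
    return budget // (pop_size * k)
```

Under these assumptions, a budget of 300,000 program evaluations with 1,000 individuals and 100 training cases covers 3 generations without down-sampling and 30 generations at a sampling rate of 0.1. The same saving could instead be spent on a larger population, so that more individuals are examined, each less completely.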