Estimation-of-distribution algorithms (EDAs) are optimization algorithms that learn a distribution on the search space from which good solutions can be sampled easily. A key parameter of most EDAs is the sample size (population size). If the population size is too small, the update of the probabilistic model builds on few samples, leading to the undesired effect of genetic drift. Too large population sizes avoid genetic drift, but slow down the process. Building on a recent quantitative analysis of how the population size leads to genetic drift, we design a smart-restart mechanism for EDAs. By stopping runs when the risk for genetic drift is high, it automatically runs the EDA in good parameter regimes. Via a mathematical runtime analysis, we prove a general performance guarantee for this smart-restart scheme. This in particular shows that in many situations where the optimal (problem-specific) parameter values are known, the restart scheme automatically finds these, leading to the asymptotically optimal performance. We also conduct an extensive experimental analysis. On four classic benchmark problems, we clearly observe the critical influence of the population size on the performance, and we find that the smart-restart scheme leads to a performance close to the one obtainable with optimal parameter values. Our results also show that previous theory-based suggestions for the optimal population size can be far from the optimal ones, leading to a performance clearly inferior to the one obtained via the smart-restart scheme. We also conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial optimization problems from the literature, the max-cut problem and the bipartition problem. Again, we observe that the smart-restart mechanism finds much better values for the population size than those suggested in the literature, leading to a much better performance.
翻译:估算分布算法( EDAs) 是最优化的算法, 它在搜索空间上学习了一种分布方法, 从而可以很容易地进行抽样。 大多数 EDA 的关键参数是样本大小( 人口规模 ) 。 如果人口规模过小, 更新概率模型会建立在少数样本上, 导致基因漂移的不理想效应。 人口规模过大, 避免基因漂移, 却减缓了这一过程。 基于最近对人口规模如何导致基因流动的定量分析, 我们设计了一个智能启动机制。 在基因漂移风险高时停止运行, 多数 EDA 的关键参数是样本大小( 人口规模 ) 。 通过数学运行时间分析, 我们证明这种智能模式的总体性能保证了这种智能重新启动计划。 这特别表明, 在许多情况下, 最佳( 具体风险) 参数值为人们所知道, 重新启动计划会自动发现这些现象, 导致以微调的最佳性能表现。 我们还进行了广泛的实验性能分析。 在四个典型的基准问题上, 我们清楚地观察了人口规模的关键性影响,, 也明显地看到, 以最精确的模型的性能显示我们最优化的性能表现。