With the goal to provide absolute lower bounds for the best possible running times that can be achieved by $(1+\lambda)$-type search heuristics on common benchmark problems, we recently suggested a dynamic programming approach that computes optimal expected running times and the regret values inferred when deviating from the optimal parameter choice. Our previous work is restricted to problems for which transition probabilities between different states can be expressed by relatively simple mathematical expressions. With the goal to cover broader sets of problems, we suggest in this work an extension of the dynamic programming approach to settings in which the transition probabilities cannot necessarily be computed exactly, but in which they can be approximated numerically, up to arbitrary precision, by Monte Carlo sampling. We apply our hybrid Monte Carlo dynamic programming approach to a concatenated jump function and demonstrate how the obtained bounds can be used to gain a deeper understanding into parameter control schemes.
翻译:我们最近建议了一种动态的编程方法,用以计算最佳预期的运行时间和偏离最佳参数选择时所推断的遗憾值。我们以前的工作仅限于处理不同国家间过渡概率可以用相对简单的数学表达方式表示的问题。为了涵盖更广泛的问题组,我们建议在本工作中将动态编程方法扩大到一些环境,在这些环境中,过渡概率不一定精确计算,但可以通过蒙特卡洛取样以数字方式加以估计,达到任意的精确度。我们把混合的蒙特卡洛动态编程方法应用于组合式跳跃功能,并展示如何利用所获得的界限来加深对参数控制计划的了解。