Learning adaptive differential evolution algorithm from optimization experiences by policy gradient

Differential evolution is one of the most prestigious population-based stochastic optimization algorithms for black-box problems. The performance of a differential evolution algorithm depends heavily on its mutation and crossover strategies and their associated control parameters. However, determining the most suitable parameter setting is troublesome and time-consuming. Adaptive parameter control methods, which can adapt to the problem landscape and the optimization environment, are preferable to fixed parameter settings. This paper proposes a novel adaptive parameter control approach based on learning from optimization experiences over a set of problems. In this approach, parameter control is modeled as a finite-horizon Markov decision process. A reinforcement learning algorithm, policy gradient, is applied to learn an agent (i.e., a parameter controller) that adaptively provides the control parameters of a proposed differential evolution algorithm during the search procedure. The differential evolution algorithm based on the learned agent is compared against nine well-known evolutionary algorithms on the CEC'13 and CEC'17 test suites. Experimental results show that the proposed algorithm performs competitively against the compared algorithms on these test suites.
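
To make the finite-horizon Markov decision process view concrete, the following is a minimal sketch and not the paper's method. It assumes a classic DE/rand/1/bin scheme, a toy `sphere` objective, a state made of the generation counter and the best fitness so far, an action consisting of the scale factor `F` and crossover rate `CR`, and a reward equal to the improvement in best fitness per generation. The abstract specifies none of these details, so the names `sphere`, `linear_controller`, and `run_episode` are hypothetical. The hand-written `linear_controller` only stands in for the learned agent; training it by policy gradient is not shown.

```python
import random


def sphere(x):
    """Toy black-box objective (assumed for illustration): sum of squares."""
    return sum(v * v for v in x)


def linear_controller(state):
    """Hypothetical stand-in for a learned policy: maps a state to (F, CR)."""
    progress = state["generation"] / state["horizon"]
    f = 0.9 - 0.5 * progress   # larger mutation steps early, smaller late
    cr = 0.5 + 0.4 * progress  # more crossover as the search progresses
    return f, cr


def run_episode(objective, controller, dim=5, pop_size=20, horizon=50,
                bounds=(-5.0, 5.0), seed=0):
    """Run one finite-horizon episode of DE/rand/1/bin.

    Each generation is one decision step: the controller observes a state
    and returns the action (F, CR); the reward is the drop in best fitness.
    """
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    best = min(fit)
    rewards = []
    for t in range(horizon):
        state = {"generation": t, "horizon": horizon, "best": best}
        f, cr = controller(state)
        new_pop, new_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, low), high)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fit = objective(trial)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                new_pop[i], new_fit[i] = trial, trial_fit
        pop, fit = new_pop, new_fit
        new_best = min(fit)
        rewards.append(best - new_best)
        best = new_best
    return best, rewards
```

Calling `run_episode(sphere, linear_controller)` returns the final best fitness and the per-generation rewards. In a learning setup along the lines the abstract describes, such reward trajectories, collected over a set of training problems, would be the experience from which a policy gradient method updates the controller.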
