Reinforcement learning based adaptive metaheuristics

Parameter adaptation, that is, the capability to automatically adjust an algorithm's hyperparameters depending on the problem being faced, is one of the main trends in evolutionary computation applied to numerical optimization. While several handcrafted adaptation policies have been proposed over the years to address this problem, only a few attempts have been made so far to learn such policies with machine learning. Here, we introduce a general-purpose framework for performing parameter adaptation in continuous-domain metaheuristics based on state-of-the-art reinforcement learning algorithms. We demonstrate the applicability of this framework on two algorithms, namely Covariance Matrix Adaptation Evolution Strategies (CMA-ES) and Differential Evolution (DE), for which we learn adaptation policies for the step-size (CMA-ES) and for the scale factor and crossover rate (DE). We train these policies on a set of 46 benchmark functions at different dimensionalities, with various inputs to the policies, in two settings: one policy per function, and one global policy for all functions. Compared with the Cumulative Step-size Adaptation (CSA) policy for CMA-ES and with two well-known adaptive DE variants (iDE and jDE), our policies produce competitive results in the majority of cases, especially for DE.
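To make the idea of a pluggable adaptation policy concrete, the following is a minimal sketch and not the implementation described above. It assumes a plain DE/rand/1/bin loop that asks a policy once per generation for the scale factor and crossover rate. The names `Policy`, `constant_policy`, `differential_evolution`, and `sphere`, as well as the choice of observation (generation index and best fitness so far), are hypothetical and chosen only for illustration. A learned reinforcement learning policy would take the place of `constant_policy`.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: a policy maps (generation, best fitness) to (F, CR).
Policy = Callable[[int, float], Tuple[float, float]]


def sphere(x: Sequence[float]) -> float:
    """Simple convex test function with minimum 0 at the origin."""
    return sum(v * v for v in x)


def constant_policy(generation: int, best_fitness: float) -> Tuple[float, float]:
    """Placeholder policy: returns fixed F and CR, with no learning involved."""
    return 0.5, 0.9


def differential_evolution(
    objective: Callable[[Sequence[float]], float],
    policy: Policy,
    dim: int = 5,
    pop_size: int = 20,
    generations: int = 100,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    seed: int = 0,
) -> Tuple[Vector, float]:
    """DE/rand/1/bin where F and CR are supplied by `policy` each generation."""
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]

    for gen in range(generations):
        # Query the policy once per generation for scale factor and crossover rate.
        f, cr = policy(gen, min(fitness))
        for i in range(pop_size):
            # Three distinct individuals, all different from i (rand/1 mutation).
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # ensures at least one mutated component
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, low), high)  # clip to the search bounds
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_fit = objective(trial)
            # Greedy one-to-one selection: keep the trial only if it is not worse.
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit

    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


if __name__ == "__main__":
    best_x, best_f = differential_evolution(sphere, constant_policy)
    print(best_x, best_f)
```

Keeping the policy behind a single callable means a handcrafted rule and a trained network can be swapped without touching the optimizer loop. A richer observation, such as population diversity or recent fitness improvement, would be passed in the same way.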
