Evolution strategies (ES), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and run much faster when many central processing units (CPUs) are available owing to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism into ES to facilitate its learning adaptation while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on a suite of robot navigation tasks. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
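To make the instance weighting idea concrete, the following is a minimal sketch of one instance-weighted ES update step. It is not the paper's exact IW-IES rule: the Euclidean novelty measure, the min-max normalization, and the equal 50/50 combination of novelty and quality are illustrative assumptions, and `fitness_fn`, `theta_prev_opt`, and the hyperparameter values are hypothetical placeholders.

```python
import numpy as np

def iw_ies_step(theta, fitness_fn, theta_prev_opt,
                n_samples=50, sigma=0.1, lr=0.01):
    """One instance-weighted ES update (illustrative sketch only).

    theta          : current policy parameters (1-D array)
    fitness_fn     : episodic return of a parameter vector in the NEW environment
    theta_prev_opt : optimum found in the ORIGINAL environment (used for novelty)
    """
    dim = theta.shape[0]
    eps = np.random.randn(n_samples, dim)        # perturbations defining the instances
    instances = theta + sigma * eps

    # Instance quality: how well each instance performs in the new environment.
    quality = np.array([fitness_fn(x) for x in instances])

    # Instance novelty: distance from the previous optimum
    # (Euclidean distance is an assumed, illustrative choice).
    novelty = np.linalg.norm(instances - theta_prev_opt, axis=1)

    def minmax(v):
        return (v - v.min()) / (v.max() - v.min() + 1e-8)

    # Combine the two metrics into per-instance weights (equal mix assumed).
    weights = 0.5 * minmax(quality) + 0.5 * minmax(novelty)

    # Weighted ES gradient estimate: instances carrying more "new knowledge"
    # (high novelty and high quality) pull the search distribution harder.
    grad = (weights[:, None] * quality[:, None] * eps).sum(axis=0) / (n_samples * sigma)
    return theta + lr * grad
```

Because the weights only rescale each instance's contribution to the usual ES gradient estimate, the update remains embarrassingly parallel: fitness evaluations of the perturbed instances can still be distributed across many CPUs, which is the scalability property the abstract emphasizes.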