Reward-based optimization algorithms require both exploration, to find rewards, and exploitation, to maximize performance. The need for efficient exploration is even greater in sparse reward settings, where performance feedback is given only rarely and is therefore unsuitable for guiding the search process. In this work, we introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm, which can efficiently explore a search space and optimize rewards found in potentially disparate areas. Unlike existing emitter-based approaches, SERENE separates search space exploration and reward exploitation into two alternating processes. The first process performs exploration through Novelty Search, a divergent search algorithm. The second exploits discovered reward areas through emitters, i.e., local instances of population-based optimization algorithms. A meta-scheduler allocates a global computational budget by alternating between the two processes, ensuring both the discovery and the efficient exploitation of disjoint reward areas. SERENE returns both a collection of diverse solutions covering the search space and a collection of high-performing solutions for each distinct reward area. We evaluate SERENE on various sparse reward environments and show that it compares favorably to existing baselines.
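The alternation between the two processes can be illustrated with a small toy sketch. It is not the authors' implementation, and every name and rule in it is an assumption made for illustration: a 2-D search space whose genome doubles as the behaviour descriptor, a hypothetical `sparse_reward` with two small reward disks, k-nearest-neighbour novelty, emitters modelled as simple Gaussian (1 + λ)-style local searches that stop after `patience` non-improving steps, a hypothetical distance rule for launching emitters, and a meta-scheduler that simply alternates fixed-size budget chunks (`chunk`). SERENE's actual emitters and scheduling rules may differ.

```python
import math
import random

# Hypothetical toy environment (assumption): a 2-D search space [0, 1]^2 whose
# genome is also its behaviour descriptor, with reward only inside two small disks.
REWARD_AREAS = [((0.2, 0.8), 0.05), ((0.85, 0.15), 0.05)]  # (centre, radius)


def sparse_reward(x):
    """Zero almost everywhere; inside a disk, peaks at 1.0 at its centre."""
    for centre, radius in REWARD_AREAS:
        d = math.dist(x, centre)
        if d < radius:
            return 1.0 - d / radius
    return 0.0


def mutate(x, sigma, rng):
    """Gaussian mutation, clipped to the unit square."""
    return tuple(min(1.0, max(0.0, xi + rng.gauss(0.0, sigma))) for xi in x)


def novelty(bd, others, k=5):
    """Mean distance from `bd` to its k nearest neighbours in `others`."""
    dists = sorted(math.dist(bd, o) for o in others)[:k]
    return sum(dists) / len(dists) if dists else float("inf")


def serene_sketch(total_budget=4000, chunk=300, pop_size=20, lam=10,
                  patience=15, seed=0):
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(pop_size)]
    archive = []   # exploration output: novelty archive
    emitters = []  # active emitters as [best_x, best_r, sigma, stall_count]
    elites = []    # exploitation output: (best_x, best_r) per finished emitter
    starts = []    # points where emitters were launched
    evals, explore = 0, True

    while evals < total_budget:
        budget, before = min(chunk, total_budget - evals), evals
        if explore or not emitters:
            # Exploration: Novelty Search generations of pop_size evaluations each.
            while budget >= pop_size:
                offspring = [mutate(p, 0.1, rng) for p in population]
                budget -= pop_size
                evals += pop_size
                for x in offspring:
                    r = sparse_reward(x)
                    # Hypothetical launch rule: new emitter only if no emitter
                    # was launched within 0.1 of this rewarding point.
                    if r > 0 and all(math.dist(x, s) > 0.1 for s in starts):
                        starts.append(x)
                        emitters.append([x, r, 0.02, 0])
                pool = population + offspring
                scores = [novelty(x, archive + [y for y in pool if y != x])
                          for x in pool]
                ranked = sorted(zip(scores, pool), reverse=True)
                population = [x for _, x in ranked[:pop_size]]
                archive.extend(x for _, x in ranked[:2])  # archive the 2 most novel
        else:
            # Exploitation: round-robin (1 + lam)-style steps over active emitters.
            while emitters and budget >= lam:
                em = emitters.pop(0)
                samples = [mutate(em[0], em[2], rng) for _ in range(lam)]
                budget -= lam
                evals += lam
                r, best = max((sparse_reward(s), s) for s in samples)
                if r > em[1]:
                    em[0], em[1], em[3] = best, r, 0
                else:
                    em[3] += 1
                if em[3] >= patience:
                    elites.append((em[0], em[1]))  # stalled: emitter terminates
                else:
                    emitters.append(em)
        if evals == before:
            break  # leftover budget too small for another step
        explore = not explore  # meta-scheduler: alternate the two processes

    elites.extend((em[0], em[1]) for em in emitters)  # still-active emitters
    return archive, elites
```

Calling `archive, elites = serene_sketch()` returns the two collections the abstract describes: a novelty archive covering the search space and one best solution per exploited reward area. Because an emitter is only launched more than 0.1 away from every earlier launch point, and 0.1 is the diameter of each reward disk in this toy, it yields at most one emitter, and hence one elite, per reward area.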