While the field of Quality-Diversity (QD) has grown into a distinct branch of stochastic optimization, a few problems, in particular locomotion and navigation tasks, have become de facto standards. Are such benchmarks sufficient? Are they representative of the key challenges faced by QD algorithms? Do they provide the ability to focus on one particular challenge by properly disentangling it from others? Do they have much predictive power in terms of scalability and generalization? Existing benchmarks are not standardized, and there is currently no MNIST equivalent for QD. Inspired by recent work on Reinforcement Learning benchmarks, we argue that identifying the challenges faced by QD methods and developing targeted, challenging, scalable but affordable benchmarks is an important step. As an initial effort, we identify three problems that are challenging in sparse reward settings, and propose associated benchmarks: (1) Behavior Metric Bias, which can result from the use of metrics that do not match the structure of the behavior space; (2) Behavioral Plateaus, with varying characteristics, such that escaping them would require adaptive QD algorithms; and (3) Evolvability Traps, where small variations in genotype result in large behavioral changes. The environments that we propose satisfy the properties listed above.