Towards robust and domain agnostic reinforcement learning competitions

William Hebgen Guss, Stephanie Milani, Nicholay Topin, Brandon Houghton, Sharada Mohanty, Andrew Melnik, Augustin Harter, Benoit Buschmaas, Bjarne Jaster, Christoph Berganski, Dennis Heitkamp, Marko Henning, Helge Ritter, Chengjie Wu, Xiaotian Hao, Yiming Lu, Hangyu Mao, Yihuan Mao, Chao Wang, Michal Opanowicz, Anssi Kanervisto, Yanick Schraner, Christian Scheller, Xiren Zhou, Lu Liu, Daichi Nishio, Toi Tsuneda, Karolis Ramanauskas, Gabija Juceviciute

From arXiv; 20 pages, several figures; published in PMLR

Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state-of-the-art, and shaped the direction of the field. Despite this, a majority of challenges suffer from the same fundamental problems: participant solutions to the posed challenge are usually domain-specific, biased to maximally exploit compute resources, and not guaranteed to be reproducible. In this paper, we present a new framework of competition design that promotes the development of algorithms that overcome these barriers. We propose four central mechanisms for achieving this end: submission retraining, domain randomization, desemantization through domain obfuscation, and the limitation of competition compute and environment-sample budget. To demonstrate the efficacy of this design, we proposed, organized, and ran the MineRL 2020 Competition on Sample-Efficient Reinforcement Learning. In this work, we describe the organizational outcomes of the competition and show that the resulting participant submissions are reproducible, non-specific to the competition environment, and sample/resource efficient, despite the difficult competition task.

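Two of the four mechanisms named in the abstract, the environment-sample budget and domain obfuscation, can be illustrated with a small wrapper. The abstract does not say how the competition implemented either one, so the following is only a toy sketch: `ToyEnv`, `BudgetedObfuscatedEnv`, `SampleBudgetExceeded`, and every parameter name are hypothetical, and a fixed feature permutation merely stands in for whatever obfuscation the organizers actually used.

```python
import random


class SampleBudgetExceeded(RuntimeError):
    """Raised when a submission requests more environment steps than allowed."""


class ToyEnv:
    """Hypothetical stand-in environment whose observations are lists of floats."""

    def __init__(self, seed, obs_size=4):
        # A per-instance seed loosely mirrors domain randomization:
        # each seed yields a different sequence of observations.
        self._rng = random.Random(seed)
        self._obs_size = obs_size

    def reset(self):
        return [self._rng.random() for _ in range(self._obs_size)]

    def step(self, action):
        obs = [self._rng.random() for _ in range(self._obs_size)]
        reward = 1.0 if action == 0 else 0.0
        done = False
        return obs, reward, done


class BudgetedObfuscatedEnv:
    """Wraps an environment with a hard step budget and a fixed feature shuffle."""

    def __init__(self, env, max_steps, obs_size, obfuscation_seed):
        self._env = env
        self._max_steps = max_steps
        self.steps_used = 0
        # Assumption: a fixed permutation hides which feature is which,
        # so hand-written, position-specific rules stop being useful.
        self._perm = list(range(obs_size))
        random.Random(obfuscation_seed).shuffle(self._perm)

    def _obfuscate(self, obs):
        return [obs[i] for i in self._perm]

    def reset(self):
        return self._obfuscate(self._env.reset())

    def step(self, action):
        # Enforce the sample budget before touching the wrapped environment.
        if self.steps_used >= self._max_steps:
            raise SampleBudgetExceeded(
                f"budget of {self._max_steps} environment steps exhausted"
            )
        self.steps_used += 1
        obs, reward, done = self._env.step(action)
        return self._obfuscate(obs), reward, done
```

In a retraining setup of the kind the abstract describes, an organizer-side script would presumably construct such a wrapper itself and hand only the wrapped environment to the participant's training code, so that the budget and the obfuscation cannot be bypassed by the submission.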