The NeurIPS 2020 Procgen Competition was designed as a centralized benchmark with clearly defined tasks for measuring Sample Efficiency and Generalization in Reinforcement Learning. Generalization remains one of the most fundamental challenges in deep reinforcement learning, yet the community lacks sufficient benchmarks to measure its progress on this front. We present the design of a centralized benchmark for Reinforcement Learning that measures both Sample Efficiency and Generalization by performing end-to-end evaluation of the training and rollout phases of thousands of user-submitted code bases in a scalable way. We built the benchmark on top of the existing Procgen Benchmark by defining clear tasks and standardizing the end-to-end evaluation setup. The design aims to maximize the flexibility available to researchers who wish to design future iterations of such benchmarks, while imposing the practical constraints necessary for a system like this to scale. This paper presents the competition setup, along with the details and analysis of the top solutions identified through this setup in the 2020 iteration of the competition at NeurIPS.