We present \emph{MDP Playground}, an efficient testbed for Reinforcement Learning (RL) agents with \textit{orthogonal} dimensions that can be controlled independently to challenge agents in different ways and obtain varying degrees of hardness in generated environments. We consider, and allow control over, a wide variety of dimensions, including \textit{delayed rewards}, \textit{rewardable sequences}, \textit{density of rewards}, \textit{stochasticity}, \textit{image representations}, \textit{irrelevant features}, \textit{time unit}, \textit{action range} and more. By varying these dimensions, we define a parameterised collection of fast-to-run toy environments in \textit{OpenAI Gym} and propose using them for the initial design and development of agents. We also provide wrappers that inject these dimensions into complex environments from \textit{Atari} and \textit{MuJoCo} to allow for evaluating agent robustness. We further provide various example use cases and instructions on how to use \textit{MDP Playground} to design and debug agents. We believe that \textit{MDP Playground} is a valuable testbed for researchers designing new, adaptive and intelligent RL agents, as well as for those wanting to unit test their agents.