This work re-implements the OpenAI Gym multi-goal robotic manipulation environment, originally based on the commercial MuJoCo engine, on the open-source PyBullet engine. By comparing the performance of a Deep Deterministic Policy Gradient agent aided by Hindsight Experience Replay on both environments, we demonstrate that the original environment has been re-implemented successfully. In addition, we provide users with new APIs to access a joint control mode, image observations and goals from a customisable camera, and a built-in on-hand camera. We further design a set of multi-step, multi-goal, long-horizon, sparse-reward robotic manipulation tasks, aiming to inspire new goal-conditioned reinforcement learning algorithms for such challenges. We use a simple, human-prior-based curriculum learning method to benchmark the multi-step manipulation tasks. We also discuss future research opportunities for this kind of task.
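To make the combination of sparse rewards and Hindsight Experience Replay mentioned above more concrete, the following is a minimal sketch of hindsight goal relabelling with the "future" strategy. It is illustrative only and is not the API of the environment described here: the function names `sparse_reward` and `relabel_with_future_goals`, the dictionary keys, and the distance threshold of 0.05 are all assumptions made for this example. The reward convention (0 on success, -1 otherwise) follows the usual multi-goal setting.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Return 0.0 if the achieved goal lies within `threshold` of the
    desired goal, else -1.0. The threshold value is an assumption."""
    distance = np.linalg.norm(
        np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    )
    return 0.0 if distance <= threshold else -1.0


def relabel_with_future_goals(episode, k=4, rng=None):
    """Hypothetical helper: create hindsight transitions for one episode.

    `episode` is a list of dicts with the keys 'obs', 'action',
    'achieved_goal' (the goal achieved after the action) and 'desired_goal'.
    For every step, k goals are sampled from goals achieved at that step
    or later in the same episode, and the reward is recomputed.
    """
    rng = np.random.default_rng() if rng is None else rng
    relabelled = []
    for t, step in enumerate(episode):
        # Sample k time indices in [t, len(episode)), with replacement.
        future_indices = rng.integers(t, len(episode), size=k)
        for idx in future_indices:
            new_goal = episode[idx]["achieved_goal"]
            relabelled.append(
                {
                    "obs": step["obs"],
                    "action": step["action"],
                    "desired_goal": new_goal,
                    # Recompute the sparse reward against the substituted goal.
                    "reward": sparse_reward(step["achieved_goal"], new_goal),
                }
            )
    return relabelled
```

In a long-horizon task, almost every original transition carries a reward of -1, so the relabelled transitions are what give an off-policy learner such as DDPG any successful examples to learn from.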