Reinforcement learning has been applied to solve real-world, complex tasks directly from high-dimensional sensory inputs. Over the last decade, a long list of reinforcement learning algorithms has been developed, and recent progress has benefited from deep learning for representing raw sensory signals. A natural question arises: how well do these algorithms perform across different robotic manipulation tasks? Benchmarks offer a scientific way to compare algorithms using objective performance metrics. In this paper, we present RMBench, the first benchmark for robotic manipulation tasks with high-dimensional continuous action and state spaces. We implement and evaluate reinforcement learning algorithms that directly use observed pixels as inputs, and we report their average performance and learning curves to characterize both performance and training stability. Our study concludes that none of the studied algorithms handles all tasks well, that Soft Actor-Critic outperforms most algorithms in average reward and stability, and that combining an algorithm with data augmentation may facilitate learning policies. Our code is publicly available at https://github.com/xiangyanfei212/RMBench-2022, including all benchmark tasks and studied algorithms.