This paper presents a benchmarking study of several state-of-the-art reinforcement learning algorithms for solving two simulated vision-based robotics problems. The algorithms considered in this study are soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradient (IPG), and their variants with Hindsight Experience Replay (HER). The performances of these algorithms are compared on two PyBullet simulation environments, KukaDiverseObjectEnv and RacecarZEDGymEnv. In these environments, the state observations are available as RGB images and the action space is continuous, which makes the problems difficult to solve. Several strategies are suggested for providing the intermediate hindsight goals required to implement the HER algorithm on these problems, which are essentially single-goal environments. In addition, a number of feature-extraction architectures are proposed to incorporate spatial and temporal attention into the learning process. The improvements achieved with these components are established through rigorous simulation experiments. To the best of our knowledge, such a benchmarking study is not available for these two vision-based robotics problems, making it a novel contribution to the field.
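As a point of reference, the sketch below shows how the two benchmark environments can be instantiated from pybullet_envs to confirm the image-based observations and continuous action space described above. This is a minimal illustration, not the paper's experimental code; constructor arguments such as renders, isDiscrete, removeHeightHack, and maxSteps follow the standard pybullet_envs API, and the exact values used by the authors are an assumption here.

```python
# Minimal sketch: instantiate the two PyBullet benchmark environments.
# Constructor argument values are assumptions, not the authors' settings.
from pybullet_envs.bullet.kuka_diverse_object_gym_env import KukaDiverseObjectEnv
from pybullet_envs.bullet.racecarZEDGymEnv import RacecarZEDGymEnv

# Grasping task: RGB camera observations, continuous end-effector actions.
kuka_env = KukaDiverseObjectEnv(renders=False, isDiscrete=False,
                                removeHeightHack=False, maxSteps=20)

# Driving task: RGB images from a simulated ZED camera, continuous controls.
racecar_env = RacecarZEDGymEnv(renders=False, isDiscrete=False)

obs = kuka_env.reset()
print(obs.shape)              # an RGB image, e.g. (48, 48, 3)
print(kuka_env.action_space)  # a continuous Box action space
```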