Sim-and-real training is a promising alternative to sim-to-real training for robot manipulation. However, current sim-and-real training is neither efficient (it converges slowly to the optimal policy) nor data-efficient (it requires a sizeable amount of real-world robot data). Given limited time and hardware budgets, the performance of sim-and-real training is unsatisfactory. In this paper, we propose a Consensus-based Sim-And-Real deep reinforcement learning algorithm (CSAR) for manipulator pick-and-place tasks, which achieves comparable performance in both the simulated and real worlds. In this algorithm, we train agents in simulators and in the real world to obtain policies that are optimal for both. We found two interesting phenomena: (1) the best policy in simulation is not the best for sim-and-real training; (2) the more simulation agents are used, the better sim-and-real training performs. The experimental video is available at: https://youtu.be/mcHJtNIsTEQ.
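The abstract describes training multiple simulation agents alongside a real-world agent and combining them through consensus. The paper's actual CSAR update rule is not given here, so the following is only a minimal sketch of generic consensus averaging over agent parameters; the function name, the mixing matrix, and the agent setup are all assumptions for illustration.

```python
# Hypothetical sketch: consensus averaging across agents' parameters.
# NOT the paper's CSAR rule; it only illustrates the generic idea of
# multiple agents (e.g., several sim agents plus one real agent)
# pulling their parameters toward a weighted average.
import numpy as np

def consensus_step(params, weights):
    """Each row of `params` holds one agent's flattened parameters.
    `weights` is a row-stochastic mixing matrix; the update moves
    every agent toward a weighted average of all agents."""
    return weights @ params

# Example: three sim agents and one real agent, 4 parameters each.
rng = np.random.default_rng(0)
params = rng.standard_normal((4, 4))
# Uniform mixing: every agent averages all agents equally.
W = np.full((4, 4), 0.25)
new_params = consensus_step(params, W)
# After one uniform mixing step, all agents share identical parameters.
assert np.allclose(new_params, new_params[0])
```

With a non-uniform mixing matrix (e.g., weighting the real-world agent more heavily), agents converge gradually toward a shared policy rather than in a single step.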