Real-time learning is crucial for robotic agents adapting to ever-changing, non-stationary environments. A common setup for a robotic agent is to use two different computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected to it wirelessly. Given such a setup, it is unclear to what extent the performance of a learning system is affected by resource limitations and how to efficiently use the wirelessly connected powerful computer to compensate for any performance loss. In this paper, we implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute the computations of two deep reinforcement learning (RL) algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local and a remote computer. We evaluate the system on two vision-based control tasks developed using a robotic arm and a mobile robot. Our results show that SAC's performance degrades heavily on a resource-limited local computer. Strikingly, when all computations of the learning system are deployed on a remote workstation, SAC fails to compensate for the performance loss, indicating that, without careful consideration, using a powerful remote computer may not yield any improvement. However, a carefully chosen distribution of SAC's computations consistently and substantially improves its performance on both tasks. In contrast, the performance of PPO remains largely unaffected by the distribution of computations. In addition, when all computations run solely on a powerful tethered computer, our system performs on par with an existing system that is well tuned for a single machine. ReLoD is the only publicly available system for real-time RL that applies to multiple robots for vision-based tasks.
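To make the local/remote split concrete, the sketch below shows one way an actor-learner loop could be divided between the two machines: the local process only runs cheap policy inference and ships transitions out, while the remote process performs all learning updates and pushes fresh parameters back. This is a minimal illustration under our own simplifying assumptions (a toy linear policy, a placeholder environment and gradient step, and in-process queues standing in for the wireless link); it is not the actual ReLoD implementation or the SAC update.

```python
# Minimal sketch (not the ReLoD code): an actor-learner loop split across two
# processes that stand in for the local robot computer and the remote workstation.
# Transitions flow local -> remote; updated policy parameters flow remote -> local.
import multiprocessing as mp
import numpy as np

OBS_DIM, ACT_DIM, STEPS = 4, 2, 200

def local_actor(transition_q, param_q):
    """Runs on the resource-limited local computer: acts and ships data only."""
    weights = np.zeros((OBS_DIM, ACT_DIM))           # current policy parameters
    obs = np.zeros(OBS_DIM)
    for _ in range(STEPS):
        # refresh the policy if the remote learner has pushed new parameters
        while not param_q.empty():
            weights = param_q.get()
        action = np.tanh(obs @ weights)               # cheap inference only
        next_obs = np.random.randn(OBS_DIM)           # placeholder environment step
        reward = -float(np.sum(action ** 2))          # placeholder reward
        transition_q.put((obs, action, reward, next_obs))
        obs = next_obs
    transition_q.put(None)                            # signal end of data collection

def remote_learner(transition_q, param_q):
    """Runs on the powerful remote workstation: all gradient computation."""
    weights = np.zeros((OBS_DIM, ACT_DIM))
    buffer = []
    while True:
        item = transition_q.get()
        if item is None:
            break
        buffer.append(item)
        if len(buffer) % 32 == 0:                     # toy "update" on a mini-batch
            obs_b = np.stack([t[0] for t in buffer[-32:]])
            act_b = np.stack([t[1] for t in buffer[-32:]])
            weights -= 1e-3 * (obs_b.T @ act_b) / 32  # placeholder gradient step
            param_q.put(weights.copy())               # push fresh parameters back

if __name__ == "__main__":
    transition_q, param_q = mp.Queue(), mp.Queue()
    learner = mp.Process(target=remote_learner, args=(transition_q, param_q))
    learner.start()
    local_actor(transition_q, param_q)
    learner.join()
```

In a real deployment the two queues would be replaced by network sockets over the wireless link, and the toy policy and update rule by the SAC (or PPO) networks; the point of the sketch is only the division of labor between the two computers.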