Deep reinforcement learning (DRL) is a promising approach to solving complex control tasks by learning policies through interactions with the environment. However, training DRL policies requires large amounts of training experience, making it impractical to learn a policy directly on physical systems. Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world. Unfortunately, the direct real-world deployment of pretrained policies usually suffers from performance deterioration due to differing dynamics, a problem known as the reality gap. Recent sim-to-real methods, such as domain randomization and domain adaptation, focus on improving the robustness of pretrained agents. Nevertheless, simulation-trained policies often need to be tuned with real-world data to reach optimal performance, which is challenging because of the high cost of real-world samples. This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real time. In this architecture, inference and training are assigned to the edge and the cloud, respectively, separating the real-time control loop from the computationally expensive training loop. To overcome the reality gap, our architecture exploits sim-to-real transfer strategies to continue the training of simulation-pretrained agents on a physical system. We demonstrate its applicability on a physical inverted-pendulum control system, analyzing critical parameters. The real-world experiments show that our architecture can adapt the pretrained DRL agents to unseen dynamics consistently and efficiently.
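To make the edge/cloud split concrete, the following is a minimal illustrative Python sketch, not the paper's implementation: the edge runs a fast inference-only control loop and streams transitions to the cloud, which performs the expensive policy update asynchronously and pushes refreshed weights back. The linear policy, placeholder dynamics, update rule, and thread-based stand-ins for the edge and cloud nodes are all assumptions made for illustration.

```python
# Illustrative sketch only: toy linear policy, dummy dynamics, and threads as
# stand-ins for the edge and cloud nodes (not the paper's actual system).
import queue
import threading
import time

import numpy as np

transition_q = queue.Queue()   # edge -> cloud: stream of experience tuples
weights_q = queue.Queue()      # cloud -> edge: updated policy weights

OBS_DIM, HORIZON = 3, 200


def edge_control_loop(weights):
    """Real-time loop: cheap inference only, never blocks on training."""
    obs = np.zeros(OBS_DIM)
    for _ in range(HORIZON):
        try:                                   # pick up fresher weights if available
            weights = weights_q.get_nowait()
        except queue.Empty:
            pass
        action = float(weights @ obs)          # fast policy inference on the edge
        next_obs = obs + 0.01 * np.random.randn(OBS_DIM)  # placeholder dynamics
        reward = -float(next_obs @ next_obs)
        transition_q.put((obs, action, reward, next_obs))  # ship experience to cloud
        obs = next_obs
        time.sleep(0.005)                      # stand-in for the control period


def cloud_training_loop(weights, updates=50):
    """Computationally heavy loop: consumes experience, updates the policy."""
    for _ in range(updates):
        obs, action, reward, next_obs = transition_q.get()
        grad = reward * obs                    # placeholder for a real DRL update
        weights = weights + 1e-3 * grad
        weights_q.put(weights)                 # push refreshed weights to the edge


if __name__ == "__main__":
    w0 = np.zeros(OBS_DIM)
    edge = threading.Thread(target=edge_control_loop, args=(w0,))
    cloud = threading.Thread(target=cloud_training_loop, args=(w0,), daemon=True)
    edge.start()
    cloud.start()
    edge.join()
```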