Zero-Shot Uncertainty-Aware Deployment of Simulation Trained Policies on Real-World Robots

From arXiv. Accepted for a poster and spotlight presentation at the NeurIPS 2021 Workshop on Deployable Decision Making in Embodied Systems (DDM). arXiv admin note: substantial text overlap with arXiv:2107.09822

While deep reinforcement learning (RL) agents have demonstrated incredible potential in attaining dexterous behaviours for robotics, they tend to make errors when deployed in the real world due to mismatches between the training and execution environments. In contrast, the classical robotics community has developed a range of controllers that, given their explicit derivation, can operate safely across most states in the real world. These controllers, however, lack the dexterity required for complex tasks because of limitations in analytical modelling and the approximations it requires. In this paper, we propose Bayesian Controller Fusion (BCF), a novel uncertainty-aware deployment strategy that combines the strengths of deep RL policies and traditional handcrafted controllers. In this framework, we can perform zero-shot sim-to-real transfer: our uncertainty-based formulation allows the robot to act reliably in out-of-distribution states by leveraging the handcrafted controller, while gaining the dexterity of the learned system otherwise. We show promising results on two real-world continuous control tasks, where BCF outperforms both the standalone policy and the standalone controller. A supplementary video demonstrating our system is provided at https://bit.ly/bcf_deploy.
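
The abstract does not give the fusion rule itself, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that both the learned policy and the handcrafted controller output a Gaussian over actions, and that the two are combined as a precision-weighted product of Gaussians. The function name `fuse_gaussians` and all numeric values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def fuse_gaussians(mu_policy, sigma_policy, mu_ctrl, sigma_ctrl):
    """Fuse two independent per-dimension Gaussian action estimates.

    Assumption (not stated in the abstract): fusion is the normalised
    product of the two Gaussians, so the source with the smaller
    standard deviation dominates the fused mean.
    """
    mu_policy = np.asarray(mu_policy, dtype=float)
    mu_ctrl = np.asarray(mu_ctrl, dtype=float)
    # Precision is the inverse variance of each source.
    prec_policy = 1.0 / np.asarray(sigma_policy, dtype=float) ** 2
    prec_ctrl = 1.0 / np.asarray(sigma_ctrl, dtype=float) ** 2

    fused_var = 1.0 / (prec_policy + prec_ctrl)
    fused_mu = fused_var * (prec_policy * mu_policy + prec_ctrl * mu_ctrl)
    return fused_mu, np.sqrt(fused_var)


# Hypothetical 1-D action: the policy proposes 1.0, the controller proposes 0.0.
# In-distribution state: the policy is confident, so it dominates.
mu_in, sigma_in = fuse_gaussians([1.0], [0.1], [0.0], [1.0])

# Out-of-distribution state: the policy is uncertain, so the controller dominates.
mu_out, sigma_out = fuse_gaussians([1.0], [10.0], [0.0], [1.0])
```

Under these assumptions, the fused action stays close to the policy's proposal when the policy's uncertainty is low and falls back towards the controller's proposal when it is high, which matches the behaviour the abstract describes for out-of-distribution states.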
