Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

We present Bayesian Controller Fusion (BCF), a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF is well suited to robotics, where reliable but suboptimal control priors exist for many tasks, yet RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks: navigation in a vast, long-horizon environment, and a complex reaching task that involves manipulability maximisation. In both domains, simple hand-crafted controllers exist that can solve the task in a risk-averse manner but do not necessarily find the optimal solution, owing to limitations in analytical modelling, controller miscalibration and task variation. Because exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, and as the policy gains experience it improves substantially beyond the performance of the control prior. More importantly, given the risk-averse nature of the control prior, BCF ensures safe exploration and deployment: the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF's applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach to combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are publicly available at https://krishanrana.github.io/bcf.
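To illustrate how fusing two uncertainty-aware outputs can arbitrate control, here is a minimal sketch. The abstract does not state the fusion rule, so this assumes each system emits a one-dimensional Gaussian over the action and that the two are combined by a normalised product of Gaussians (precision weighting). The function name `fuse_gaussians` and its parameters are hypothetical and are not taken from the released code.

```python
import math


def fuse_gaussians(mu_policy, sigma_policy, mu_prior, sigma_prior):
    """Fuse two 1-D Gaussian action distributions by precision weighting.

    Illustrative assumption: the fused distribution is the normalised
    product of the policy Gaussian and the control-prior Gaussian.
    """
    var_policy = sigma_policy ** 2
    var_prior = sigma_prior ** 2
    total_var = var_policy + var_prior

    # Each mean is weighted by the other distribution's variance, so the
    # more certain (lower-variance) system pulls the fused mean towards itself.
    fused_mu = (mu_policy * var_prior + mu_prior * var_policy) / total_var

    # The fused variance is never larger than either input variance.
    fused_var = (var_policy * var_prior) / total_var
    return fused_mu, math.sqrt(fused_var)


# Hypothetical numbers: an uncertain policy (sigma = 10) and a confident prior.
mu, sigma = fuse_gaussians(mu_policy=1.0, sigma_policy=10.0,
                           mu_prior=0.0, sigma_prior=0.1)
```

Under these assumptions, a policy that is highly uncertain in an unfamiliar state contributes little to the fused mean, so the action stays close to the control prior. As the policy's variance shrinks with experience, the weighting shifts towards the policy.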
