Bayesian Controller Fusion: Leveraging Control Priors in Deep Reinforcement Learning for Robotics

We present Bayesian Controller Fusion (BCF), a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them and exploits their respective strengths. We study BCF on two real-world robotics tasks: navigation in a vast, long-horizon environment, and a complex reaching task that involves manipulability maximisation. In both domains, simple hand-crafted controllers can solve the task in a risk-averse manner, but they do not necessarily find the optimal solution because of limitations in analytical modelling, controller miscalibration and task variation. Because the prior naturally guides exploration in the early stages of training, BCF accelerates learning, and as the policy gains more experience it substantially improves on the performance of the control prior. More importantly, given the risk-aversity of the control prior, BCF ensures both safe exploration and safe deployment, since the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show that BCF applies to the zero-shot sim-to-real setting and can deal with out-of-distribution states in the real world. BCF is a promising approach for combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are publicly available at https://krishanrana.github.io/bcf.
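The abstract says BCF arbitrates by fusing the two systems' distributional outputs, but it does not state the fusion rule. The sketch below assumes one natural choice: treat the RL policy and the control prior as independent diagonal Gaussians over actions and take their normalized product. Per action dimension this gives an inverse-variance-weighted mean, mu = (mu_pi * sigma_c^2 + mu_c * sigma_pi^2) / (sigma_pi^2 + sigma_c^2), and a variance sigma^2 = sigma_pi^2 * sigma_c^2 / (sigma_pi^2 + sigma_c^2). Under this assumption, when the policy is uncertain (large sigma_pi), the fused action is pulled toward the prior, which matches the behaviour the abstract describes for unfamiliar states. The function names, the fixed prior standard deviation and the example numbers are hypothetical illustrations, not the authors' released code. How the policy's uncertainty is estimated is also left open here.

```python
import math
import random
from typing import List, Sequence, Tuple


def fuse_gaussians(
    mu_policy: Sequence[float],
    sigma_policy: Sequence[float],
    mu_prior: Sequence[float],
    sigma_prior: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Normalized product of two independent diagonal Gaussians (hypothetical helper).

    Assumes the policy and the control prior each output a per-dimension mean
    and standard deviation; returns the mean and standard deviation of the
    fused Gaussian for every action dimension.
    """
    mu_out: List[float] = []
    sigma_out: List[float] = []
    for m_p, s_p, m_c, s_c in zip(mu_policy, sigma_policy, mu_prior, sigma_prior):
        var_p, var_c = s_p ** 2, s_c ** 2
        # Inverse-variance weighting: the more certain source pulls the mean harder.
        mean = (m_p * var_c + m_c * var_p) / (var_p + var_c)
        # The fused variance is smaller than either input variance.
        var = (var_p * var_c) / (var_p + var_c)
        mu_out.append(mean)
        sigma_out.append(math.sqrt(var))
    return mu_out, sigma_out


def sample_action(mu: Sequence[float], sigma: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one action from the fused diagonal Gaussian (hypothetical helper)."""
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


# Hypothetical 2-D action; the prior's standard deviation is an assumed fixed value.
prior_mu, prior_sigma = [0.5, -0.2], [0.3, 0.3]

# Confident policy (small sigma): the fused mean stays close to the policy mean.
mu, sigma = fuse_gaussians([1.0, 0.0], [0.05, 0.05], prior_mu, prior_sigma)
# mu is about [0.986, -0.005]

# Uncertain policy (large sigma), e.g. in an unfamiliar state: the prior dominates.
mu_u, sigma_u = fuse_gaussians([1.0, 0.0], [2.0, 2.0], prior_mu, prior_sigma)
# mu_u is about [0.511, -0.196]

action = sample_action(mu_u, sigma_u, random.Random(0))
```

The product of Gaussians is chosen here only because it yields a closed-form, uncertainty-weighted blend; other arbitration rules are possible and the abstract does not commit to this one.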
