Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation

Humans exhibit a variety of distinctive behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., a single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms around human behavioral characteristics, thereby embodying principles for capturing and exploiting the actual behavioral characteristics of demonstrators. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery actions and ensure demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep, shaft-reach, and shaft-insertion tasks) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterization of behavior. Results show significant improvement in task performance through improved flexibility, robustness, and demonstration feasibility.
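
The idea of injecting disturbances so that demonstrations contain recovery actions can be illustrated with a toy sketch. This is not the BDI algorithm described in the abstract: it uses a fixed Gaussian noise level rather than a learned, risk-sensitive one, and it does not fit a non-parametric policy. The function name `collect_with_disturbance`, the 1-D dynamics, and the linear `expert` are all hypothetical, chosen only to make the mechanism visible.

```python
import numpy as np

def collect_with_disturbance(expert, step, x0, horizon, noise_std, rng):
    """Roll out a demonstrator while perturbing the executed action.

    The recorded label is the demonstrator's intended (clean) action, so
    the dataset also covers states reached after a disturbance, together
    with the action the demonstrator takes to recover from them.
    """
    states, actions = [], []
    x = np.asarray(x0, dtype=float)
    for _ in range(horizon):
        u = np.asarray(expert(x), dtype=float)   # intended action
        states.append(x.copy())
        actions.append(u)
        noise = rng.normal(0.0, noise_std, size=u.shape)
        x = step(x, u + noise)                   # disturbed action is executed
    return np.array(states), np.array(actions)

# Hypothetical toy system: the demonstrator drives a 1-D state toward zero.
expert = lambda x: -0.5 * x
step = lambda x, u: x + u

rng = np.random.default_rng(0)
clean_S, clean_A = collect_with_disturbance(expert, step, [1.0], 20, 0.0, rng)
noisy_S, noisy_A = collect_with_disturbance(expert, step, [1.0], 20, 0.3, rng)
```

With `noise_std=0.0` the rollout visits only the nominal trajectory; with a nonzero level the recorded states spread around it while the labels remain the demonstrator's own corrective actions, which is the property a policy learner can exploit for robustness.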
