In the context of teleoperation, arbitration refers to deciding how to blend human and autonomous robot commands. We present a reinforcement learning solution that learns an optimal arbitration strategy, allocating more control authority to the human when the robot encounters a decision point in the task. A decision point is a state where the robot faces multiple options (sub-policies), such as having multiple paths around an obstacle or deciding between two candidate goals. By expressing each directional sub-policy as a von Mises distribution, we identify decision points by observing the modality of the mixture distribution. Our reward function reasons about this modality and prioritizes matching the learned policy to either the user or the robot accordingly. We report teleoperation experiments on reaching and grasping objects with a robot manipulator arm, using different simulated human controllers. Results indicate that our shared control agent outperforms direct control and improves teleoperation performance across different users. Our reward term enables flexible blending between human and robot commands while maintaining safe and accurate teleoperation.
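The decision-point detection described above can be illustrated with a minimal sketch: each directional sub-policy is a von Mises component, and a decision point is flagged when the mixture density over heading angles has more than one mode. The function names, the grid-based mode search, and the choice of parameters below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vonmises_mixture_pdf(theta, mus, kappas, weights):
    """Density of a weighted mixture of von Mises distributions over angles.

    mus, kappas, weights: 1-D arrays, one entry per directional sub-policy.
    Normalization uses the modified Bessel function I0 (np.i0).
    """
    theta = np.atleast_1d(theta)[:, None]          # (n, 1) against (k,) components
    comp = np.exp(kappas * np.cos(theta - mus)) / (2 * np.pi * np.i0(kappas))
    return comp @ weights                           # (n,) mixture density

def count_modes(mus, kappas, weights, n_grid=720):
    """Count local maxima of the mixture density on a circular grid.

    More than one mode signals a decision point, where an arbitration
    scheme like the one in the abstract would shift authority to the human.
    """
    grid = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    p = vonmises_mixture_pdf(grid, mus, kappas, weights)
    # circular neighbors via roll, so the wrap-around at +/- pi is handled
    return int(np.sum((p > np.roll(p, 1)) & (p > np.roll(p, -1))))
```

A single concentrated component yields one mode (no decision point), while two well-separated sub-policies, e.g. opposite paths around an obstacle, yield two modes and would trigger increased human authority.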