In human-robot cooperation, a robot and a human work together to accomplish a shared task. Existing approaches assume that the human has a specific goal throughout the cooperation, which the robot infers and then acts toward. In real-world environments, however, a human usually starts with only a general goal (e.g., a general direction or area in motion planning), which must be refined into a specific goal (i.e., an exact position) as the cooperation proceeds. This goal specification process is interactive and dynamic, and it depends on both the environment and the partner's behavior. A robot that does not account for the goal specification process may frustrate its human partner, prolong the time needed to reach agreement, and compromise team performance. This work presents the Evolutionary Value Learning approach, which models the dynamics of the goal specification process using State-based Multivariate Bayesian Inference and goal specificity-related features. The model enables the robot to actively enhance the human's goal specification process and to find a cooperative policy via Deep Reinforcement Learning. In a dynamic ball balancing task with real human subjects, our method outperforms existing methods, achieving faster goal specification and better team performance.