Forming Human-Robot Cooperation for Tasks with General Goal using Evolutionary Value Learning

In Human-Robot Cooperation (HRC), a robot and a human work together to accomplish a task. Existing approaches assume that the human has a specific goal throughout the cooperation, which the robot infers and then acts toward. In real-world environments, however, the human usually starts with only a general goal (e.g., a general direction or area in motion planning), which must be refined into a specific goal (e.g., an exact position) as the cooperation proceeds. This goal specification process is interactive and dynamic, and it depends on both the environment and the partners' behavior. A robot that ignores the goal specification process may frustrate its human partner, prolong the time needed to reach agreement, and degrade team performance or cause the team to fail. We present the Evolutionary Value Learning (EVL) approach, which uses a State-based Multivariate Bayesian Inference method to model the dynamics of the goal specification process in HRC. EVL actively supports both goal specification and the formation of cooperation: the robot helps the human specify the goal while simultaneously learning a cooperative policy with Deep Reinforcement Learning (DRL). In a dynamic ball balancing task with real human subjects, the robot equipped with EVL outperforms existing methods, achieving faster goal specification and better team performance.
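As a rough illustration of how a general goal can narrow into a specific one through Bayesian updating, the sketch below keeps a belief over a handful of candidate goal positions and sharpens it as observations of the human's behavior arrive. It is not the paper's State-based Multivariate Bayesian Inference method, which the abstract does not detail. The 1-D candidate goals, the Gaussian likelihood, the `sigma` value, and the observation sequence are all assumptions made for illustration.

```python
import math


def gaussian_likelihood(observation, goal, sigma):
    """Unnormalized likelihood of an observation if the human aims at `goal` (assumed model)."""
    return math.exp(-0.5 * ((observation - goal) / sigma) ** 2)


def update_belief(belief, goals, observation, sigma=0.5):
    """One Bayes step: posterior is proportional to likelihood times prior, then renormalized."""
    unnormalized = [
        p * gaussian_likelihood(observation, g, sigma)
        for p, g in zip(belief, goals)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def entropy(belief):
    """Shannon entropy in nats; lower values mean a more specific goal estimate."""
    return -sum(p * math.log(p) for p in belief if p > 0)


# Hypothetical candidate goal positions along one axis.
goals = [0.0, 1.0, 2.0, 3.0, 4.0]

# "General goal": a uniform prior, i.e. no candidate is preferred yet.
belief = [1.0 / len(goals)] * len(goals)

# Hypothetical observations of where the human steers over three steps.
observations = [2.8, 3.1, 3.0]

history = [entropy(belief)]
for obs in observations:
    belief = update_belief(belief, goals, obs)
    history.append(entropy(belief))

# The candidate with the highest posterior probability is the current specific-goal estimate.
most_likely_goal = goals[max(range(len(goals)), key=belief.__getitem__)]
```

The falling entropy in `history` is one simple way to track how far goal specification has progressed. A robot policy could, in principle, condition on the current belief rather than on a single assumed goal.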
