In human-robot cooperation (HRC), a robot works with a human to accomplish a task together. Existing approaches assume that the human has a specific goal throughout the cooperation, which the robot infers and acts toward. In real-world environments, however, a human usually starts with only a general goal (e.g., a general direction or area in motion planning), which must be refined into a specific goal (e.g., an exact position) during the cooperation. This goal specification process is interactive and dynamic, and depends on the environment and on the behavior of both partners. A robot that does not consider the goal specification process may frustrate its human partner, prolong the time needed to reach agreement, and degrade team performance or cause the task to fail. We present the Evolutionary Value Learning (EVL) approach, which uses a State-based Multivariate Bayesian Inference method to model the dynamics of the goal specification process in HRC, and an Evolutionary Value Updating method to actively enhance goal specification and cooperation formation. This enables the robot to simultaneously help the human specify the goal and learn a cooperative policy through reinforcement learning. In experiments with human subjects, the robot equipped with EVL outperforms existing methods, achieving faster goal specification and better team performance.