Training a robot that engages with people is challenging because training requires numerous data samples, and it is expensive to involve people directly in the process. This paper presents an alternative approach to this problem. We propose a human path prediction network (HPPN) that generates a user's future trajectory from sequential robot actions and human responses using a recurrent-neural-network structure. We then present an evolution-strategy-based robot training method that uses only the virtual human movements generated by the HPPN. We demonstrate that the proposed method enables sample-efficient training of a robotic guide for visually impaired people. By collecting only 1.5K episodes from real users, we were able to train the HPPN and generate the more than 100K virtual episodes required to train the robot. The trained robot precisely guided blindfolded participants along a target path. Furthermore, using virtual episodes, we investigated a new reward design that prioritizes human comfort during the robot's guidance without incurring additional costs. This sample-efficient training method is expected to be widely applicable to future robots that interact physically with humans.
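The abstract gives no implementation details, so the following is only a toy sketch of the general idea: a robot policy is optimized with an evolution strategy using rollouts from a simulated human rather than real participants. Everything in it is an assumption made for illustration. In particular, `virtual_human_step` is a hypothetical hand-written stand-in for the learned HPPN (it is not a recurrent network), and the linear policy, the reward, and the hyperparameters are not taken from the paper.

```python
import numpy as np

def virtual_human_step(position, action):
    # Hypothetical stand-in for the learned HPPN: the simulated person
    # moves half of the distance that the robot's action asks for.
    return position + 0.5 * action

def rollout(theta, target=1.0, steps=20):
    # Assumed linear policy: action = gain * error + bias.
    gain, bias = theta
    position, cost = 0.0, 0.0
    for _ in range(steps):
        action = gain * (target - position) + bias
        position = virtual_human_step(position, action)
        cost += (position - target) ** 2
    return -cost / steps  # reward: higher is better

def train_es(iterations=200, pop=20, sigma=0.1, lr=0.03, seed=0):
    # Basic evolution strategy with antithetic sampling and
    # standardized rewards; uses virtual rollouts only.
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(iterations):
        eps = rng.standard_normal((pop, theta.size))
        noise = np.concatenate([eps, -eps])  # antithetic pairs
        rewards = np.array([rollout(theta + sigma * n) for n in noise])
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta = theta + lr / sigma * (adv[:, None] * noise).mean(axis=0)
    return theta
```

Calling `train_es()` and comparing `rollout(theta)` with `rollout(np.zeros(2))` indicates whether the policy improved in simulation. In the setting the abstract describes, the rollouts would instead come from an HPPN trained on the collected real-user episodes.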