In robot-assisted therapy for individuals with Autism Spectrum Disorder, therapists' workload during a session increases if they have to control the robot manually. To allow therapists to focus on the interaction with the person instead, the robot should be more autonomous: it should be able to interpret the person's state and continuously adapt its actions to their behaviour. In this paper, we develop a personalised robot behaviour model that can be used in the robot's decision-making process during an activity; this behaviour model is trained with the help of a user model learned from real interaction data. We use Q-learning for this task, and our results show that the policy requires about 10,000 iterations to converge. We therefore investigate policy transfer to improve the convergence speed; we show that this is a feasible solution, but that an inappropriate initial policy can lead to a suboptimal final return.
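To make the idea of Q-learning with policy transfer concrete, the following is a minimal sketch and not the behaviour model described in this paper. It assumes a hypothetical five-state chain environment in place of the learned user model, and it approximates policy transfer by warm-starting the Q-table from a previously learned one (the `q_init` parameter). All names, the environment, and the hyperparameters are illustrative assumptions.

```python
import random


def step(state, action, n_states):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1; all other
    transitions give reward 0.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(n_states=5, n_actions=2, episodes=500, alpha=0.5,
               gamma=0.9, epsilon=0.2, q_init=None, seed=0):
    """Tabular Q-learning with an optional warm-start table (q_init)."""
    rng = random.Random(seed)
    if q_init is not None:
        q = [row[:] for row in q_init]  # copy so the source table is untouched
    else:
        q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            nxt, reward, done = step(state, action, n_states)
            # Standard Q-learning target; no bootstrapping from terminal states.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Return the greedy action for every state of a Q-table."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Learn from scratch, then reuse the result as the initial table of a new run.
q_source = q_learning()
q_transferred = q_learning(episodes=10, q_init=q_source)
print(greedy_policy(q_source), greedy_policy(q_transferred))
```

In this sketch the warm-started run begins from values that already encode a useful policy, which is the intuition behind using transfer to speed up convergence. The toy does not reproduce the paper's observation that an inappropriate initial policy can lead to a suboptimal final return; studying that would require a source task that differs from the target task.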