This paper studies robot-assisted Reminiscence Therapy (RT) as a psychosocial intervention for persons with dementia (PwDs). We aim to learn, by reinforcement learning, a conversation strategy for the robot that stimulates the PwD to talk. Specifically, to characterize the stochastic reactions of a PwD to the robot's actions, we develop a simulation model of a PwD that features the transition probabilities among PwD states, where each state consists of the response relevance, emotion level, and confusion condition. A Q-learning (QL) algorithm is then designed to obtain the best conversation strategy for the robot. The objective is to stimulate the PwD to talk as much as possible while keeping the PwD's states as positive as possible. Under certain conditions, the learned strategy offers the PwD the choice to continue the topic, change the topic, or stop the conversation, giving the PwD a sense of control that mitigates conversation stress. To achieve this, the standard QL algorithm is revised to deliberately integrate the impact of the PwD's choices into the Q-value updates. Finally, simulation results demonstrate learning convergence and validate the efficacy of the learned strategy. Tests show that the strategy can adjust the difficulty level of the prompt according to the PwD's states, take actions (e.g., repeating or explaining the prompt, or comforting the PwD) to help the PwD out of bad states, and allow the PwD to control the direction of the conversation when bad states persist.
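To make the setup concrete, the following is a minimal sketch of standard tabular Q-learning against a toy stochastic simulator. It is not the revised QL algorithm described above, which additionally integrates the PwD's choices into the Q-value updates. The state names, action names, transition probabilities, and reward values are all hypothetical placeholders chosen for illustration. The actual PwD model uses states built from response relevance, emotion level, and confusion condition.

```python
import random

# Hypothetical, highly simplified state and action sets (assumptions, not
# the paper's definitions).
STATES = ["positive", "neutral", "negative"]
ACTIONS = ["easy_prompt", "hard_prompt", "comfort"]

# Assumed transition probabilities: P[state][action] is a distribution over
# STATES (same order). The numbers are illustrative only.
P = {
    "positive": {"easy_prompt": [0.7, 0.2, 0.1],
                 "hard_prompt": [0.5, 0.3, 0.2],
                 "comfort":     [0.8, 0.2, 0.0]},
    "neutral":  {"easy_prompt": [0.4, 0.4, 0.2],
                 "hard_prompt": [0.2, 0.4, 0.4],
                 "comfort":     [0.5, 0.4, 0.1]},
    "negative": {"easy_prompt": [0.1, 0.4, 0.5],
                 "hard_prompt": [0.05, 0.25, 0.7],
                 "comfort":     [0.3, 0.5, 0.2]},
}

# Assumed reward: a "talk" term that depends on the prompt type plus a
# bonus or penalty for the resulting state.
TALK = {"easy_prompt": 1.0, "hard_prompt": 2.0, "comfort": 0.0}
STATE_BONUS = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def reward(action, next_state):
    # The PwD is assumed to talk only if not left in a negative state.
    talk = TALK[action] if next_state != "negative" else 0.0
    return talk + STATE_BONUS[next_state]


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Standard epsilon-greedy tabular Q-learning on the toy simulator."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(episodes):
        s = rng.choice(STATES)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)           # explore
            else:
                a = max(q[s], key=q[s].get)       # exploit
            s2 = rng.choices(STATES, weights=P[s][a])[0]
            r = reward(a, s2)
            # Standard Q-learning update (not the paper's revised update).
            q[s][a] += alpha * (r + gamma * max(q[s2].values()) - q[s][a])
            s = s2
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(q[s], key=q[s].get) for s in q}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The learned policy depends entirely on the assumed probabilities and rewards, so it should be read as an illustration of the learning loop, not as a reproduction of the reported strategy.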