Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care

Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore, patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm that optimizes the delivery of mobile-based prompts encouraging oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that it accounts for the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been kept simple so that it runs stably and autonomously in a constrained, real-world setting (i.e., highly noisy, sparse data). We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward: we build a simulation environment test bed and use it to evaluate candidate hyperparameter values. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices.
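The abstract does not give the functional form of the reward, so the following is only a minimal sketch of the general idea, under stated assumptions: the reward is taken to be the observed brushing quality minus a burden cost that grows with how often the user was prompted recently. The names `RewardParams`, `update_dosage`, `surrogate_reward`, and `select_params`, the exponentially discounted "dosage" proxy for burden, and the default values are all hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RewardParams:
    # Hypothetical hyperparameters; the source does not specify their form.
    burden_weight: float = 30.0  # cost per unit of recent prompting
    dosage_decay: float = 0.8    # how quickly past prompts stop counting


def update_dosage(prev_dosage: float, prompt_sent: bool, decay: float) -> float:
    """Exponentially discounted count of recent prompts (assumed burden proxy)."""
    return decay * prev_dosage + (1.0 if prompt_sent else 0.0)


def surrogate_reward(
    brushing_quality: float,
    prompt_sent: bool,
    dosage: float,
    params: RewardParams,
) -> float:
    """Brushing quality minus a burden cost charged only when a prompt is sent."""
    cost = params.burden_weight * dosage if prompt_sent else 0.0
    return brushing_quality - cost


def select_params(
    candidates: Iterable[RewardParams],
    evaluate: Callable[[RewardParams], float],
) -> RewardParams:
    """Return the candidate with the highest score.

    `evaluate` stands in for running the RL algorithm in a simulation test bed
    and returning an average outcome; it is supplied by the caller.
    """
    return max(candidates, key=evaluate)
```

Charging the cost only when a prompt is sent lets a simple algorithm that optimizes immediate reward still be discouraged from over-prompting, which is one way to approximate delayed effects. In this sketch, `select_params` would be called with a small grid of `RewardParams` values and an `evaluate` function that wraps the simulator.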
