Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating RL candidate algorithms using the PCS framework. We illustrate the use of the PCS framework to design an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.