There is growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns from each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing treatments for its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the algorithm's stochasticity. We illustrate our methodology with a case study analyzing data from HeartSteps, a physical activity clinical trial that included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users and within specific users in the study.